Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Anthropic has apparently done more work on that behavior, saying in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.
