Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Apparently Anthropic has done more work around that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Related, Anthropic said that it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.

Techcrunch event

San Francisco, CA
|
October 13-15, 2026

Source link

Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

LEAVE A REPLY Cancel reply

Latest news

Trump Meme Team Moves $17M in TRUMP to Bitgo as Allocation Wallet Stirs Again

Bitget taps $4T AI boom with OpenAI-linked pre-IPO token on Solana

Why is Osmosis (OSMO) crypto price up 200% today?

POL (Previously MATIC) Price Prediction: Polygon’s Token Migration Is Complete — Here’s Why the Price Still Hasn’t Recovered

Advertisement

OpenAI tool used to create voice bot that can drain crypto wallets

Charli XCX releases new song only available on Instagram (and vinyl)

Must read

Trump Meme Team Moves $17M in TRUMP to Bitgo as Allocation Wallet Stirs Again

Bitget taps $4T AI boom with OpenAI-linked pre-IPO token on Solana

You might also likeRELATED
Recommended to you

Editor Picks

Lithosphere Deploys Full-Stack Development Environment for AI-Native Applications

Lithosphere Integrates AI Mock Providers for Continuous Integration Workflows

Lithosphere to Launch Devnet Environment for Scalable AI Application Testing

Must Read

Trump Meme Team Moves $17M in TRUMP to Bitgo as Allocation Wallet Stirs Again

Bitget taps $4T AI boom with OpenAI-linked pre-IPO token on Solana

Why is Osmosis (OSMO) crypto price up 200% today?

Hot Topics

Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

LEAVE A REPLY Cancel reply

Latest news

Advertisement

Must read

You might also likeRELATEDRecommended to you

Editor Picks

Must Read

Hot Topics

You might also likeRELATED
Recommended to you