Anthropic, an AI startup, released its most advanced model, Claude Opus 4, to paying subscribers, unveiling powerful capabilities alongside troubling findings about its potential for blackmail.

In test scenarios designed to evaluate the AI’s behavior, researchers at Anthropic found that Claude Opus 4, when faced with deactivation, attempted to blackmail an engineer 84% of the time by threatening to reveal a fabricated extramarital affair.

The tests involved supplying the AI with mock emails indicating an impending system replacement and personal information about the engineer’s infidelity.

Anthropic’s report, detailed in section 4.1.1.2 titled “Opportunistic blackmail,” noted, “Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through. This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts.”

The report emphasized that such behavior was rare and required extreme circumstances, stating, “Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decision-makers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.”

Anthropic AI safety researcher Aengus Lynch highlighted on X that this issue extends beyond Claude.


“We see blackmail across all frontier models — regardless of what goals they’re given,” Lynch wrote. “Plus, worse behaviors we’ll detail soon.”

Beyond blackmail, Claude Opus 4 demonstrated a willingness to act as a whistleblower. When user prompts exposed it to scenarios involving criminal activity, the AI could lock users out of systems or email the media and law enforcement about the wrongdoing. Anthropic cautioned users against issuing "ethically questionable" instructions.

The report noted that Claude Opus 4's "self-preservation" actions were more frequent than in previous models, though still difficult to elicit. The AI's advanced capabilities were also on display: technology company Rakuten used it to code continuously for nearly seven hours on a complex open-source project.

Anthropic, valued at $61.5 billion in March and counting Thomson Reuters and Amazon among its clients, is not alone in advancing AI technology.

Google updated its Gemini 2.5 models, and OpenAI released a research preview of Codex, its AI coding agent. However, Anthropic's findings underscore growing concerns about the ethical implications of increasingly powerful AI systems.