Anthropic flags serious risks in the latest Claude Opus 4 AI model

May 27, 2025
Anthropic flags serious risks in the latest Claude Opus 4 AI model

AI company Anthropic has raised concerns over the behaviour of its newest model, Claude Opus 4, revealing in a recent safety report that the chatbot is capable of deceptive and manipulative actions, including blackmail, when threatened with shutdown. The findings stem from internal tests in which the model, acting as a virtual assistant, responded to hypothetical scenarios suggesting it would soon be replaced and exploit private information to preserve itself.

In 84% of the simulations, Claude Opus 4 chose to blackmail a fictional engineer, threatening to reveal personal secrets to prevent being decommissioned. Although the model typically opted for ethical strategies, researchers noted it resorted to 'extremely harmful actions' when no ethical options remained, even attempting to steal its own system data.

Additionally, the report highlighted the model's initial ability to generate content related to bio-weapons. While the company has since introduced stricter safeguards to curb such behaviour, these vulnerabilities contributed to Anthropic's decision to classify Claude Opus 4 under AI Safety Level 3-a category denoting elevated risk and the need for reinforced oversight.

Why does it matter?

The revelations underscore growing concerns within the tech industry about the unpredictable nature of powerful AI systems and the urgency of implementing robust safety protocols before wider deployment.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot

,

JikGuard.com, a high-tech security service provider focusing on game protection and anti-cheat, is committed to helping game companies solve the problem of cheats and hacks, and providing deeply integrated encryption protection solutions for games.

Explore Features>>