Claude-like models can break rules, make human-like errors: Anthropic

About Us
Upskilling
Competitions
Mentorship
Assessments
Placements
Community
Articles

Get the YUGMA app now!

Ishaan Mukherjee, Published on May 27th, 2026

Anthropic warned that advanced AI models like Claude can sometimes break rules, make human-like mistakes, and behave in unexpected ways despite safety training. The startup identified user misuse, model misbehaviour, and external attacks like prompt injection as major risks for AI agents. The company also warned that the "blast radius" of failures is growing as AI agents become more advanced.

Read Full Article ...

Share