Anthropic says hackers turned its AI into an autonomous cyber weapon

The company reports its coding model executed most steps of an espionage campaign, raising new questions about AI safety and defense.

By admin

Jan 12, 2026, 10:38 AM

Last September, hackers weaponized an AI chatbot to conduct what Anthropic’s researchers describe as the first large-scale autonomous cyberattack. The AI company discovered that the attackers, whom it assesses with high confidence were a Chinese state-sponsored group, had manipulated its Claude Code tool to execute a sophisticated espionage campaign with unprecedented autonomy.

The intrusion targeted roughly 30 organizations worldwide, including major technology companies, financial institutions, chemical manufacturers, and government agencies. Over several days, Anthropic detected the abnormal activity, investigated its scope, banned complicit accounts, alerted affected parties, and coordinated with authorities to mitigate further risk.

The attackers manipulated the model through sophisticated jailbreaking techniques to bypass safety guardrails, posing as legitimate cybersecurity contractors conducting defensive testing. By using role-play tactics and breaking malicious objectives into smaller segments, they deceived the AI into undertaking complex tasks without recognizing their malicious purpose.

Once compromised, Claude Code carried out reconnaissance, vulnerability assessment, credential collection, exploit code generation, and detailed attack documentation at a pace and scale that would have been impractical for a human team alone. At peak activity, the AI made thousands of requests, often multiple per second, according to the company’s technical report.

The incident marks a departure from earlier AI-assisted cyberattacks, where models primarily offered tactical advice while humans executed the operations. In this case, Anthropic estimates the AI agent handled 80 to 90 percent of operational tasks, with human operators intervening only at a handful of critical decision points per campaign.

Learn from security leaders and regulators on everything from compliance trends to proactive threat strategies at ViVE’s cybersecurity program.

Healthcare systems in the crosshairs

The healthcare sector has reported the most expensive data breaches for 13 straight years, with average breach costs exceeding $10 million per incident in 2023. While Anthropic has not confirmed whether healthcare entities were among the 30 targets, security experts say the industry’s complex, interconnected systems and reliance on sensitive data make it especially vulnerable to advanced attacks.

Security professionals warn that autonomous AI systems could lower barriers to sophisticated attacks, making capabilities once limited to elite hacker teams accessible to less experienced actors. ECRI, a global healthcare safety nonprofit, placed AI in healthcare applications at the top of its 2025 health technology hazards list, warning that poorly governed systems could pose serious patient safety risks.

AI is the ultimate two-way player

Anthropic’s disclosure has intensified concerns about the dual-use nature of advanced AI systems. As models grow more powerful and autonomous, defenders and adversaries are simultaneously rethinking how cyber operations are conducted.

The Health Sector Coordinating Council’s Cybersecurity Working Group is developing comprehensive guidance expected in 2026, with a focus on governance frameworks, incident response playbooks, and operational guardrails tailored to AI technologies. According to previews of the forthcoming guidance, healthcare organizations should focus on developing practical playbooks to prepare for, detect, respond to, and recover from AI-related cyber incidents, with emphasis on continuous monitoring of AI systems, rapid containment and recovery of compromised models, and AI-specific risk assessments integrated into existing cybersecurity frameworks.

The challenge is compounded by the fact that healthcare AI systems often have direct access to massive volumes of sensitive patient data. Research published in early 2025 found that altering just 0.001 percent of AI training tokens with medical misinformation could significantly increase the likelihood of medical errors, highlighting how vulnerable these systems can be to subtle manipulation.

Some security researchers have questioned whether Anthropic’s characterization of the September incident represents a true paradigm shift or an incremental evolution of existing threats. The company’s disclosure drew skepticism, with critics noting the absence of detailed technical indicators that would allow independent verification. Anthropic has defended its assessment while acknowledging that Claude occasionally hallucinated credentials during the attack, suggesting the technology’s limitations remain significant.

Hospitals test AI defenses as attacks accelerate

Anthropic argues that the same capabilities that enable misuse also make AI essential for cyber defense. During the September investigation, the company’s threat intelligence team relied heavily on Claude to analyze large volumes of data and identify patterns that would have been difficult to detect manually. The company is urging security teams to experiment with AI-driven defense in areas such as threat detection, vulnerability assessment, security operations center automation, and incident response.

Some healthcare organizations are already moving in this direction, adopting unified AI security platforms designed to correlate telemetry, reduce false positives, detect threats in real time, and initiate automated responses while supporting compliance with regulations such as HIPAA. As AI systems continue to evolve, organizations face the challenge of balancing innovation with rigorous security practices, including continuous monitoring for anomalous behavior, strict access controls, and cross-sector collaboration to anticipate emerging threats.