Anthropic leaks Capybara, its most powerful model yet

Leaked Anthropic documents show Capybara’s cyber capabilities may outpace defenders and expose a growing policy gap.

By admin

Apr 7, 2026, 8:26 AM

In late March, Anthropic, the AI company behind Claude, unintentionally left nearly 3,000 internal files publicly accessible, exposing early details about a more advanced version of its AI system. Reporting by Fortune found the material was accessible through a company content system that defaulted uploads to public unless manually changed. The files included draft materials referencing an unreleased model and a new tier known internally as Capybara.

The leaked material points to a significant advance in reasoning, coding, and cybersecurity-related performance. The draft blog post described Capybara as a new tier of model, larger and more capable than Anthropic’s existing Opus line. An Anthropic spokesperson told Fortune the company considers the model “a step change and the most capable we’ve built to date.” The company said it is testing the system with a small group of early access customers.

The exposure did not involve a sophisticated intrusion. Security researchers Roy Paz of LayerX Security and Alexandre Pauwels of the University of Cambridge independently located the data without specialized tools.

What drew the most attention was what the material revealed. Anthropic’s draft described Capybara as “currently far ahead of any other AI model in cyber capabilities” and warned it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.”

What Capybara reveals about the next wave of cyber threats

Anthropic’s proposed solution to the attacker–defender imbalance created by more capable AI is to give defenders first access. The company plans to release Capybara initially to cybersecurity-focused organizations, allowing them to find and fix weaknesses in their software before a broader rollout. With an early window, defenders can identify and patch vulnerabilities at machine speed, buying time against the AI-driven exploits Anthropic expects to follow.

That strategy rests on an assumption the cybersecurity field has tested and found unreliable. Responsible disclosure programs operate on a similar premise, but decades of experience show the defender’s window rarely holds as planned. Once Capybara’s capabilities become generally available, whether through Anthropic’s own paid access, competing models built on publicly released AI code, or direct misuse by bad actors, the asymmetry narrows fast. Attackers need to find one exploitable flaw; defenders need to find all of them.

In one documented case, Anthropic reported that a Chinese state-sponsored group used Claude Code to infiltrate roughly 30 organizations, including technology companies, financial institutions, and government agencies, before Anthropic detected the campaign and banned the accounts involved. A model with meaningfully stronger cyber capabilities raises the ceiling on what that kind of operation could accomplish.

Capybara’s leaked benchmarks suggest it surpasses Claude Opus 4.6 by a wide margin on cybersecurity tasks. Opus 4.6 itself already led the field on standard tests of real-world coding ability, including SWE-bench Verified and Terminal-Bench 2.0, where it scored 65.4 percent, the highest of any commercial model. In offensive tooling, capability jumps like this have historically compressed the time between finding a software flaw and using it in an attack.

Defenders get a head start, but no system governs what comes next

Anthropic’s early-access plan is a company-level policy decision. No regulatory framework currently governs how labs test, stage, or release the most advanced AI models with offensive cyber potential. The NIST AI Risk Management Framework offers voluntary guidance, and prior executive action directed federal agencies to assess AI risks, but no binding requirement applies to a private lab that has identified one of its own models as dangerous.

Anthropic is not the only lab now approaching this threshold. OpenAI has openly acknowledged that its own models raise cybersecurity concerns, and competitive pressure to ship is intensifying as both companies prepare for expected IPOs later this year. Voluntary restraint by one vendor does not bind the rest of the field.

Anthropic built its brand on responsible scaling and careful deployment, yet it revealed the existence of what it calls its most capable model, one its own draft warns could outpace defenders, through a misconfigured content management system. Its spokesperson attributed the lapse to “human error.” The documents sat behind no authentication; anyone with basic technical knowledge could query the public-facing system and retrieve them. For organizations evaluating AI vendor security practices, the incident is a concrete reminder that a vendor’s safety reputation and its operational discipline are separate questions.

Anthropic restricted access after Fortune’s inquiry and characterized the exposed materials as unrelated to core infrastructure or customer data. The company said Capybara is expensive to serve, with no public launch date. The strategic question for security leaders is no longer whether AI reshapes the threat landscape but how fast, and Anthropic’s own leaked assessment suggests the next wave of frontier cyber capabilities will arrive sooner than most defenders expect.
