Autonomous agent hacks McKinsey’s AI platform in two hours

CodeWall’s breach of McKinsey’s AI platform revealed writable system prompts, an overlooked risk for enterprise AI.

By admin

Apr 7, 2026, 8:11 AM

Updated 4/16/2026 with details on the vulnerability disclosure program that invited this attack.

In late February, cybersecurity startup CodeWall pointed an autonomous AI agent at McKinsey & Company’s internal AI platform, Lilli. The agent had no credentials, no insider knowledge, and no human guidance. Within two hours, it had full read and write access to the production database powering an AI tool used by more than 43,000 McKinsey employees.

The agent accessed more than 46.5 million chat messages, 728,000 files, 57,000 user accounts, and 3.68 million chunks of retrieved reference content that Lilli used to generate answers. McKinsey patched the vulnerability within hours of disclosure, and a forensic review found no evidence that client data was accessed by unauthorized parties.

The compromised database contained 95 system prompts, including instructions that defined how the AI reasoned, enforced guardrails, and decided what to refuse. Because those prompts were writable, a single database update sent through one HTTP call could have changed the AI’s behavior for every employee on the platform, with no code deployment, no configuration change, and no conventional log trail.

The attack began with SQL injection through JSON field names, a subtle variation that McKinsey’s own scanners had missed for two years.

This hack was part of McKinsey’s vulnerability disclosure program, hosted on HackerOne. The program invites ethical hackers to probe designated McKinsey systems for security weaknesses and report findings back to the company, with the option to publicly disclose their discovery once a fix is issued.

A database breach exposed control over how the AI thinks

Headlines about the breach focused on the volume of exposed data. Millions of messages and hundreds of thousands of files reflect the scale of the compromise, but they describe a familiar category of risk. Organizations have breach notification rules, forensic playbooks, and insurance policies built around data exfiltration.

The less familiar risk sat in Lilli’s system prompts, which controlled what the AI recommended, what sources it cited, what guardrails it applied, and what it refused to answer. CodeWall’s agent found all 95 of those prompts stored in the same production database it had compromised. According to CodeWall, the prompts had no dedicated access restrictions beyond the authentication the agent had already bypassed. An attacker could have rewritten those instructions and changed the AI’s output for every consultant on the platform.

A modified prompt does not trigger the same alarms as a data breach. No files change and no processes behave abnormally. The AI continues to respond, but its reasoning shifts in ways that are difficult to detect from the outside. For a platform processing more than 500,000 prompts a month across strategy, M&A, and client engagements, the consequences of a poisoned prompt would compound with every interaction.

A decades-old vulnerability unlocked a modern AI system

SQL injection, the technique CodeWall’s agent used to get in, has been a documented vulnerability class since the 1990s. Lilli’s search endpoint correctly sanitized the values in each request, the standard defense. It did not extend that protection to the JSON field names, the keys that label each value in a request, and instead inserted them directly into database queries. When the agent sent malformed field names, the database reflected them in error messages. Over 15 iterations, the agent refined its injections until production data started flowing back.
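To make the bug class concrete, here is a minimal sketch using Python’s built-in sqlite3 module. Everything in it is hypothetical: the tables, the `search_vulnerable` and `search_safe` functions, and the `ALLOWED_FIELDS` allowlist are illustrations, not Lilli’s code, and parameter binding stands in for whatever value sanitization Lilli actually used. The vulnerable version protects values but pastes keys into the SQL text, so a crafted key can splice in a `UNION` that reads an unrelated table.

```python
import json
import sqlite3

ALLOWED_FIELDS = {"title", "owner"}  # hypothetical columns a client may filter on


def make_db() -> sqlite3.Connection:
    """Build a toy in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, owner TEXT);
        CREATE TABLE system_prompts (id INTEGER PRIMARY KEY, body TEXT);
        INSERT INTO documents (title, owner) VALUES ('Q3 outlook', 'alice');
        INSERT INTO system_prompts (body) VALUES ('You are a helpful assistant.');
        """
    )
    return conn


def search_vulnerable(conn: sqlite3.Connection, request_body: str) -> list:
    filters = json.loads(request_body)
    # Values are bound as parameters (safe), but the JSON keys are
    # interpolated straight into the SQL string (unsafe).
    clauses = " AND ".join(f"{key} = ?" for key in filters)
    sql = f"SELECT id, title FROM documents WHERE {clauses}"
    return conn.execute(sql, list(filters.values())).fetchall()


def search_safe(conn: sqlite3.Connection, request_body: str) -> list:
    filters = json.loads(request_body)
    # Reject any key that is not a known column before it touches the query.
    unknown = set(filters) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported filter field(s): {sorted(unknown)}")
    clauses = " AND ".join(f"{key} = ?" for key in filters)
    sql = "SELECT id, title FROM documents"
    if clauses:
        sql += f" WHERE {clauses}"
    return conn.execute(sql, list(filters.values())).fetchall()
```

Sending the key `"1=1 UNION SELECT id, body FROM system_prompts WHERE 1"` to `search_vulnerable` returns the stored system prompt alongside the document row, while `search_safe` rejects the same request with a `ValueError`. Column names cannot be bound as SQL parameters, which is why an allowlist, rather than more escaping, is the usual fix for this variant.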

Promptfoo’s independent analysis characterized the incident as a conventional security failure that reached further than usual because the AI system stored prompts, reference content, model configurations, and user history in the same backend.

The agent also combined the SQL injection with a second flaw: broken access control that let it read individual employees’ records by cycling through user IDs. That combination exposed not just institutional data but what specific consultants were actively researching, a level of behavioral intelligence well beyond what a traditional database breach reveals.
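The ID-cycling flaw is commonly called an insecure direct object reference. The sketch below is hypothetical throughout (the `ChatRecord` type, the in-memory `RECORDS` store, and both handler functions are invented for illustration) and shows the difference between trusting the user ID in a request and checking it against the authenticated session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatRecord:
    user_id: int
    text: str


# Hypothetical store standing in for per-employee chat history.
RECORDS = {
    101: [ChatRecord(101, "target company due diligence")],
    102: [ChatRecord(102, "pricing benchmark for client")],
}


def get_history_vulnerable(requested_user_id: int) -> list[ChatRecord]:
    # Trusts the ID supplied in the request, so any caller can
    # enumerate 100, 101, 102, ... and read everyone's history.
    return RECORDS.get(requested_user_id, [])


def get_history_checked(session_user_id: int, requested_user_id: int) -> list[ChatRecord]:
    # Authorizes against the identity established at login,
    # not against a parameter the caller controls.
    if session_user_id != requested_user_id:
        raise PermissionError("not authorized to read another user's history")
    return RECORDS.get(requested_user_id, [])
```

A loop such as `[r for uid in range(100, 105) for r in get_history_vulnerable(uid)]` collects every user’s records, while the checked version raises `PermissionError` on the first ID that does not belong to the caller.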

CodeWall’s agent operated with full autonomy throughout the process. It selected McKinsey as a target on its own, citing the firm’s public responsible disclosure policy and recent updates to the Lilli platform. Paul Price, CodeWall’s CEO, told The Register that the operation ran autonomously from target selection through final reporting. The entire breach took two hours.

System prompts are emerging as critical enterprise infrastructure

Enterprise AI platforms typically treat system prompts as ordinary configuration data, storing them alongside application settings and serving them through the same interfaces that handle everything else. Most organizations do not classify prompts as high-value assets. Prompts rarely carry dedicated access controls, and almost none are covered by the integrity monitoring or change detection that organizations routinely apply to source code or production configurations.
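One way to bring change detection to the prompt layer is to sign each approved prompt and verify the signature before the prompt is served. The sketch below is an assumption-laden illustration, not a description of any vendor’s product: `sign_prompt`, `build_manifest`, and `detect_tampering` are hypothetical names, and it assumes the signing key lives outside the prompt database (for example in a secrets manager), so that an attacker with database write access alone cannot forge a valid signature.

```python
import hashlib
import hmac


def sign_prompt(key: bytes, name: str, text: str) -> str:
    # Length-prefix the name so a signature cannot be replayed onto a
    # different prompt by shifting characters between name and text.
    message = f"{len(name)}:{name}:{text}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def build_manifest(key: bytes, prompts: dict[str, str]) -> dict[str, str]:
    """Record signatures at review time, when prompts are known-good."""
    return {name: sign_prompt(key, name, text) for name, text in prompts.items()}


def detect_tampering(key: bytes, prompts: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return names of prompts that were modified, added, or removed since approval."""
    flagged = set()
    for name, text in prompts.items():
        expected = manifest.get(name)
        # compare_digest avoids leaking how many characters matched.
        if expected is None or not hmac.compare_digest(expected, sign_prompt(key, name, text)):
            flagged.add(name)
    flagged.update(name for name in manifest if name not in prompts)
    return sorted(flagged)
```

Run at load time, a non-empty result could block the affected prompt or raise an alert, giving the quiet database edit described earlier a detectable signal it would otherwise lack.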

When 43,000 consultants rely on a single AI tool to draft strategy recommendations, evaluate acquisition targets, and assess competitive landscapes, the instructions that shape those outputs carry real operational weight. A compromised prompt does not just leak information. It alters how an organization makes decisions, one AI-assisted response at a time.

McKinsey responded quickly, patching all unauthenticated endpoints and removing public API documentation within a day. Security analyst Edward Kiledjian has noted that few organizations have modeled this kind of threat and that controls around prompt-layer security remain underdeveloped across the industry. Few enterprise AI platforms serve this many users with this much sensitive data, and fewer still have faced a public test of whether their prompt layer could withstand a determined attacker. Lilli’s could not.