89% of Enterprise Engineering Teams Have Experienced an AI-Generated Code Incident. The Data Explains Why.

AI-assisted coding has become standard practice in enterprise software development. The review and verification systems organizations have depended on to catch problems before production have not kept pace. A new survey of 500 U.S. enterprise IT engineers and engineering leaders, conducted by Censuswide in March 2026, finds that 89% of organizations have experienced at least one AI-related production incident. One in four has suffered a complete system outage directly caused by AI-generated code.

These numbers landed against a backdrop that made them hard to dismiss. In the weeks the survey was fielded, Amazon convened an internal engineering meeting to address a series of service outages linked in part to AI-assisted changes, with SVP Dave Treadwell acknowledging that best practices and safeguards around AI coding tools are “not yet fully established.” If that acknowledgment is coming from one of the most resource-rich engineering organizations on earth, most companies are almost certainly behind.

The paradox at the center of enterprise AI adoption

95% of developers say knowing that code is AI-generated changes how closely they review it. More than half say their scrutiny increases significantly. Thirty-nine percent say they scrutinize AI-generated code more heavily than code written by a human colleague.

Hold that finding next to this one: 94% of those same developers say they are confident in the code AI tools produce.

Both numbers are true, and understanding why they coexist is the key to understanding where enterprise AI adoption currently stands. Developers have not lost faith in AI coding tools. They have learned, through direct experience, that AI code fails differently than human code; that it can look syntactically valid and logically coherent while being flawed in ways that only deep inspection will surface. Confidence and scrutiny are not in tension. They are the appropriate simultaneous responses to a tool that is genuinely capable and genuinely unreliable in specific, hard-to-catch ways.

The survey data on what reviewers actually struggle with confirms this. The top complaint, cited by 30% of respondents, is that AI-generated code looks highly accurate on the surface but contains subtle bugs or hallucinated logic. Twenty-three percent cite AI that introduces security vulnerabilities or ignores enterprise best practices. The failure modes are real, they are specific, and developers have encountered them enough to adjust their behavior accordingly.

The productivity picture is uneven

The pitch for AI coding tools is straightforward: write code faster, ship faster, reduce overhead. On average, the data shows a modest net saving in review time across the survey population. But that average masks a significant split. More than 41% of respondents report spending more time on manual review today than before AI coding tools existed. The time savings from AI code generation are real for many developers, but far from universal, and for a substantial share of the engineering population the review burden has grown rather than shrunk.

The impact varies considerably by organization size, and not always in the direction you might expect. Smaller organizations (companies with 51 to 200 employees) are among those seeing the largest review time savings, with respondents averaging 1.56 hours saved per week. The largest enterprises, with 10,001 or more employees, report savings averaging 1.18 hours per week. Those time savings look like a win. As the next section shows, they come with a significant hidden cost.

The guardrails gap

Seventy-nine percent of organizations have implemented automated gates that prevent AI-generated code from being merged if it violates security, compliance, or quality policies. That investment reflects a hard-won recognition: 89% of organizations have experienced at least one AI-related production incident, and one in four has suffered a complete system outage directly caused by AI-generated code. Manual review alone cannot keep pace.
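To make the idea concrete, here is a minimal Python sketch of what such a pre-merge policy gate might look like. The policy names, findings structure, and severity threshold below are hypothetical illustrations, not a description of any specific product's implementation.

```python
from dataclasses import dataclass

# Hypothetical severity ordering for policy findings.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

@dataclass
class Finding:
    """One policy violation reported by a scanner (fields are illustrative)."""
    policy: str      # e.g. "security", "compliance", "quality"
    severity: str    # "low" | "medium" | "high" | "critical"
    message: str

def gate_merge(findings, block_at="high"):
    """Return (allowed, blockers): block the merge if any finding
    meets or exceeds the blocking severity threshold."""
    threshold = SEVERITY_RANK[block_at]
    blockers = [f for f in findings if SEVERITY_RANK[f.severity] >= threshold]
    return (len(blockers) == 0, blockers)

# Example: one critical security finding is enough to block the merge,
# regardless of how clean the rest of the change looks.
findings = [
    Finding("quality", "low", "function exceeds complexity budget"),
    Finding("security", "critical", "hard-coded credential detected"),
]
allowed, blockers = gate_merge(findings)
```

The key design point, whatever the tooling, is that the gate is deterministic and runs on every merge: it does not depend on a reviewer's bandwidth or attention on a given day.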

But adoption is not uniform, and the gap is most visible where the stakes are highest. Respondents from organizations with 10,001 or more employees save an average of 1.18 hours per week on manual review, which is a meaningful reduction. Yet they suffer AI-caused outages at the highest rate: 40%, against a 25% overall average. Only 68% of those organizations have automated gates in place. The time savings and the outage rate are not unrelated. When review moves faster without automated verification filling the gap, more flawed code reaches production.

The contrast with mid-market organizations is instructive. Companies with 2,501 to 5,000 employees show an 84% gate adoption rate and a 27% outage rate: better defended, and with better outcomes, despite operating at significant scale.

The 21% of organizations without automated gates are operating with AI code quality entirely dependent on the consistency and bandwidth of their review teams. That is a significant exposure, and the incident data suggests it is not a theoretical one.

What this means for engineering leaders

The data points to a gap between how quickly AI coding tools have been adopted and how quickly the infrastructure around them has matured. Confidence in AI output is high, but so is the incident rate. Every organization that has adopted AI coding tools without investing equally in automated verification is carrying risk that manual review alone will not consistently catch.

A deeper analysis of the survey data, including full company size breakdowns and incident type distributions, is available in the full research report.

This research was commissioned by Qodo and conducted by Censuswide among a sample of 500 U.S. IT engineering leadership titles, engineers, and developers in enterprise organizations. Data was collected March 3 to 6, 2026.

 
