AI Generated ~14% Productivity Gains. Only ~6% Reached Delivery Outcomes. We Asked 250 Engineers Why.

Most discussions about AI productivity are converging around a surprisingly modest conclusion.

Recent research from DX found that most engineering organizations are seeing productivity improvements in the 5–15% range, not the 10x gains often implied by AI marketing narratives. Similarly, Google’s DORA research suggests that while AI can produce 35–40% gains in simple, greenfield tasks, productivity improvements in complex engineering environments are typically much lower—often 10% or less.

These findings raise an important question: If AI is genuinely helping engineers work faster, where does the value go?

We surveyed approximately 250 engineers and examined not just AI adoption, but the entire delivery system surrounding AI-assisted development.

What we found was surprising.

Engineers reported saving approximately five hours per week through AI-assisted work while spending approximately three hours per week reviewing, validating, correcting, and reworking AI-generated output.

When translated into organizational capacity, AI generated productivity equivalent to roughly 14% of total engineering capacity, but only about 6% remained as net delivery improvement after review and rework costs were accounted for.

The implication is significant. AI was not failing to generate value. The organization was struggling to retain it.

What we measured, we call Agentic Experience (AX). And it may become one of the most important ways engineering organizations measure AI effectiveness in the years ahead.

From Developer Experience to Agentic Experience

Over the last decade, Developer Experience (DevEx) fundamentally changed how organizations think about engineering productivity. We measure it with DevEx Pulse questions.

The key insight behind DevEx was simple: Developer productivity is not determined solely by individual skill. It is heavily influenced by the system surrounding developers: tooling, workflows, documentation, feedback loops, organizational friction. A great engineer operating inside a dysfunctional system will still struggle to deliver.

The same principle now applies to AI. Most organizations evaluate AI through adoption metrics: percentage of engineers using AI, number of prompts generated, licenses deployed, hours spent interacting with AI tools. These metrics tell us whether AI is being used. They do not tell us whether AI is improving delivery outcomes.

The effectiveness of AI depends not only on the model, but on the interaction between:

engineers,
AI systems,
delivery workflows,
review processes,
quality standards,
trust,
organizational practices.

This interaction layer is what we call Agentic Experience. Agentic Experience is the quality of collaboration between humans and AI systems within a delivery organization. Just as Developer Experience measures how effectively developers can produce value, Agentic Experience measures how effectively organizations convert AI-generated output into delivered outcomes.

Measuring More Than Adoption

When organizations evaluate AI initiatives, the conversation often starts and ends with adoption.

But adoption is only the beginning. A team can have 90% AI adoption, broad access to tooling, and enthusiastic experimentation, and still struggle to improve delivery performance.

To understand what was actually happening, we looked beyond usage.

We measured:

task fit,
workflow fit,
trust,
standards alignment,
review burden,
delivery impact,
time saved,
rework generated by AI-assisted work.

The organization already showed strong signs of AI adoption. Engineers were actively experimenting. AI was deeply integrated into day-to-day work. There was little evidence of resistance. The challenge was somewhere else. The challenge was what happened after code generation.

The Value Retention Problem

The survey revealed a pattern that is becoming increasingly common across engineering organizations. AI is generating value. But a significant portion of that value never reaches production outcomes.

Taken together, the roughly 250 engineers reported:

Time saved through AI = 1,250 h per week
Time spent reviewing and reworking AI output = 750 h per week
Net gain = 500 h per week

Relative to a 250-person engineering organization, this translates into:

AI-generated capacity ~14%
Consumed by review and rework ~8%
Net retained value ~6%

This means that roughly 60% of the productivity gains generated by AI (750 of 1,250 hours) were absorbed downstream by review, validation, correction, testing, cleanup, and maintenance activities.
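The arithmetic behind these figures can be reproduced in a few lines. The sketch below is illustrative and is not the survey's actual method. The headcount and per-engineer hours come from the figures above. The 36-hour weekly capacity baseline is an assumption, since no baseline is stated here; it was chosen because it reproduces the rounded ~14%, ~8%, and ~6% values. The names SurveyTotals and value_retention are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyTotals:
    engineers: int
    hours_saved_per_engineer: float    # per week, from the survey figures
    hours_rework_per_engineer: float   # per week, from the survey figures
    weekly_hours_per_engineer: float   # ASSUMPTION: baseline is not stated in the article


def value_retention(s: SurveyTotals) -> dict[str, float]:
    """Convert per-engineer weekly hours into organization-level totals and percentages."""
    saved = s.engineers * s.hours_saved_per_engineer
    rework = s.engineers * s.hours_rework_per_engineer
    net = saved - rework
    capacity = s.engineers * s.weekly_hours_per_engineer
    return {
        "saved_hours": saved,
        "rework_hours": rework,
        "net_hours": net,
        "generated_pct": 100 * saved / capacity,     # AI-generated capacity
        "consumed_pct": 100 * rework / capacity,     # consumed by review and rework
        "retained_pct": 100 * net / capacity,        # net retained value
        "absorbed_share_pct": 100 * rework / saved,  # share of the gain lost downstream
    }


# 250 engineers, 5 h saved and 3 h reworked per week, assumed 36 h weekly baseline.
result = value_retention(SurveyTotals(250, 5, 3, 36))
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

With a 40-hour baseline the same inputs give 12.5%, 7.5%, and 5%, so the capacity percentages depend on how weekly capacity is defined. The 60% absorbed share does not, because it compares rework hours directly to hours saved.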

And this is not a story about AI failing. Quite the opposite. The data suggests that AI is already creating meaningful engineering value. The problem is that organizations are losing a substantial portion of that value before it becomes a delivery outcome. The challenge is not generating more code. The challenge is retaining more of the value AI already generates.

The Bottleneck Has Moved

Historically, software delivery was constrained by implementation. Writing code was expensive. Organizations spent decades optimizing development frameworks, CI/CD, DevOps, platform engineering, and developer tooling. The goal was simple: reduce the cost of implementation.

AI changes the economics of software delivery: code generation becomes dramatically cheaper, but new bottlenecks emerge. During the survey, engineers repeatedly described the same experience:

Writing code is faster. Understanding the code is slower. AI accelerates implementation, but at the same time, it increases:

code volume,
pull request size,
verification effort,
review effort,
maintenance burden.

As a result, many teams experience something that initially feels paradoxical: they are coding faster, but they are not necessarily delivering faster. The bottleneck has moved: implementation is no longer the primary constraint. Reviewability is becoming the constraint.

Review Capacity Is the New Throughput Constraint

A key lesson is that organizations cannot scale review capacity at the same rate they can scale code generation. AI can generate thousands of lines of code in minutes. Human comprehension does not scale that way. The result is an emerging delivery imbalance: AI accelerates generation faster than organizations can safely absorb generated work.

Engineers repeatedly described:

oversized AI-generated pull requests,
difficult code reviews,
declining comprehension,
“AI slop”,
growing verification overhead.

In many cases, reviewers became the limiting factor in delivery flow. The challenge was no longer creating code. It was confidently understanding, validating, and maintaining it. This is why adoption metrics alone are insufficient.

Two organizations may have identical AI adoption rates. One delivers faster. The other experiences review overload. The difference is not adoption. The difference is Agentic Experience.

The Best AI Workflows Were Not Autonomous

One of the most interesting findings was that the strongest AI workflows were not autonomous workflows. The highest-performing teams did not treat AI as a replacement for engineering judgment. Instead, they treated AI as a collaborator. The most successful workflows consistently involved:

repetitive implementation,
debugging,
troubleshooting,
documentation,
testing,
prototyping,
clearly scoped development tasks.

These workflows shared several characteristics:

bounded scope,
low ambiguity,
active engineer supervision,
high reviewability,
understandable outputs.

The strongest outcomes emerged when engineers remained deeply involved in the work, not when they attempted to remove themselves from the process. This is an important distinction. The goal is not autonomous software development. The goal is effective human-AI cooperation.

Trust Is the Hidden Variable

Another important finding emerged from the data. Trust was highly contextual.

Engineers trusted AI when:

requirements were clear,
scope was limited,
implementation intent was known,
validation was straightforward.

Trust declined rapidly when:

business logic became complex,
requirements evolved during implementation,
architectural decisions mattered,
generated changes became large.

This suggests that trust is not a property of AI itself. Trust is a property of the workflow.

When trust decreases:

review effort increases,
verification effort increases,
rework increases,
value retention decreases.

In other words, trust directly influences Agentic Experience, and Agentic Experience directly influences how much AI-generated value survives the delivery process.

Why Agentic Experience Matters

Most AI discussions focus on productivity. But productivity is only half the equation. The other half is retention. An organization that generates 14% additional capacity but retains only 6% experiences a very different outcome than one that retains most of what AI creates.

This is why Agentic Experience matters. It shifts the conversation from “How much AI are engineers using?” to “How much AI-generated value survives the delivery system?” Those are fundamentally different questions. One measures activity; the other measures outcomes.

The Next Frontier Is AI-Native Delivery

Many organizations are still focused on AI adoption. The evidence increasingly suggests that adoption is becoming the easy part. The harder challenge is adapting delivery systems to the realities of AI-generated work. Organizations will need to develop AI-native delivery practices that improve value retention.

That means focusing on:

smaller and more reviewable changes,
bounded implementation scope,
stronger reviewability standards,
explicit ownership of generated code,
workflows designed around human comprehension,
trust-aware engineering practices.

The next frontier is not generating more code. The next frontier is making AI-generated work easier to understand, review, validate, and maintain.

Final Thought

The most important lesson from Developer Experience was that productivity depends on systems, not individuals. The same lesson now applies to AI. AI effectiveness is not determined by model quality alone. It is determined by the quality of the system surrounding the model. That system can be measured, and it can be improved. And increasingly, it will determine which organizations successfully translate AI potential into business outcomes.

The most useful question engineering leaders can ask is no longer: “How much AI are our engineers using?”

Instead, they should ask: “How much of the value generated by AI actually survives our delivery system?”

Because the future of AI in software engineering may not be defined by how much value AI creates. It may be defined by how much value organizations are able to retain. And that is exactly what Agentic Experience is designed to measure.