The Highest-Performing AI Teams Aren’t Using Better Models. They’re Using Better Delivery Practices

Over the past two years, the conversation around AI in software engineering has been dominated by adoption. Organizations have measured licenses, active users, prompt volume, and tool utilization, often treating these metrics as indicators of success. Yet as AI becomes a normal part of everyday engineering work, a different question is emerging. The challenge is no longer whether engineers are using AI. The challenge is understanding why some teams consistently convert AI assistance into meaningful delivery improvements while others experience review overload, rework, and growing maintenance costs.

In a survey of the teams we work with, we found that AI is already creating measurable value. Engineers reported significant time savings and broadly positive experiences with AI-assisted work. At the same time, however, a substantial portion of those gains was being consumed downstream through review, validation, correction, testing, and rework. This suggests that the differentiating factor is no longer AI adoption itself but how effectively teams integrate AI-generated work into their delivery process.

This is where the concept of Agentic Experience (AX) becomes useful.

Agentic Experience describes the quality of collaboration between engineers and AI systems. It is not a measure of how often AI is used. It is a measure of how effectively an organization converts AI-generated output into delivery outcomes. When viewed through this lens, one of the most interesting findings from the survey was that the highest-performing teams were not necessarily using AI more than everyone else. They were using it differently, and perhaps more importantly, they were delivering differently.

The Best Teams Are Not the Most Autonomous

One of the dominant narratives surrounding AI-assisted development is the idea that increasing autonomy will naturally increase productivity. According to this view, the path to greater efficiency is straightforward: allow AI to generate larger portions of the implementation, reduce human involvement, and accelerate delivery through automation.

Our data tells a different story.

The strongest and most sustainable AI workflows were not fully autonomous workflows. In fact, many of the workflows associated with the highest levels of satisfaction and perceived productivity involved active human participation throughout the process. Engineers remained deeply involved in shaping solutions, validating assumptions, reviewing outputs, and making architectural decisions. AI accelerated implementation, but it did not replace engineering judgment.

This distinction appears to be critical. The most successful teams treated AI as a collaborator rather than a substitute. They leveraged AI’s strengths while intentionally preserving human understanding and ownership of the resulting work. As a result, they were able to realize productivity gains without creating excessive downstream costs.

The implication is important because it challenges one of the most common assumptions in the current AI discussion. The goal is not necessarily to maximize autonomy. The goal is to maximize the effectiveness of human-AI collaboration.

AI Performs Best When Decisions Have Already Been Made

Another clear pattern emerged around the types of tasks where AI consistently delivered value.

The most successful teams used AI in situations where engineers already understood the problem they were trying to solve, where requirements were relatively clear, and where the intended implementation approach was largely known before coding began. In these situations, AI acted as an accelerator, helping engineers move from intention to implementation more quickly.

What AI was not doing particularly well was making decisions on behalf of the team.

Whenever work required significant interpretation, evolving requirements, architectural trade-offs, or deep business context, engineers reported a noticeable decline in their confidence in AI-generated output. Generated solutions often required substantial review and correction, and in many cases the effort required to validate the output offset a large portion of the productivity gain.

This observation suggests that AI currently excels at filling implementation gaps rather than decision gaps. It can help engineers execute a solution efficiently, but it is significantly less reliable when asked to determine what the solution should be in the first place.

The highest-performing teams appeared to understand this intuitively. They used AI after clarity had been established, not as a mechanism for discovering clarity.

Repetitive Work Creates the Highest Return

One of the strongest positive signals in the survey involved repetitive implementation work.

Engineers repeatedly described AI as being particularly useful for boilerplate code, routine implementation tasks, repetitive coding patterns, documentation, test generation, and other forms of structured engineering work. These activities often consume significant engineering time while requiring relatively little strategic decision-making. As a result, they represent an ideal match for current AI capabilities.

What is interesting about these workflows is that they create leverage without reducing understanding. Engineers remain fully capable of reviewing and validating the generated output because the underlying task is already familiar and well understood.

In many ways, this represents the healthiest form of AI-assisted development currently visible across engineering organizations. The engineer remains responsible for the solution while AI reduces the mechanical effort required to implement it.

The outcome is not only increased speed but also lower downstream review and maintenance costs.

Debugging May Be the Most Underrated AI Workflow

While code generation receives most of the attention, debugging emerged as one of the most consistently successful AI workflows in the study.

Engineers reported using AI to investigate issues, analyze code, identify mistakes, troubleshoot SQL queries, explore alternative explanations for failures, and accelerate root-cause analysis. These workflows differ from large-scale code generation in an important way: feedback arrives almost immediately.

When AI suggests a potential cause or solution, engineers can quickly validate whether it is correct. The verification cycle is short, the scope remains bounded, and the consequences of mistakes are relatively contained.

This creates a highly effective collaboration model where AI functions as an investigative partner rather than an autonomous contributor.

From an Agentic Experience perspective, debugging workflows have many of the characteristics associated with high maturity. They are iterative, reviewable, easy to validate, and deeply integrated into the engineer’s existing workflow.

Reviewability Is Becoming a First-Class Engineering Concern

Perhaps the most important finding from the survey is that the highest-performing teams appeared to optimize for something that most organizations rarely measure directly: reviewability.

Historically, engineering teams have focused on metrics such as throughput, velocity, cycle time, and implementation efficiency. These metrics made sense in a world where writing code represented the primary bottleneck.

AI changes that equation.

When code generation becomes dramatically cheaper, the limiting factor shifts elsewhere. Engineers can generate large amounts of code quickly, but the human effort required to understand, validate, and maintain that code does not scale at the same rate.

This is why many engineers reported concerns about oversized pull requests, large AI-generated implementations, declining comprehension, and review fatigue. The problem was not that AI generated incorrect code. The problem was that the resulting changes exceeded the organization’s ability to safely absorb them.

The teams with the strongest outcomes responded by adapting their delivery practices.

They favored smaller commits, smaller pull requests, more focused changes, and tighter review scopes. Rather than maximizing generated output, they optimized for human understanding.
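
One way a team might make this preference concrete is a simple size check on incoming pull requests. The following is a minimal sketch in Python, not a practice reported by the surveyed teams: the `PullRequest` type, the `needs_split` and `oversized` helpers, and the thresholds of 400 changed lines and 10 changed files are all hypothetical and would need tuning to a team's own review capacity.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical record; a real team would populate this from its own tooling.
    title: str
    lines_changed: int
    files_changed: int


def needs_split(pr: PullRequest, max_lines: int = 400, max_files: int = 10) -> bool:
    """Return True when a change exceeds the (assumed) review budget."""
    return pr.lines_changed > max_lines or pr.files_changed > max_files


def oversized(prs: list[PullRequest], max_lines: int = 400, max_files: int = 10) -> list[str]:
    """Return the titles of pull requests that should be broken into smaller changes."""
    return [pr.title for pr in prs if needs_split(pr, max_lines, max_files)]


if __name__ == "__main__":
    queue = [
        PullRequest("Fix null check in invoice export", 40, 2),
        PullRequest("Generated billing module", 1200, 35),
    ]
    for title in oversized(queue):
        print(f"Consider splitting: {title}")
```

The check measures what a reviewer has to absorb rather than how much was produced, which is the shift in emphasis described above.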

This may sound like a subtle distinction, but it has profound implications.

The organizations that benefit most from AI are unlikely to be those generating the largest volume of code. They are more likely to be the organizations generating code that other humans can confidently review, understand, and maintain.

The Emergence of AI-Native Delivery Practices

What makes these findings particularly interesting is that they point toward the emergence of a new set of engineering practices.

For decades, software delivery systems evolved around the assumption that implementation effort was expensive and scarce. Teams designed processes, governance models, and delivery practices to optimize the production of code.

AI introduces a different reality.

Implementation effort becomes abundant. Human attention remains scarce. Review capacity remains scarce. Understanding remains scarce. Trust remains scarce.

As a result, successful organizations are beginning to adapt their delivery systems accordingly.

Instead of focusing exclusively on generating more output, they are increasingly focused on preserving understanding, maintaining review quality, limiting cognitive overload, and ensuring that generated work remains comprehensible over time.

These adaptations may ultimately prove more important than the choice of AI model itself.

The organizations seeing the greatest long-term benefits from AI are not necessarily those with access to better technology. They are the organizations that are learning how to build delivery systems capable of absorbing AI-generated work safely and sustainably.

Agentic Experience

The most interesting lesson from this study is that AI success appears to be less about technology than about workflow design.

The highest-performing teams were not distinguished by superior prompts, larger context windows, or more advanced models. They were distinguished by the way they structured work around AI capabilities and limitations.

They understood where AI created leverage. They understood where human judgment remained essential. And they adapted their delivery practices to ensure that the value generated by AI could actually survive the journey from implementation to production. As AI capabilities continue to improve, this distinction may become even more important.

Organizations that focus solely on increasing AI adoption will likely continue to see incremental gains.

Organizations that focus on improving Agentic Experience will be building the systems that determine how much of that value ultimately reaches customers.

And in the long run, that difference may matter far more than the technology itself. We can help.
