The 70% → 80% Shift in AI Coding: Why VPs of Engineering Need to Manage Comprehension Debt

AI-assisted development is going through a subtle but important transition.

Last year, the dominant story was the 70% problem: AI helps you get surprisingly far, surprisingly fast—but the final stretch toward production readiness becomes a grind of edge cases, debugging, and diminishing returns.

Now, the story is changing again. As agentic tools become more capable, the “percentage” may rise to 80%+ in certain contexts. But as Addy Osmani argues, the nature of the problem doesn’t disappear—it shifts. The bottleneck moves from typing code to understanding, verifying, and governing code.

For VPs of Engineering, this is not a tooling issue. It’s an operating model issue.

Why the “70% problem” still matters (even if you’re at 80%)

In The 70% problem, Addy describes two common adoption patterns:

1. Bootstrappers (zero-to-MVP): teams using AI to generate an initial codebase quickly (e.g., design-to-code, rapid prototypes). This compresses “idea → demo” dramatically.

2. Iterators (daily development): engineers using tools like Cursor/Copilot-style assistants for refactors, tests, docs, and incremental feature work.

Both can be real productivity multipliers. But the hidden cost shows up when the org tries to treat prototype-speed as production-speed.

Addy’s field observation is blunt: seniors don’t just accept the output. They constantly refactor, add error handling, strengthen interfaces, and challenge architectural choices. Junior engineers, by contrast, may accept output more readily—leading to fragile “house of cards” systems.

The implication for leadership: AI amplifies existing engineering maturity. It doesn’t replace it.

The “80% problem” is a management problem in disguise

In The 80% Problem in Agentic Coding, Addy highlights a key organizational risk: once agents can generate huge volumes of plausible code, the constraint becomes human comprehension.

He uses language worth repeating to any engineering leader: “If your ability to ‘read’ doesn’t scale at the same rate as the agent’s ability to ‘output,’ you aren’t engineering anymore. You’re rubber stamping.”

That single line captures what many VPs are seeing operationally:

• PR counts surge

• PR size grows

• review times balloon

• and the “quality bar” becomes inconsistent because no one can keep a coherent mental model of what’s changing

Addy frames this as comprehension debt: over time, teams may understand less of their own codebase because “generation” (writing code) outpaces “discrimination” (reviewing and reasoning about code).

What changes when AI errors become conceptual, not syntactic

A key point in the 80% piece: agent failures increasingly look less like syntax bugs and more like conceptual errors—wrong assumptions, missing constraints, architectural drift.

Addy calls out patterns leaders should recognize:

• Assumption propagation: a wrong assumption made early produces a large body of code built on faulty premises.

• Abstraction bloat: agents produce "comprehensive-looking" scaffolding that inflates long-term maintenance cost.

• Dead code accumulation: old paths linger, and edits to adjacent code trigger unintended changes.

• Sycophantic agreement: agents execute without pushing back, even when requirements are contradictory.

None of these are solved by “better prompts” alone. They require system-level guardrails and verification strategy.

The VP Eng mandate: scale verification, not just generation

The strategic mistake is thinking “more AI output” equals “more delivery.” In practice, output without verification creates three predictable outcomes:

1. Review becomes the bottleneck (and morale sink)

2. Quality becomes inconsistent (because standards are implicit and reviewers are overloaded)

3. Ownership becomes fuzzy (because the code changed faster than the team’s shared understanding)

So what do you do?

Make “definition of done” executable

If you want higher autonomy, your standards need to be enforced by systems, not heroics:

• CI checks that reflect real risk (not just “tests passed”)

• linting, type checks, SAST, dependency policy

• release checks that encode what “safe to ship” means
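As a minimal sketch of what "executable definition of done" can mean in practice: encode release criteria as data, then evaluate CI results against them. The check names below (`unit_tests`, `sast`, etc.) are hypothetical placeholders for whatever your own pipeline produces, not a prescribed set.

```python
# Hypothetical sketch: "safe to ship" as code rather than tribal knowledge.
# Each required check maps to a human-readable meaning; a release is
# blocked unless every required check reports success.

REQUIRED_CHECKS = {
    "unit_tests": "tests passed",
    "type_check": "static types verified",
    "sast": "security scan clean",
    "dependency_policy": "no disallowed licenses or known CVEs",
}

def safe_to_ship(ci_results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (ship?, failed-or-missing required checks)."""
    failures = [name for name in REQUIRED_CHECKS
                if not ci_results.get(name, False)]
    return (not failures, failures)
```

The design point is that a missing check fails closed: anything not explicitly reported as passing blocks the release, which is what lets you raise agent autonomy without relying on reviewer heroics.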

Shift teams toward declarative workflows

Addy highlights a real leverage shift: don’t micromanage the agent’s steps—define success criteria and let it loop until conditions are met (tests, contracts, acceptance criteria).

This is where VPs can help: push the org to invest in specs, contracts, and tests as the control plane.
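The declarative pattern above can be sketched as a small control loop: the caller specifies the success criteria (a check command) and an iteration budget, and the agent loops against failure output until the criteria pass. `generate_patch` and `run_checks` here are hypothetical hooks standing in for whatever agent tooling and test harness you actually use.

```python
# Sketch of a declarative agent loop, assuming hypothetical hooks:
#   generate_patch(feedback) - asks the agent to revise based on failures
#   run_checks()             - returns (passed?, failure output)

from typing import Callable

def run_until_green(generate_patch: Callable[[str], None],
                    run_checks: Callable[[], tuple[bool, str]],
                    max_iterations: int = 5) -> bool:
    """Loop: run checks, feed failures back to the agent, stop when green."""
    for _ in range(max_iterations):
        passed, feedback = run_checks()
        if passed:
            return True           # criteria met; humans review intent, not steps
        generate_patch(feedback)  # agent iterates against the failure output
    return False                  # budget exhausted; escalate to a human
```

Note the iteration budget: an agent that never converges should surface to a human, not burn compute silently.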

Constrain autonomy by risk

High-risk domains (auth, payments, infra, privacy) need stronger review + audit trails. Low-risk domains can move faster. Treat agent autonomy like change management.
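One way to make that concrete is a path-based risk policy: map the directories a change touches to a review tier, and escalate the whole change if any touched path is high-risk. The path prefixes and policy values below are illustrative assumptions, not a prescribed taxonomy.

```python
# Hypothetical sketch: risk tiers as data. A change touching any
# high-risk prefix inherits the strictest policy.

RISK_TIERS = {
    "high": {"prefixes": ("auth/", "payments/", "infra/", "privacy/"),
             "policy": {"human_reviewers": 2, "audit_trail": True,
                        "agent_auto_merge": False}},
    "low":  {"prefixes": (),
             "policy": {"human_reviewers": 1, "audit_trail": False,
                        "agent_auto_merge": True}},
}

def policy_for(changed_paths: list[str]) -> dict:
    """Escalate the whole change to 'high' if any touched path is high-risk."""
    high = RISK_TIERS["high"]
    if any(p.startswith(high["prefixes"]) for p in changed_paths):
        return high["policy"]
    return RISK_TIERS["low"]["policy"]
```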

Where DevEx Surveys fit: measuring the invisible failure modes

Comprehension debt won’t show up first in your DORA metrics. It shows up first in how engineers feel:

• “I don’t trust what’s in the PR.”

• “Reviews are exhausting and never-ending.”

• “CI feels noisy.”

• “I’m shipping more, but I’m less confident.”

• “I’m spending more time coordinating than building.”

This is exactly where DevEx Surveys are high leverage: they reveal whether AI adoption is improving flow—or creating new forms of friction.

The DevEx signals that predict comprehension debt

Use short pulses (monthly or quarterly) to monitor:

Reviewability

• “PRs are easy to review.”

• “Reviews are timely.”

• “The intent of changes is clear.”

Confidence / quality perception

• “I feel confident in the changes we ship.”

• “Our testing and CI help me move faster.”

Cognitive load

• “I can complete work without excessive context switching.”

• “I can maintain a mental model of the systems I own.”

Standards clarity

• “I know what ‘good’ looks like here.”

• “Quality expectations are consistent across teams.”

When these dip during AI rollout, it’s a sign you’re scaling generation faster than comprehension.

A practical VP Eng playbook for the next 90 days

If you want a concrete plan:

Step 1: Establish a “comprehension budget”

Treat review capacity as a finite resource. Set guidelines on:

• PR size

• PR count per engineer per week

• required documentation of intent

• required tests for common change types
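A comprehension budget can be enforced mechanically. Here is a minimal sketch of a CI-side guard that flags PRs exceeding a reviewable size or lacking a statement of intent; the thresholds are illustrative assumptions, not recommendations.

```python
# Hypothetical sketch: a "comprehension budget" check for CI.
# Thresholds are illustrative -- tune them to your own review data.

MAX_CHANGED_LINES = 400      # beyond this, review quality tends to drop off
MIN_DESCRIPTION_CHARS = 80   # force a real "why", not just a title

def within_budget(changed_lines: int, description: str) -> list[str]:
    """Return a list of budget violations (empty means the PR passes)."""
    violations = []
    if changed_lines > MAX_CHANGED_LINES:
        violations.append(
            f"PR too large: {changed_lines} > {MAX_CHANGED_LINES} lines")
    if len(description.strip()) < MIN_DESCRIPTION_CHARS:
        violations.append("missing or thin description of intent")
    return violations
```

Returning the violations as a list, rather than failing outright, lets teams start in warn-only mode and tighten enforcement once the thresholds are calibrated.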

Step 2: Standardize agent usage patterns

Addy highlights what works in practice:

• “AI first draft” + human refactor

• tight iteration loops

• “trust but verify” with automated checks

• explanations on critical paths (“why this design?”)

Operationalize these patterns in an internal playbook.

Step 3: Put DevEx Surveys on the dashboard next to delivery metrics

If throughput rises but DevEx sentiment drops, you’re borrowing against the future.

Closing: the risk isn’t that agents fail—it’s that they succeed too confidently

The 70% problem taught us that “prototype speed” isn’t “production readiness.”

The 80% problem adds a sharper warning: when output scales faster than understanding, teams drift into rubber-stamping—and that’s how quality and security failures happen.

The VP Eng opportunity is to get ahead of this by:

• scaling verification with automation,

• constraining autonomy by risk,

• training teams toward declarative workflows,

• and using DevEx Surveys to catch comprehension debt early—before it turns into outages, rework, and attrition.

March 3, 2026
