
AI-assisted development is going through a subtle but important transition.
Last year, the dominant story was the “70% problem”: AI helps you get surprisingly far, surprisingly fast—but the final stretch toward production readiness becomes a grind of edge cases, debugging, and diminishing returns.
Now, the story is changing again. As agentic tools become more capable, the “percentage” may rise to 80%+ in certain contexts. But as Addy Osmani argues, the nature of the problem doesn’t disappear—it shifts. The bottleneck moves from typing code to understanding, verifying, and governing code.
For VPs of Engineering, this is not a tooling issue. It’s an operating model issue.
In “The 70% Problem,” Addy describes two common adoption patterns:
1. Bootstrappers (zero-to-MVP): teams using AI to generate an initial codebase quickly (e.g., design-to-code, rapid prototypes). This compresses “idea → demo” dramatically.
2. Iterators (daily development): engineers using tools like Cursor/Copilot-style assistants for refactors, tests, docs, and incremental feature work.
Both can be real productivity multipliers. But the hidden cost shows up when the org tries to treat prototype speed as production speed.
Addy’s field observation is blunt: seniors don’t just accept the output. They constantly refactor, add error handling, strengthen interfaces, and challenge architectural choices. Junior engineers, by contrast, may accept output more readily—leading to fragile “house of cards” systems.
The implication for leadership: AI amplifies existing engineering maturity. It doesn’t replace it.
In “The 80% Problem in Agentic Coding,” Addy highlights a key organizational risk: once agents can generate huge volumes of plausible code, the constraint becomes human comprehension.
He uses language worth repeating to any engineering leader: “If your ability to ‘read’ doesn’t scale at the same rate as the agent’s ability to ‘output,’ you aren’t engineering anymore. You’re rubber stamping.”
That single line captures what many VPs are seeing operationally:
• PR counts surge
• PR size grows
• review times balloon
• and the “quality bar” becomes inconsistent because no one can keep a coherent mental model of what’s changing
Addy frames this as comprehension debt: over time, teams may understand less of their own codebase because “generation” (writing code) outpaces “discrimination” (reviewing and reasoning about code).
A key point in the 80% piece: agent failures increasingly look less like syntax bugs and more like conceptual errors—wrong assumptions, missing constraints, architectural drift.
Addy calls out patterns leaders should recognize:
• Assumption propagation: a wrong assumption early leads to a large body of code built on faulty premises.
• Abstraction bloat: agents produce “comprehensive-looking” scaffolding that increases long-term maintenance cost.
• Dead code accumulation: old code paths linger, and changes to adjacent code trigger unintended edits.
• Sycophantic agreement: agents execute without pushing back, even when requirements are contradictory.
None of these are solved by “better prompts” alone. They require system-level guardrails and verification strategy.
The strategic mistake is thinking “more AI output” equals “more delivery.” In practice, output without verification creates three predictable outcomes:
1. Review becomes the bottleneck (and morale sink)
2. Quality becomes inconsistent (because standards are implicit and reviewers are overloaded)
3. Ownership becomes fuzzy (because the code changed faster than the team’s shared understanding)
So what do you do?
If you want higher autonomy, your standards need to be enforced by systems, not heroics:
• CI checks that reflect real risk (not just “tests passed”)
• linting, type checks, SAST, dependency policy
• release checks that encode what “safe to ship” means
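As an illustration, such a gate can be as simple as a script that chains the checks and fails the build if any of them fail. The specific tools named below (ruff, mypy, pytest, pip-audit) are placeholders, not a prescribed stack; the point is that the standard is enforced by the pipeline, not by a reviewer’s vigilance:

```python
import subprocess

# Hypothetical pre-merge gate. Each entry is (label, command).
# The tools here are illustrative placeholders; substitute your
# own lint / type-check / SAST / dependency-policy stack.
CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("types", ["mypy", "src"]),
    ("tests", ["pytest", "-q"]),
    ("deps", ["pip-audit"]),  # dependency policy
]

def run_gate(checks):
    """Run each check in order and return the labels of those that failed."""
    failed = []
    for label, cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            failed.append(label)
    return failed

# In CI: fail the build whenever run_gate(CHECKS) is non-empty.
```

The value is less in any single check and more in the fact that “safe to ship” is written down in code, so it applies equally to human-written and agent-written changes.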
Addy highlights a real leverage shift: don’t micromanage the agent’s steps—define success criteria and let it loop until conditions are met (tests, contracts, acceptance criteria).
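That shift can be sketched as a small loop: the agent proposes, an automated verifier judges against the success criteria, and the loop exits only when everything passes or the attempt budget runs out. `generate_patch` and `run_checks` below are placeholders for your agent API and test harness, not real library calls:

```python
from typing import Callable, Optional

def run_until_green(
    generate_patch: Callable[[str], str],    # placeholder: your agent call
    run_checks: Callable[[str], list],       # placeholder: tests/contracts; returns failures
    task: str,
    max_attempts: int = 5,
) -> Optional[str]:
    """Declarative loop: state the success criteria, let the agent iterate.

    Returns the first patch that passes all checks, or None when the
    attempt budget is exhausted and a human should take over.
    """
    feedback = task
    for _ in range(max_attempts):
        patch = generate_patch(feedback)
        failures = run_checks(patch)
        if not failures:
            return patch  # success criteria met
        # Feed the concrete failures back instead of micromanaging steps.
        feedback = f"{task}\nPrevious attempt failed: {failures}"
    return None
```

Note the control plane here is `run_checks`: the quality of the loop is bounded by the quality of the tests and contracts you hand it, which is exactly why leadership investment belongs there.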
This is where VPs can help: push the org to invest in specs, contracts, and tests as the control plane.
High-risk domains (auth, payments, infra, privacy) need stronger review + audit trails. Low-risk domains can move faster. Treat agent autonomy like change management.
Comprehension debt won’t show up first in your DORA metrics. It shows up first in how engineers feel:
• “I don’t trust what’s in the PR.”
• “Reviews are exhausting and never-ending.”
• “CI feels noisy.”
• “I’m shipping more, but I’m less confident.”
• “I’m spending more time coordinating than building.”
This is exactly where DevEx Surveys are high leverage: they reveal whether AI adoption is improving flow—or creating new forms of friction.
Use short pulses (monthly or quarterly) to monitor:
Reviewability
• “PRs are easy to review.”
• “Reviews are timely.”
• “The intent of changes is clear.”
Confidence / quality perception
• “I feel confident in the changes we ship.”
• “Our testing and CI help me move faster.”
Cognitive load
• “I can complete work without excessive context switching.”
• “I can maintain a mental model of the systems I own.”
Standards clarity
• “I know what ‘good’ looks like here.”
• “Quality expectations are consistent across teams.”
When these dip during AI rollout, it’s a sign you’re scaling generation faster than comprehension.
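One lightweight way to watch for that dip is to average the 1–5 agreement scores per dimension each pulse and compare across rollout phases. The dimension names echo the categories above; the data shape is an assumption about how your survey tool exports responses:

```python
from statistics import mean

def dimension_scores(responses):
    """Average 1-5 agreement scores per survey dimension.

    Each element of `responses` is one respondent's answers, mapping
    dimension name -> score, e.g. {"reviewability": 4, "confidence": 3}.
    (Assumed export format; adapt to your survey tool.)
    """
    dims = {}
    for answers in responses:
        for dim, score in answers.items():
            dims.setdefault(dim, []).append(score)
    return {dim: round(mean(scores), 2) for dim, scores in dims.items()}
```

Plotted next to PR throughput, a falling reviewability or confidence score while output rises is the comprehension-debt signal in its rawest form.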
If you want a concrete plan:
Step 1: Establish a “comprehension budget”
Treat review capacity as a finite resource. Set guidelines on:
• PR size
• PR count per engineer per week
• required documentation of intent
• required tests for common change types
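These guidelines can be enforced mechanically rather than by reviewer memory. A minimal sketch, with illustrative thresholds (the 400-line cap and the field names are assumptions, not recommendations from the source):

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int
    has_intent_doc: bool  # e.g. a "why this change" section in the description
    has_tests: bool

# Illustrative budget threshold; tune per team and per risk tier.
MAX_LINES = 400

def budget_violations(pr: PullRequest) -> list:
    """Return the comprehension-budget rules this PR violates."""
    violations = []
    if pr.lines_changed > MAX_LINES:
        violations.append(f"PR exceeds {MAX_LINES} changed lines")
    if not pr.has_intent_doc:
        violations.append("missing documentation of intent")
    if not pr.has_tests:
        violations.append("missing tests for this change type")
    return violations
```

Whether the check blocks merges or merely labels PRs is a policy choice; the point is that review capacity is treated as a budget with visible limits, not an infinitely elastic resource.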
Step 2: Standardize agent usage patterns
Addy highlights what works in practice:
• “AI first draft” + human refactor
• tight iteration loops
• “trust but verify” with automated checks
• explanations on critical paths (“why this design?”)
Operationalize these patterns in an internal playbook.
Step 3: Put DevEx Surveys on the dashboard next to delivery metrics
If throughput rises but DevEx sentiment drops, you’re borrowing against the future.
The 70% problem taught us that “prototype speed” isn’t “production readiness.”
The 80% problem adds a sharper warning: when output scales faster than understanding, teams drift into rubber-stamping—and that’s how quality and security failures happen.
The VP Eng opportunity is to get ahead of this by:
• scaling verification with automation,
• constraining autonomy by risk,
• training teams toward declarative workflows,
• and using DevEx Surveys to catch comprehension debt early—before it turns into outages, rework, and attrition.