Steve Yegge’s 8 Levels of AI Adoption are Really About How You Work Each Day, Not Which Tool You Use

Steve Yegge’s “8 levels” chart gets repeated online as a ladder of tools: IDE → agent → orchestrator. But that might miss what Steve was actually trying to show.

The levels are a day‑to‑day operating model for engineers: how much you trust the agent, how you review, and where you spend attention—on code diffs, on agent actions, or on orchestration (task decomposition, coordination, and verification). That’s why the same engineer can look “Level 2” on Monday (tight review, cautious changes) and “Level 6” on Friday (multiple parallel agents) depending on risk, context, and deadlines.

Below is a practical, engineer-facing interpretation of the levels—plus a “taste/time-horizon” perspective from my conversation with Steve, and a set of proven tips & tricks from Addy Osmani’s orchestration patterns and Steve’s own maintainer workflows.

The core shift: from “reading code” to “supervising work”

Traditional dev work centers on producing and reviewing code. Agentic work gradually moves you toward supervising a production line:

  • Early levels: you review the artifact (diffs, commits).
  • Middle levels: you review the process (what the agent is doing, why, and in what order).
  • Higher levels: you design the system of work (who does what, in parallel, with which guardrails).

If there’s one phrase that describes the ladder, it’s:

Diff reviewer → agent supervisor → team orchestrator.

Steve’s 8 Levels reframed as trust + review behavior + attention allocation

Level 1 — No AI

  • Trust: none (doesn’t apply)
  • Review behavior: classic PR/diff review
  • Attention goes to: writing, debugging, and reading code by hand

This becomes increasingly rare in fast-moving teams—not because it’s “bad,” but because throughput norms shift.

Level 2 — Agent in IDE, permissions on

  • Trust: low to moderate
  • Review behavior: diff-first; you inspect what changes before it lands
  • Attention goes to: code diffs + small agent suggestions

This is “AI as a better assistant,” not “AI as a worker.”

Level 3 — Agent in IDE, “YOLO mode”

  • Trust: rising (you allow bigger edits and fewer interruptions)
  • Review behavior: you still review, but accept larger chunks
  • Attention goes to: faster iteration loops (prompt → patch → test → patch)

This is where teams often see big wins—and also where “silent quality drift” can start if verification is weak.

Level 4 — You stop staring at diffs; you watch the agent work

  • Trust: moderate to high
  • Review behavior: “process review” replaces “diff review”
  • Attention goes to: agent actions and decision-making:
    • what files it touched
    • what commands it ran
    • whether it’s validating assumptions

This level is less about code and more about supervision: “Is the agent doing the right things?”

Level 5 — Agent-first; IDE later

  • Trust: high, but selective
  • Review behavior: IDE becomes an inspection tool (spot checks, behavioral validation)
  • Attention goes to: specs, constraints, acceptance criteria

You’re no longer “coding in the editor.” You’re specifying outcomes, then verifying results.

Level 6 — Several agents; you multiplex

  • Trust: high enough to run parallel work
  • Review behavior: review shifts to integration + verification gates
  • Attention goes to: orchestration by hand:
    • delegating tasks
    • tracking status
    • resolving dependency sequencing

Steve’s warning here is real: multiplexing can become addictive because “there’s always another agent you can spin up.”

Level 7 — 10+ agents managed by hand (coordination breaks)

  • Trust: high, but you hit human coordination limits
  • Review behavior: chaos unless you have coordination primitives
  • Attention goes to: preventing collisions:
    • who edits which file
    • what depends on what
    • which output is authoritative

This is where people say: “I accidentally messaged the wrong agent” and “How do I coordinate this?”

Level 8 — You build (or adopt) an orchestrator

  • Trust: system-level trust (you trust the workflow, not the individual agent)
  • Review behavior: quality gates + auditability become the center
  • Attention goes to: designing the factory:
    • task queues and dependency graphs
    • permissions and scopes
    • verification pipelines
    • memory systems (what gets remembered, what gets reset)

This is where Addy Osmani’s “orchestrator model” clicks: your job becomes less “writing software” and more “building the production line that builds software.”
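The "factory" pieces above (task queues, dependency graphs, verification gates) can be sketched in a few lines. This is a toy illustration, not code from any real orchestrator; every name here is made up:

```python
class Orchestrator:
    """Toy task queue with a dependency graph and a verification gate."""

    def __init__(self):
        self.deps = {}     # task -> set of tasks it depends on
        self.done = set()  # tasks that have passed verification

    def add_task(self, name, depends_on=()):
        self.deps[name] = set(depends_on)

    def ready(self):
        """Tasks whose dependencies have all passed verification."""
        return [t for t, d in self.deps.items()
                if t not in self.done and d <= self.done]

    def complete(self, name, verified):
        """Only verified work unblocks downstream tasks."""
        if verified:
            self.done.add(name)


orch = Orchestrator()
orch.add_task("schema")
orch.add_task("api", depends_on=["schema"])
orch.add_task("ui", depends_on=["api"])

print(orch.ready())  # ['schema']
orch.complete("schema", verified=True)
print(orch.ready())  # ['api']
```

The key design choice is that `complete` only counts when verification passes: unverified output never unblocks downstream work, which is what "system-level trust" means in practice.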

After talking with Steve: why “taste” matters more as agents get better

In our discussion, one theme kept returning: the gap between locally plausible output and globally good engineering. And to be clear: agents have improved a lot since that conversation—especially at execution (multi-file edits, wiring systems together, iterating on errors, running workflows). But that progress doesn’t eliminate the gap; it changes where it shows up. When generation gets cheaper and faster, the cost of a wrong direction compounds sooner—which is why humans remain the long-term compass.

Concretely, as agents get stronger, more output means more need for judgment: it’s easier to ship plausible changes faster than a team can sense long-term consequences. Time-horizon thinking becomes product quality, not just code quality (“is this the right abstraction for the next 6 months?”). And context remains the hard limit—agents don’t naturally carry your organization’s full history and constraints unless you force the loop with specs, reviews, retros, and quality gates. The failure mode shifts from “can’t do it” to “can do it in the wrong direction,” which is exactly where human taste matters most.

This “taste” topic is close to me personally — I’ve written about it before in the context of platform engineering as omakase: in fine dining, omakase is ultimate trust (“I’ll leave it up to you”), but it only works because the chef has taste built through years of practice and constant feedback. It’s a useful analogy for the agent era: as we delegate more execution, the job shifts toward curating outcomes and earning trust through judgment and verification.

A moment from my conversation with Steve that stayed with me: when we talked about what AI can’t reliably replicate yet, we kept returning to time. Models can be very strong “in the now,” but engineers build judgment from continuous experience: years of seeing what breaks, what slows teams down, and what kinds of shortcuts create future pain. That’s why senior engineers don’t just review what changed—they evaluate whether the change will make the system easier or harder to evolve next month. As agents push us up the levels (from diff review to supervising actions to orchestrating teams), this long‑horizon “taste” becomes the most important human contribution: deciding what to build, what not to build, and what quality bar is worth paying for.

This explains why engineers, when asked about tech debt, don't answer only with "how it is now." They also answer:

  • what “good enough” means for this system
  • what the future cost curve looks like
  • what the ideal state is (and how far we are from it)

At higher levels of AI adoption, this matters more, not less—because mistakes compound faster when generation is cheap.

Tips & tricks you can steal today (from Addy Osmani + Steve’s workflows)

1) Treat Level 6+ as an engineering management problem: coordination + verification

Addy’s key line is that the bottleneck shifts from generation to verification. If you’re running multiple agents, assume:

  • you’ll get more output than you can safely review
  • coordination failures (conflicts, duplicated work, mismatched interfaces) become your tax

Practical rule: set a WIP limit—don’t run more agents than you can review meaningfully.
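A WIP limit is easy to enforce mechanically. A minimal sketch, assuming you launch agents through some function of your own (the `spawn_agent` call here is a placeholder, not a real API):

```python
MAX_AGENTS = 3  # don't run more agents than you can review meaningfully

running = []    # handles for in-flight agent tasks


def spawn_agent(task):
    """Placeholder for however you actually launch an agent."""
    return {"task": task, "status": "running"}


def dispatch(backlog):
    """Pull from the backlog only while under the WIP limit."""
    while backlog and len(running) < MAX_AGENTS:
        running.append(spawn_agent(backlog.pop(0)))
    return running


handles = dispatch(["fix-auth-bug", "write-docs", "refactor-db", "add-metrics"])
print(len(handles))  # 3; the fourth task waits until a slot frees up
```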

2) Add quality gates before you scale the number of agents

Three reliable gates from Addy’s playbook:

  • Plan approval for risky tasks (cheaper to reject a plan than rewrite code)
  • Hooks on TaskCompleted (lint/tests/security checks; fail → agent keeps working)
  • A reviewer agent (read-only) that runs on every completion so the lead sees “green-reviewed” output first
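Those three gates compose into a simple pipeline: reject risky plans before any code exists, run hooks on completion, and bounce failures back to the agent. A hedged sketch under the assumption that each gate is a plain callable; none of these names come from a real tool:

```python
def plan_gate(task):
    """Cheapest gate: reject a risky plan before any code is written."""
    return task["risk"] != "high" or task.get("plan_approved", False)


def completion_hooks(result):
    """Lint/tests/security checks on task completion."""
    return all([result["lint_ok"], result["tests_ok"], result["security_ok"]])


def reviewer_agent(result):
    """Read-only reviewer pass; here just a stub that flags TODO markers."""
    return "TODO" not in result["diff"]


def run_gates(task, result):
    if not plan_gate(task):
        return "rejected-at-plan"    # cheaper than rewriting code
    if not completion_hooks(result):
        return "back-to-agent"       # fail -> agent keeps working
    if not reviewer_agent(result):
        return "needs-human-review"
    return "green-reviewed"          # what the lead sees first
```

Usage: `run_gates({"risk": "low"}, {"lint_ok": True, "tests_ok": True, "security_ok": True, "diff": "+fix"})` returns `"green-reviewed"`; flip any check to `False` and the task bounces before a human ever looks at it.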

3) Use isolation: git worktrees per agent

This solves the “everyone edits everything” problem. It also makes it easier to:

  • audit changes
  • discard bad branches
  • merge only what passes verification
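The isolation pattern is plain `git worktree`; the branch and directory names below are illustrative:

```shell
# One checkout per agent, each on its own branch, no shared working tree.
git worktree add ../agent-1 -b agent-1/task-auth
git worktree add ../agent-2 -b agent-2/task-metrics

# Audit or discard one agent's work without touching the others:
git -C ../agent-1 diff main             # review just that agent's changes
git worktree remove ../agent-2 --force  # drop a bad checkout
git branch -D agent-2/task-metrics      # and its branch

# Merge only what passes verification:
git merge --no-ff agent-1/task-auth
```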

4) Build agent-friendly interfaces (“desire paths”), not just documentation

In Steve's Survival 3.0 framing, friction kills both adoption and, ultimately, a tool's survival. His Beads/Gas Town approach is essentially:

  • watch how agents try to use your tool
  • implement those “hallucinated” affordances until most guesses become correct

That’s Agent UX as strategy: reduce retries and misunderstandings.

5) For OSS maintainers: don’t default to “request changes”

Steve’s Vibe Maintainer flips the usual OSS maintainer default:

  • triage PRs into easy wins / fix-merge candidates / needs-review
  • prefer absorb-and-transform (fix-merge, cherry-pick, split-merge) over sending contributors into rebase hell
  • reserve “request changes” as a last resort because it causes contributor starvation and increases fork risk

A practical way to apply the levels in day-to-day work

Use the levels as a situational tool, not a badge, e.g.:

  • Low-risk chores (docs, simple refactors): operate like Level 3–5, speed-focused.
  • Medium-risk product work: Level 4–6 with strict plan/test gates.
  • High-risk systems (security, compliance, prod-critical): you might stay “lower” on autonomy but still use agents heavily—just under tighter verification and smaller scopes.

In other words: maturity is not “more YOLO.” Maturity is knowing when to be YOLO and when to be surgical.

Further reading / resources

April 14, 2026
