
In our DevEx AI tool, we use two sets of survey questions: DevEx Pulse (one question per area to track overall delivery performance) and DevEx Deep Dive (a focused root-cause diagnostic when something needs attention).
DevEx Pulse tells us where friction is. DevEx Deep Dive tells us why it exists.
Let’s take a closer look at test quality. If the Pulse question “Our tests catch the vast majority of issues before production” receives low scores and developers’ comments reveal significant friction and blockers, what should you do next?
Here are 14 deep dive questions you can ask your developers to uncover the causes of friction in test quality, along with guidance on how to interpret the results, common patterns engineering teams encounter, and practical first steps for improvement. This will help you pinpoint what’s causing the problem and fix it on your own, or move faster with our DevEx AI tool and expert guidance.

The real question is: Do tests catch real problems early — or do issues still slip into production?
Deep dive questions should help you map how test quality flows through your delivery process and identify where it breaks down:
Coverage → Bug Prevention → Confidence → Realism → Failure Clarity → Test Care → Cost
Here’s how the DevEx AI tool helps uncover this.
Do tests cover what matters?
Do tests catch real problems?
Do tests give real confidence?
Do tests focus on the right things?
Are test failures useful?
Are tests kept healthy over time?
What’s missing or not working well for you here?
Here’s how the DevEx AI tool helps make sense of the results.
Coverage: do tests cover what matters?
What this section tests: whether tests cover what users actually do, not just happy paths.
Key insight (scores): tests that miss key paths give a false sense of safety.
Key insight (open-ended comments): coverage gaps explain why bugs feel “surprising”.
Bug prevention: do tests catch real problems?
What this section tests: whether tests actually stop bugs, not just exist.
Key insight (scores): repeated bugs mean tests aren’t learning from failures.
Key insight (open-ended comments): tests should prevent repeat problems, not just document them.
Confidence: do tests give real confidence?
What this section tests: whether teams trust test results.
Key insight (scores): tests only help if people believe them.
Key insight (open-ended comments): low trust turns tests into noise.
Realism: do tests focus on the right things?
What this section tests: whether tests match real-world use, not internal details.
Key insight (scores): tests that don’t match real use miss real bugs.
Key insight (open-ended comments): real usage should drive test design.
Failure clarity: are test failures useful?
What this section tests: how easy it is to act on test failures.
Key insight (scores): hard-to-read failures waste time and break flow.
Key insight (open-ended comments): clear failures are as important as catching bugs.
Test care: are tests kept healthy over time?
What this section tests: whether tests are maintained, not left to rot.
Key insight (scores): tests that aren’t cared for stop being useful.
Key insight (open-ended comments): test quality drops quietly over time without ownership.
Cost: what’s missing or not working well for you here?
Key insight (responses): time spent dealing with test gaps is the real cost of low test quality.
Common patterns across areas:
Pattern: Confidence ↓ + Bugs ↓ → tests exist, but don’t prevent real issues.
Pattern: Coverage ↓ + Relevance ↓ → tests miss real-world behavior.
Pattern: Failures ↓ + Trust ↓ → teams spend time chasing unclear failures.
Pattern: Care ↓ + Effort ↑ → tests get worse over time and cost more to maintain.
Typical contradictions the survey surfaces:
→ Tests exist, but aren’t aimed at the right problems.
→ Teams trust tests more than production behavior deserves.
→ Failures are known, but fixing them is hard.
→ Tests are fixed reactively, not cared for long-term.
Contradictions show where tests look good on paper but fail in practice.
What NOT to say: “We just need more tests.”
What TO say (use this framing):
“This shows where our tests fail to catch real problems before users see them.”
“The issue isn’t test count — it’s what tests cover, how much we trust them, and how much time they cost.”
Show three things only: where the friction is, why it exists, and what the first step should be.
Here’s how the DevEx AI tool will guide you toward those first actions.
Signal: tests miss key paths or edge cases.
First step (a small operational change): introduce a rule that every critical user flow must have at least one test covering the full path, as in the sketch below.
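A minimal sketch of what such a full-path test can look like, assuming a hypothetical SignupService as the system under test; the flow and method names are illustrative, not a prescribed API:

```python
# Sketch of a full-path test for one critical user flow (signup).
# SignupService and its methods are hypothetical stand-ins for your app.

class SignupService:
    """Toy in-memory stand-in for the system under test."""
    def __init__(self):
        self.users = {}

    def register(self, email, password):
        if "@" not in email:
            raise ValueError("invalid email")
        self.users[email] = {"password": password, "active": False}

    def confirm(self, email):
        self.users[email]["active"] = True

    def login(self, email, password):
        user = self.users.get(email)
        return bool(user and user["active"] and user["password"] == password)

def test_signup_flow_covers_full_path():
    # Walk the whole journey (register -> confirm -> login),
    # not just one function in isolation.
    svc = SignupService()
    svc.register("dev@example.com", "s3cret")
    svc.confirm("dev@example.com")
    assert svc.login("dev@example.com", "s3cret")
```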
Signal: bugs still appear in production, or the same bugs repeat.
First step (a small operational change): adopt a habit that every production bug leads to one new test; a sketch follows below.
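A minimal sketch of that habit in test form; the bug ID and parse_amount helper are hypothetical:

```python
# Sketch of the "every production bug leads to one new test" habit.
# BUG-1234 and parse_amount are hypothetical examples.

def parse_amount(raw: str) -> float:
    # The fix: tolerate thousands separators, which caused the bug.
    return float(raw.replace(",", ""))

def test_bug_1234_amount_with_thousands_separator():
    """Regression test for BUG-1234: orders over 999 were rejected
    because '1,299.00' failed to parse."""
    assert parse_amount("1,299.00") == 1299.00
```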
Signal: teams don’t trust tests, or production behaves differently from what tests predict.
First step (a small operational change): create a guideline that critical flows are validated with integration or end-to-end tests, not only mocks; see the sketch below.
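A minimal sketch of the difference, using an in-memory SQLite database as a real dependency; the orders table and save_order helper are hypothetical:

```python
# Sketch: validate a flow against a real dependency (in-memory SQLite)
# instead of asserting that a mock was called. Schema is hypothetical.
import sqlite3

def save_order(conn, order_id, total):
    conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)",
                 (order_id, total))

def test_order_is_really_persisted():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
    save_order(conn, "o-1", 42.0)
    # A mock would only assert "execute was called"; this asserts the
    # row exists and the real schema accepts it.
    assert conn.execute(
        "SELECT total FROM orders WHERE id = 'o-1'").fetchone() == (42.0,)
```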
Signal: tests focus on internal details instead of real behavior.
First step (a small operational change): phrase tests in terms of user-visible behavior, e.g. “user uploads file → file is processed → result appears in dashboard” instead of “function X returns object Y”; the sketch below shows the same idea as code.
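A minimal sketch of a behavior-phrased test; the upload and dashboard helpers are hypothetical stand-ins:

```python
# Sketch: assert user-visible behavior (upload -> processed -> visible
# in dashboard), not internal return values. Helpers are hypothetical.

def process_upload(storage, name, content):
    storage[name] = {"status": "processed", "size": len(content)}

def dashboard_rows(storage):
    return [(name, item["status"]) for name, item in storage.items()]

def test_uploaded_file_appears_in_dashboard():
    storage = {}
    process_upload(storage, "report.csv", b"a,b,c")
    assert ("report.csv", "processed") in dashboard_rows(storage)
```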
Signal: test failures are confusing or slow to diagnose.
First step (a small operational change): introduce a rule that a failing test must immediately show what behavior broke; the sketch below shows one way to do this.
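One minimal way to apply the rule in pytest is an assertion message that names the broken behavior; checkout_total is a hypothetical function under test:

```python
# Sketch: make the failure message explain the behavior, not just the numbers.
# checkout_total is a hypothetical function under test.

def checkout_total(items, discount):
    return sum(items) * (1 - discount)

def test_discount_is_applied_to_total():
    total = checkout_total([100.0, 50.0], discount=0.10)
    # On failure, the message names the broken behavior, not just "135 != X".
    assert total == 135.0, (
        f"10% discount was not applied: expected 135.0, got {total}"
    )
```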
Signal: tests decay over time.
First step (a small operational change): add a check that broken tests are fixed or removed within the same sprint; one way to automate this is sketched below.
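A minimal sketch of automating that check with pytest, assuming a hypothetical “quarantined” marker convention: a broken test may be silenced only until an expiry date, after which it runs (and fails) again instead of rotting:

```python
# conftest.py sketch: a hypothetical "quarantined" marker with an expiry.
# Until the date passes the test is skipped; afterwards it runs again,
# so broken tests cannot stay silenced beyond the sprint.
import datetime
import pytest

def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "quarantined(expires): temporarily skip a broken test until a date",
    )

def pytest_collection_modifyitems(items):
    today = datetime.date.today()
    for item in items:
        marker = item.get_closest_marker("quarantined")
        if marker is None:
            continue
        expires = datetime.date.fromisoformat(marker.kwargs["expires"])
        if today <= expires:
            item.add_marker(
                pytest.mark.skip(reason=f"quarantined until {expires}")
            )
```

A broken test would then carry @pytest.mark.quarantined(expires="2025-06-30"), with the date set no later than the end of the current sprint.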
Pattern: Confidence ↓ + Bugs ↓. First step: identify production incidents from the last 3–6 months and ask which tests would have caught each one earlier, then add tests for those scenarios.
Pattern: Coverage ↓ + Relevance ↓. First step: extend tests to include real-world failure cases, such as invalid inputs, timeouts, and partial failures.
Pattern: Failures ↓ + Trust ↓. First step: reduce test flakiness, for example by removing timing assumptions, isolating shared state between tests, and controlling external dependencies (see the sketch below). The goal is signal over noise.
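A minimal sketch of one common de-flaking move: replacing wall-clock waits with an injected clock so expiry logic is tested deterministically. TokenCache is a hypothetical class under test:

```python
# Sketch: inject a fake clock instead of sleeping, so time-dependent
# behavior is tested deterministically. TokenCache is hypothetical.

class TokenCache:
    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock  # injected callable returning "now" in seconds
        self.token = None
        self.stored_at = None

    def put(self, token):
        self.token, self.stored_at = token, self.clock()

    def get(self):
        if self.stored_at is None or self.clock() - self.stored_at >= self.ttl:
            return None
        return self.token

def test_token_expires_without_sleeping():
    now = [1000.0]
    cache = TokenCache(ttl_seconds=60, clock=lambda: now[0])
    cache.put("abc")
    now[0] += 61  # advance fake time; no flaky time.sleep(61)
    assert cache.get() is None
```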
Pattern: Care ↓ + Effort ↑. First step: assign clear responsibility for test health within each component or team. Tests improve only when someone owns them.
Contradictions highlight hidden system problems.
Contradiction: tests exist but miss real issues.
First step: review what tests actually cover, not how many exist. Focus on behavioral coverage, not line coverage.
Contradiction: teams trust tests too much.
First step: introduce production-like validation, such as smoke tests against a staging environment or end-to-end checks on critical flows; a sketch follows below.
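A minimal sketch of a staging smoke test using only the standard library; the staging URL and /health endpoint are hypothetical and would need to match your environment:

```python
# Sketch: smoke-check a production-like environment instead of a mock.
# The staging URL and /health contract are hypothetical.
import json
import urllib.request

STAGING = "https://staging.example.com"  # hypothetical environment

def test_health_endpoint_smoke():
    with urllib.request.urlopen(f"{STAGING}/health", timeout=10) as resp:
        assert resp.status == 200
        body = json.load(resp)
    # Validate the real service contract, not a mocked response.
    assert body.get("status") == "ok"
```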
Contradiction: failures are understandable but hard to fix.
First step: improve test diagnostics and documentation so developers know where to start fixing.
Contradiction: tests are maintained but no one is responsible.
First step: assign component-level ownership for test suites. Ownership improves consistency.
Focus tests on real behavior, not test quantity. Most test problems arise when tests check implementation details instead of real system behavior. Better tests come from better problem selection, not more tests.
Create a simple feedback loop from production to tests:
Production bug → understand the failure → add a regression test → prevent the same bug again.
This creates a system where incidents → stronger tests → fewer repeat problems.
Over time, this dramatically increases test quality, trust, and confidence in releases.
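A minimal sketch of what one turn of that loop can produce: a regression test that names the incident it closes. The incident ID, charge_with_retry helper, and retry policy are hypothetical:

```python
# Sketch: a regression test created from a production incident.
# INC-2024-017 and charge_with_retry are hypothetical examples.

def charge_with_retry(charge, retries=2):
    for attempt in range(retries + 1):
        try:
            return charge()
        except TimeoutError:
            if attempt == retries:
                raise

def test_inc_2024_017_gateway_timeout_is_retried():
    """Regression test for INC-2024-017: a single gateway timeout
    dropped the order instead of retrying."""
    calls = {"n": 0}
    def flaky_charge():
        calls["n"] += 1
        if calls["n"] == 1:
            raise TimeoutError("gateway timeout")
        return "charged"
    assert charge_with_retry(flaky_charge) == "charged"
```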
What you’ve seen here is only a small part of what the DevEx AI platform can do to improve delivery speed, quality, and ease.
If your organization struggles with fragmented metrics, unclear signals across teams, or the frustrating feeling of seeing problems without knowing what to fix, DevEx AI may be exactly what you need. Many engineering organizations operate with disconnected dashboards, conflicting interpretations of performance, and weak feedback loops — which leads to effort spent in the wrong places while real bottlenecks remain untouched.
DevEx AI brings these scattered signals into one coherent view of delivery. It focuses on the inputs that shape performance — how teams work, where friction accumulates, and what slows or accelerates progress — and translates them into clear priorities for action. You gain comparable insights across teams and tech stacks, root-cause visibility grounded in real developer experience, and guidance on where improvement efforts will have the highest impact.
At its core, DevEx AI combines targeted developer surveys with behavioral data to expose hidden friction in the delivery process. AI transforms developers’ free-text comments — often a goldmine of operational truth — into structured insights: recurring problems, root causes, and concrete actions tailored to your environment.
The platform detects patterns across teams, benchmarks results internally and against comparable organizations, and provides context-aware recommendations rather than generic best practices.
Progress on these input factors is tracked over time, enabling teams to verify that changes in ways of working are actually taking hold, while leaders maintain visibility without micromanagement. Expert guidance supports interpretation, prioritization, and the translation of insights into measurable improvements.
To understand whether these changes truly improve delivery outcomes, DevEx AI also measures DORA metrics — Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time to Recovery — derived directly from repository and delivery data. These output indicators show how software performs in production and whether improvements to developer experience translate into faster, safer releases.
By combining input metrics (how work happens) with output metrics (what results are achieved), the platform creates a closed feedback loop that connects actions to outcomes, helping organizations learn what actually drives better delivery and where further improvement is needed.
Returning to our topic — test quality — you can explore proven practices grounded in hundreds of interviews our team has conducted with engineering leaders.