AI Test Debt
AI test debt is the accumulated quality liability that builds up when AI coding agents write production code faster than tests can keep pace. It includes untested code paths, brittle test scripts that survived a redesign only by accident, and quarantined tests that were never fixed. Like financial debt, it compounds.
In one sentence
AI test debt is the testing-layer analogue of technical debt: quality liabilities that accrue when code authoring outruns test authoring, specifically because AI coding agents ship changes faster than humans can verify them.
Three components
| Component | What it is | Symptom |
|---|---|---|
| Coverage debt | New code paths shipped without tests | Coverage of recently changed code keeps falling (coverage decay) |
| Stability debt | Flaky tests accumulating, often quarantined without fixes | Quarantine list grows, never shrinks |
| Resolution debt | Tests that keep passing after UI changes only by accident, e.g. by selecting the wrong element and hiding a real regression | Test pass rate stays high while production incidents rise |
Why AI coding agents specifically
Before AI agents, development was rate-limited by how fast humans could write code, so test authoring roughly kept pace. With AI agents, the code loop can accelerate 2–5× while the test loop stays human-bound unless tests are also AI-generated. The asymmetry creates debt by default.
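The asymmetry can be made concrete with a toy queue model. This is an illustrative sketch only: the function `untested_backlog`, the weekly change counts, and the assumption that test capacity stays flat while code output scales by a constant factor are all hypothetical simplifications, not measurements. The model also ignores compounding effects (such as untested code making later changes riskier), so it understates the problem rather than describing it exactly.

```python
from typing import List


def untested_backlog(weeks: int, baseline_changes: float, speedup: float,
                     test_capacity: float) -> List[float]:
    """Week-by-week count of shipped changes still waiting for tests.

    Toy queue model (an assumption, not a measurement): the code loop ships
    baseline_changes * speedup changes per week, humans can write tests for
    at most test_capacity changes per week, and any shortfall carries over.
    """
    backlog = 0.0
    history: List[float] = []
    for _ in range(weeks):
        # Spare test capacity pays the backlog down; a shortfall adds to it.
        backlog = max(0.0, backlog + baseline_changes * speedup - test_capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Pre-AI: code and tests move at the same rate, so no backlog forms.
    print(untested_backlog(4, 10, 1, 10))  # [0.0, 0.0, 0.0, 0.0]
    # Code loop 3x faster, test loop unchanged: the backlog grows every week.
    print(untested_backlog(4, 10, 3, 10))  # [20.0, 40.0, 60.0, 80.0]
```

With no speedup the backlog stays at zero; with a 3× speedup and unchanged test capacity it grows every week without any single bad decision, which is what "debt by default" means here.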
How to measure
You cannot fix what you cannot see. Track:
- Changed-surface-area coverage: the share of code changed in the last 30 days that tests actually exercise (see coverage decay).
- Quarantine list size and average age: a growing list and a rising average age signal stability debt.
- Divergence between test pass rate and production incident rate: if tests are increasingly green while incidents rise, resolution debt is the likely cause.
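The three signals can be computed from data most teams already have. The sketch below assumes the inputs are already collected: changed lines per file from version control, covered lines per file from a coverage report, quarantine dates from CI, and per-period pass rates and incident counts. All function names, the `QuarantinedTest` record, and the first-versus-last-period trend check are hypothetical choices for illustration, not any particular tool's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set, Tuple


@dataclass
class QuarantinedTest:
    name: str
    quarantined_on: date  # day the test was moved out of the blocking suite


def changed_line_coverage(changed: Dict[str, Set[int]],
                          covered: Dict[str, Set[int]]) -> float:
    """Share of recently changed lines that the test suite executes.

    changed: file path -> line numbers touched in the window (e.g. last 30 days)
    covered: file path -> line numbers hit by the tests
    """
    total = sum(len(lines) for lines in changed.values())
    if total == 0:
        return 1.0  # nothing changed in the window, so no new coverage debt
    hit = sum(len(lines & covered.get(path, set()))
              for path, lines in changed.items())
    return hit / total


def quarantine_stats(tests: List[QuarantinedTest],
                     today: date) -> Tuple[int, float]:
    """Return (list size, mean age in days) of the quarantine list."""
    if not tests:
        return 0, 0.0
    ages = [(today - t.quarantined_on).days for t in tests]
    return len(tests), sum(ages) / len(ages)


def resolution_debt_signal(pass_rates: List[float],
                           incidents: List[int]) -> bool:
    """True if the pass rate held or rose while incidents rose.

    Compares only the first and last period of each series; a real
    dashboard would fit a trend over many periods.
    """
    return pass_rates[-1] >= pass_rates[0] and incidents[-1] > incidents[0]


if __name__ == "__main__":
    changed = {"app.py": {1, 2, 3, 4}, "util.py": {10, 11}}
    covered = {"app.py": {1, 2}, "util.py": {10, 11, 12}}
    print(round(changed_line_coverage(changed, covered), 2))  # 0.67

    quarantine = [QuarantinedTest("login_flow", date(2024, 1, 1)),
                  QuarantinedTest("checkout", date(2024, 1, 21))]
    print(quarantine_stats(quarantine, date(2024, 1, 31)))  # (2, 20.0)

    print(resolution_debt_signal([0.97, 0.98, 0.99], [2, 3, 5]))  # True
```

Measuring coverage only over changed lines, rather than the whole codebase, is what separates coverage debt from plain low coverage: an untouched legacy module does not move the number.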
How to pay it down
- Generate tests in the same loop as code — the coding agent calls an agent-native QA tool to author tests for its own changes.
- Time-box quarantine — tests that don't return to the blocking suite within 30 days are reviewed for deletion or rewrite.
- Validate self-healing diffs: whenever a UI redesign causes a self-healing test to repair itself, the heal's diff must be shown in PR review so that resolution debt stays visible.
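The quarantine time box is simple enough to enforce automatically, for example as a scheduled CI job. This is a minimal sketch assuming quarantine records carry the date a test was quarantined and a flag for whether it has been restored; the `QuarantinedTest` record and `due_for_review` function are hypothetical names, not part of any specific test runner.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Time box from the policy: 30 days out of the blocking suite.
QUARANTINE_LIMIT = timedelta(days=30)


@dataclass
class QuarantinedTest:
    name: str
    quarantined_on: date
    restored: bool = False  # True once the test is back in the blocking suite


def due_for_review(tests: List[QuarantinedTest], today: date) -> List[str]:
    """Names of tests still quarantined after the time box expired.

    Each returned test should be reviewed for deletion or rewrite.
    """
    return sorted(
        t.name
        for t in tests
        if not t.restored and today - t.quarantined_on > QUARANTINE_LIMIT
    )


if __name__ == "__main__":
    tests = [
        QuarantinedTest("login_flow", date(2024, 1, 1)),  # 35 days old
        QuarantinedTest("checkout", date(2024, 1, 25)),   # 11 days old
        QuarantinedTest("search", date(2023, 12, 1), restored=True),
    ]
    print(due_for_review(tests, date(2024, 2, 5)))  # ['login_flow']
```

The job only produces a review list; deciding between deletion and rewrite stays a human call, which keeps the rule from silently shrinking the suite.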
What AI test debt is not
- Not the same as low coverage: a team can have low coverage but no accruing debt if it is not changing the code (for example, a dormant codebase that nobody is modifying).
- Not the same as flake count: flakes are one component (stability debt); coverage debt and resolution debt are distinct categories.