AI Test Debt

AI test debt is the accumulated quality liability that results when AI coding agents author production code faster than tests can keep up. It includes untested code paths, brittle test scripts that survived a redesign by accident, and quarantined tests left unfixed. Like financial debt, it compounds.

In one sentence

AI test debt is the testing-layer analogue of technical debt — quality liabilities that accrue when the code-authoring layer outruns the test-authoring layer, specifically because AI coding agents ship faster than humans verify.

Three components

Component	What it is	Symptom
Coverage debt	New code paths shipped without tests	Coverage of recently changed code keeps falling (coverage decay worsens)
Stability debt	Flaky tests accumulating, often quarantined without fixes	Quarantine list grows, never shrinks
Resolution debt	Tests that pass against UI changes by accident — wrong elements selected, real regressions hidden	Test pass-rate stays high but production incidents rise

Why AI coding agents specifically

Pre-AI development was rate-limited by human authoring throughput, so test authoring kept rough pace. With AI agents, the code loop accelerates 2–5× while the test loop stays human-bound unless tests are also AI-generated. The asymmetry creates debt by default.
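The asymmetry can be made concrete with a deliberately simple linear model. This is only a sketch: the function name, the weekly rates, and the 3× multiplier are illustrative assumptions, not measurements, and real debt may grow faster than linearly.

```python
def untested_backlog(code_rate: float, test_rate: float, weeks: int) -> float:
    """Changes left without tests after `weeks`.

    code_rate: changes shipped per week (hypothetical unit)
    test_rate: changes that receive tests per week (hypothetical unit)
    """
    # Debt only accrues when code is authored faster than it is tested.
    weekly_gap = max(0.0, code_rate - test_rate)
    return weekly_gap * weeks


# Human-paced team: tests keep up, no backlog accrues.
balanced = untested_backlog(code_rate=10, test_rate=10, weeks=12)

# Same test capacity, code loop assumed to run 3x faster.
accelerated = untested_backlog(code_rate=30, test_rate=10, weeks=12)
```

Under these assumed numbers the balanced team ends the quarter with no backlog, while the accelerated team ends it with 240 untested changes, even though its test capacity never shrank.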

How to measure

You cannot fix what you cannot see. Track:

Changed-surface-area coverage for the last 30 days (see coverage decay).
Quarantine list size and average age — growing list and rising age signal stability debt.
Test pass-rate vs production incident rate divergence — if tests are increasingly green while incidents rise, resolution debt is the likely cause.
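The three signals above could be computed roughly as follows. This is a minimal sketch: the data shapes (sets of line numbers, a list of quarantined tests, per-period pass rates and incident counts) and all names are assumptions made for illustration, not the output format of any particular coverage or CI tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class QuarantinedTest:
    name: str
    quarantined_on: date


def changed_line_coverage(changed_lines: set[int], covered_lines: set[int]) -> float:
    """Fraction of changed lines executed by at least one test."""
    if not changed_lines:
        return 1.0  # nothing changed, so nothing is uncovered
    return len(changed_lines & covered_lines) / len(changed_lines)


def quarantine_stats(tests: list[QuarantinedTest], today: date) -> tuple[int, float]:
    """Return (list size, average age in days) of the quarantine list."""
    if not tests:
        return 0, 0.0
    ages = [(today - t.quarantined_on).days for t in tests]
    return len(tests), sum(ages) / len(ages)


def diverging(pass_rates: list[float], incidents: list[int]) -> bool:
    """True when the pass rate did not fall while incidents rose.

    Compares the first and last period of the window only; a real
    check would likely want a trend over more than two points.
    """
    return pass_rates[-1] >= pass_rates[0] and incidents[-1] > incidents[0]
```

A team might run these on a schedule and plot the results, since the direction of each number over time matters more than any single reading.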

How to pay it down

Generate tests in the same loop as code — the coding agent calls an agent-native QA tool to author tests for its own changes.
Time-box quarantine — tests that don't return to the blocking suite within 30 days are reviewed for deletion or rewrite.
Validate self-healing diffs: whenever a UI redesign causes a self-healing test to repair itself, the resulting diff must appear in PR review, so resolution debt is visible.
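The 30-day quarantine time-box could be enforced with a small scheduled check. The sketch below assumes the quarantine list is available as a mapping from test name to the date it was quarantined; that shape, the function name, and the test names are hypothetical.

```python
from datetime import date, timedelta

# The 30-day limit comes from the time-box rule above.
QUARANTINE_LIMIT = timedelta(days=30)


def overdue(quarantine: dict[str, date], today: date) -> list[str]:
    """Names of tests quarantined for longer than the limit, sorted."""
    return sorted(
        name
        for name, since in quarantine.items()
        if today - since > QUARANTINE_LIMIT
    )


# Hypothetical quarantine list for illustration.
quarantine = {
    "test_login": date(2024, 1, 1),
    "test_cart": date(2024, 2, 20),
}
to_review = overdue(quarantine, today=date(2024, 3, 1))
```

Here only test_login has exceeded the limit, so it is the one flagged for deletion or rewrite. Failing the CI job when the returned list is non-empty is one way to keep the quarantine list from growing silently.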

What AI test debt is not

Not the same as low coverage: a team can have low coverage and accrue no debt if the code is not changing (for example, an idle codebase).
Not the same as flake count — flakes are one component; coverage and resolution debt are distinct categories.