Verification Agent

A verification agent is an AI agent whose specialized role is to confirm that a code change behaves correctly — opening a real browser, exercising the change, comparing observed behavior to expected outcomes, and reporting structured results. It is distinct from the coding agent that authored the change.

In one sentence

A verification agent is the dedicated AI worker that checks code; it is invoked by — and is structurally separate from — the coding agent that wrote the code.

Why the role separation matters

A coding agent that verifies its own work has a conflict of interest: it tends to confirm its own assumptions and to write tests that pass against the implementation it just produced. A separate verification agent breaks the loop:

  • The coding agent writes the change and describes the user-facing intent.
  • The verification agent receives the intent (not the implementation), opens a real browser, exercises the change, and decides whether observed behavior matches the stated intent.

This separation reduces confirmation bias in the verdict, mirrors the human practice of independent QA review, and gives the testing layer a clearly auditable role in agentic workflows.
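The handoff between the two agents can be sketched as a data boundary: the coding agent holds both the diff and the intent, but only the intent crosses over. The following is a minimal sketch, not a real API. The names ChangeRecord, VerificationRequest, and to_verification_request, along with the example URL and field names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class ChangeRecord:
    """What the coding agent knows about its own change (hypothetical shape)."""
    diff: str                                # implementation detail, stays with the author
    intent: str                              # user-facing description of expected behavior
    entry_url: str                           # where the change can be exercised
    expected_outcomes: Tuple[str, ...] = ()  # observable results the intent implies


@dataclass(frozen=True)
class VerificationRequest:
    """What the verification agent receives: intent only, no implementation."""
    intent: str
    entry_url: str
    expected_outcomes: Tuple[str, ...]


def to_verification_request(change: ChangeRecord) -> VerificationRequest:
    # Deliberately drop the diff so the verifier cannot anchor on the implementation.
    return VerificationRequest(
        intent=change.intent,
        entry_url=change.entry_url,
        expected_outcomes=change.expected_outcomes,
    )


# Example values are made up for illustration.
change = ChangeRecord(
    diff="(diff text omitted)",
    intent="Clicking 'Add to cart' increases the cart badge count by one.",
    entry_url="http://localhost:3000/products/42",
    expected_outcomes=("cart badge shows 1 after one click",),
)
request = to_verification_request(change)
print(asdict(request))
```

Keeping the diff out of the request type, rather than trusting the verifier to ignore it, makes the separation a property of the interface instead of a convention.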

Capabilities a verification agent needs

Capability                   | Why it matters
Real browser execution       | Synthetic environments miss real failure modes
Intent comprehension         | Without intent, the agent has nothing to verify against
Structured diagnostic output | Coding agent must be able to consume the verdict programmatically
Self-healing under change    | UI redesigns shouldn't break verification (see self-healing test)
Auditable artifacts          | Screenshots, traces, and decision logs for human review
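To make "structured diagnostic output" and "auditable artifacts" concrete, here is one possible shape for a verdict that a coding agent could parse. This is an illustrative sketch only: StepResult, VerificationReport, the field names, and the screenshot path are assumptions, not a format defined by any particular tool.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class StepResult:
    description: str                       # what the agent did in the browser
    expected: str                          # what the intent said should happen
    observed: str                          # what actually happened
    passed: bool
    screenshot_path: Optional[str] = None  # auditable artifact, if one was captured


@dataclass
class VerificationReport:
    intent: str
    steps: List[StepResult] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Pass only if at least one step ran and every step passed.
        return "pass" if self.steps and all(s.passed for s in self.steps) else "fail"

    def to_json(self) -> str:
        payload = asdict(self)
        payload["verdict"] = self.verdict
        # Surface failing steps separately so the consumer need not scan all steps.
        payload["failures"] = [asdict(s) for s in self.steps if not s.passed]
        return json.dumps(payload, indent=2)


# Example values are made up for illustration.
report = VerificationReport(
    intent="Clicking 'Add to cart' increases the cart badge count by one.",
    steps=[
        StepResult("Open product page", "page loads", "page loads", True),
        StepResult(
            "Click 'Add to cart'",
            "badge shows 1",
            "badge shows 0",
            False,
            screenshot_path="artifacts/step-2.png",
        ),
    ],
)
print(report.to_json())
```

Recording expected and observed values side by side for each step gives the coding agent something specific to act on and gives a human reviewer a decision log to audit.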

Where verification agents fit in the dev loop

A verification agent most often runs as the second worker in an agentic pipeline:

  1. Coding agent receives a feature task.
  2. Coding agent writes code and emits a description of intended behavior.
  3. Verification agent opens a browser and verifies behavior.
  4. If verification passes, the change goes to PR. If it fails, the coding agent receives structured failure data and iterates.
  5. Human reviews the verified PR.

The Shiplight Plugin operates this way: it serves as the verification agent for coding agents in Claude Code, Cursor, Codex, and GitHub Copilot.
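The numbered loop above can be sketched as a small orchestration function. This is a generic illustration and not the Shiplight Plugin's API: run_pipeline, Change, Verdict, and the stub agents are hypothetical, and the max_attempts cap with escalation to a human is an added assumption that the loop itself does not specify. The stubs simulate a deployed app with a shared dictionary instead of driving a real browser.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Change:
    code: str
    intent: str  # description of intended behavior emitted by the coding agent


@dataclass
class Verdict:
    passed: bool
    failure_details: Optional[str] = None  # structured failure data in a real system


def run_pipeline(
    task: str,
    write_code: Callable[[str, Optional[str]], Change],
    verify: Callable[[str], Verdict],
    max_attempts: int = 3,  # assumption: cap iterations rather than loop forever
) -> dict:
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        change = write_code(task, feedback)  # steps 1-2: write code, emit intent
        verdict = verify(change.intent)      # step 3: verifier sees intent only
        if verdict.passed:                   # step 4: pass goes to PR
            return {"status": "ready_for_pr", "attempts": attempt, "change": change}
        feedback = verdict.failure_details   # step 4: fail feeds the next iteration
    return {"status": "needs_human", "attempts": max_attempts, "last_failure": feedback}


# Stand-in for a running app that the verifier would observe in a browser.
deployed = {"badge_increment": 0}


def stub_coding_agent(task: str, feedback: Optional[str]) -> Change:
    # First attempt ships a bug; once failure feedback arrives, ships a fix.
    deployed["badge_increment"] = 1 if feedback else 0
    return Change(code="(omitted)", intent="Badge count increases by one per click.")


def stub_verifier(intent: str) -> Verdict:
    observed = deployed["badge_increment"]
    if observed == 1:
        return Verdict(passed=True)
    return Verdict(passed=False, failure_details=f"expected +1, observed +{observed}")


result = run_pipeline("add cart badge", stub_coding_agent, stub_verifier)
print(result["status"], result["attempts"])  # prints: ready_for_pr 2
```

Step 5, human review, happens after the function returns "ready_for_pr"; the sketch stops at the point where the PR would be opened.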

What a verification agent is not

  • Not the same as a coding agent — separation of authorship and verification is structural.
  • Not the same as static analysis or type-checking — it observes runtime behavior in a real browser.
  • Not a replacement for human review — its output is what reviewers act on, not a substitute for them.

Related terms