---
title: "QA Agent vs Verification Tool: When You Need Each (2026)"
excerpt: "A verification tool is enough when your coding agent is the orchestrator and QA fits in one bounded call. A dedicated QA agent is needed when testing has its own plan or persistent state, or runs independently of any coding session. Anthropic's multi-agent coordination patterns draw the line, and Shiplight ships both shapes."
metaDescription: "Anthropic's coordination patterns explain when QA needs a verification tool vs a dedicated QA agent. Decision criteria, examples, and the Shiplight model."
metaTitle: "QA Agent vs Verification Tool: When to Use Each (2026)"
publishedAt: 2026-04-30
updatedAt: 2026-04-30
author: Shiplight AI Team
categories:
 - AI Testing
 - Engineering
 - Architecture
tags:
 - multi-agent-architecture
 - verification-agent
 - qa-agent
 - mcp
 - ai-coding-agents
 - agentic-qa
featuredImage: ./cover.png
featuredImageAlt: "Two-pattern diagram: coding agent calling a verification tool (left) versus a dedicated QA agent running its own plan (right)"
---

**A verification tool is enough when QA fits inside the coding agent's loop — one bounded call, clear pass/fail, no persistent state. A dedicated QA agent is needed when testing has its own plan, accumulates context, or runs independently of any coding session. The decision follows directly from Anthropic's [multi-agent coordination patterns](https://claude.com/blog/multi-agent-coordination-patterns) — generator–verifier vs. orchestrator–subagent. [Shiplight AI](/) is built around both shapes: the [Shiplight Plugin](/plugins) is the verification tool that AI coding agents (Claude Code, Cursor, Codex, GitHub Copilot) call via MCP, and the [Shiplight SDK](/ai-sdk) is the dedicated QA agent for work the plugin's single-call surface can't cover.**

---

The question "do I need a QA agent, or just a verification tool?" comes up almost every time a team starts wiring AI coding agents into their delivery loop. The answer is not "one is better" — it's that they solve different coordination problems, and the right shape depends on where the work lives.

Anthropic's recent post on [multi-agent coordination patterns](https://claude.com/blog/multi-agent-coordination-patterns) gives the cleanest framing. Two of its named patterns map directly onto the QA decision: **generator–verifier** (an agent produces output; another evaluates it against criteria) and **orchestrator–subagent** (a lead agent plans and delegates bounded tasks to specialized workers). A verification tool is the verifier in the first pattern. A QA agent is the subagent — or in some setups, a peer agent — in the second.

This post walks through when each shape is the right call, using Anthropic's criteria. Both are needed in mature setups; the question is which to start with.

## What "Verification Tool" Means

A verification tool is invoked by another agent inside its loop, performs one bounded operation, and returns a structured result. It has no plan of its own. The caller — usually a coding agent — is the orchestrator; the verifier is one capability the orchestrator can reach.

In Anthropic's generator–verifier pattern, this is the verifier role. The article is direct about the constraint: *"The verifier is only as good as its criteria."* A verification tool needs the caller to pass it explicit intent — what should the change do, what should be true after — and it returns a verdict against that intent.

The [Shiplight Plugin](/plugins) is a verification tool in this exact sense. When Claude Code or Cursor finishes a UI change, it calls the plugin's MCP tools — `/verify`, `/create_e2e_tests`, `/review` — and gets back a structured pass/fail with screenshots, traces, and diagnostic output. The coding agent stays in control of the workflow. The plugin handles one bounded thing very well: opening a real browser and answering "did this actually work?"
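As a rough sketch of what that surface looks like from the caller's side, consider the shapes below. The tool names come from the paragraph above, but every field name is an illustrative assumption, not the plugin's published MCP schema:

```typescript
// Illustrative sketch only: these field names are assumptions,
// not Shiplight's published MCP schema.

// What the coding agent passes to /verify: the stated intent of
// the change and where in the app to exercise it.
interface VerifyRequest {
  intent: string; // e.g. "clicking Save persists the new address"
  url: string;    // route where the change is user-visible
}

// What comes back: a verdict plus the evidence behind it.
interface VerifyResult {
  passed: boolean;
  steps: { action: string; ok: boolean }[]; // what the browser actually did
  screenshots: string[];                    // artifact paths for review
  diagnostics?: string;                     // failure detail, if any
}
```

The important property is that the whole exchange is one request and one structured response; the verifier holds no state between calls.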

### When a verification tool is enough

Use a verification tool when all of the following hold:

- **The work fits in one call.** A single PR, a single user-visible change, a single intent statement.
- **The coding agent is already the orchestrator.** Claude Code, Cursor, Codex, or GitHub Copilot is driving the task and just needs a verdict.
- **Pass/fail is the unit of value.** The caller doesn't need a plan from QA — it needs an answer.
- **No persistent context is required.** Each verification is independent of the last.

This describes most agent-driven PR work. The coding agent writes a feature, asks the verifier to confirm it, and either ships or iterates. See [agent-native autonomous QA](/blog/agent-native-autonomous-qa) for the full pattern, or [agentic QA testing](/glossary/agentic-qa-testing) for how the broader category is defined.
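In loop form, the pattern is compact. The sketch below uses hypothetical names throughout (`generatePatch`, `verify`); it illustrates the control flow, not any real agent runtime or Shiplight API:

```typescript
// Hedged sketch of the generator–verifier loop. Every name here is
// a hypothetical stand-in, not a real agent-runtime or Shiplight API.
type Patch = { diff: string; affectedRoute: string };
type Verdict = { passed: boolean; diagnostics?: string };

async function shipWithVerification(
  task: string,
  generatePatch: (prompt: string) => Promise<Patch>,         // the coding agent
  verify: (intent: string, url: string) => Promise<Verdict>, // the verification tool
  maxAttempts = 3,
): Promise<Patch> {
  let prompt = task;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const patch = await generatePatch(prompt);               // generator produces output
    const verdict = await verify(task, patch.affectedRoute); // verifier judges it
    if (verdict.passed) return patch;                        // ship
    // Iterate: feed the verifier's diagnostics into the next attempt.
    prompt = `${task}\nPrevious attempt failed: ${verdict.diagnostics ?? "no detail"}`;
  }
  throw new Error("Verification failed after retries; escalate to a human.");
}
```

Note that the coding agent owns the loop; the verifier appears only as a bounded call inside it.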

## What "Dedicated QA Agent" Means

A dedicated QA agent has its own task, its own plan, and often its own persistent context. It isn't called inside a coding agent's loop — it runs alongside or independently. It can decompose a goal into many bounded actions, sequence them, and accumulate state across runs.

In Anthropic's terms, this is closer to a subagent within an orchestrator–subagent setup, or a worker in the agent-teams pattern when the QA workload is recurring and benefits from "accumulated context." The article notes that teams suit jobs where workers develop context across assignments — which is exactly what test-suite stewardship looks like.

The [Shiplight SDK](/ai-sdk) is built for that role. It's a programmable QA agent: you give it a goal ("maintain regression coverage for the checkout flow"), and it plans the work — what to test, what to generate, what to heal, what to retire — and reports back. It's not waiting for a coding agent to call it.
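To make "give it a goal" concrete, here is a hypothetical sketch of what that surface could look like. None of these names come from the SDK's documentation; they are assumptions chosen to show the shape of a goal-driven agent, where planning and state live inside the agent rather than the caller:

```typescript
// Hypothetical sketch: these names are assumptions about what a
// goal-driven QA agent surface could look like, not the documented
// Shiplight SDK API.
interface QaAgentConfig {
  goal: string;      // a standing objective, not a single verdict
  schedule?: string; // e.g. run nightly, independent of any PR
  suiteDir: string;  // git-native test artifacts the agent stewards
}

interface QaRunReport {
  planned: string[];   // what the agent decided to do this run
  generated: string[]; // new tests it wrote
  healed: string[];    // tests it repaired after UI drift
  retired: string[];   // tests it removed for dead routes
}

// The caller supplies a goal and reads a report; planning,
// sequencing, and persistent state live inside the agent.
declare function runQaAgent(config: QaAgentConfig): Promise<QaRunReport>;

const report = await runQaAgent({
  goal: "maintain regression coverage for the checkout flow",
  schedule: "nightly",
  suiteDir: "tests/e2e",
});
console.log(report.healed); // which tests the agent repaired this run
```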

### When a dedicated QA agent is needed

Reach for a QA agent when any of the following is true:

- **The QA work has its own plan.** Sweeping a suite for flakiness, expanding coverage to a newly built area, retiring tests for deprecated routes.
- **Persistent context matters.** What was tested last week, which tests are quarantined, which intents are stable.
- **It runs without a coding agent in the loop.** Nightly suites, scheduled regressions, post-deploy smoke checks.
- **A single tool call can't express the goal.** "Verify this PR" fits in one call. "Audit our auth flow for [coverage decay](/glossary/coverage-decay)" does not.

The QA agent is the right shape whenever the *testing process itself* is the unit of work, not just the verdict on a single change.

## QA Agent vs Verification Tool: 5 Criteria From Anthropic

Anthropic gives five selection criteria for choosing a coordination pattern. They translate directly to the QA decision:

| Criterion | Verification Tool | Dedicated QA Agent |
|-----------|-------------------|--------------------|
| **Task decomposition clarity** | Single bounded call | Plan with multiple steps |
| **Worker persistence** | Stateless per call | Persistent across runs |
| **Workflow predictability** | Predetermined: verify this | Emergent: figure out what to test |
| **Agent interdependence** | Verifier serves caller | Independent or peer-collaborative |
| **Context accumulation** | None needed | Required (suite history, flakiness budgets, intent registry) |

A useful test: if you can describe the QA task in one sentence with a clear pass/fail, a verification tool is enough. If the task requires "first decide what to do, then do it, then update what you know," you want a QA agent.
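If you want that test as code, one way to encode the table is a toy heuristic like the one below. The fields and the threshold are illustrative choices, not anything from Anthropic's article or Shiplight's products:

```typescript
// Toy heuristic encoding the five criteria above. Field names and
// the threshold are illustrative, not from Anthropic or Shiplight.
interface QaTask {
  fitsInOneCall: boolean;           // task decomposition clarity
  needsStateAcrossRuns: boolean;    // worker persistence
  workflowIsPredetermined: boolean; // workflow predictability
  runsWithoutCodingAgent: boolean;  // agent interdependence
  needsSuiteHistory: boolean;       // context accumulation
}

function chooseShape(t: QaTask): "verification-tool" | "qa-agent" {
  const agentSignals = [
    !t.fitsInOneCall,
    t.needsStateAcrossRuns,
    !t.workflowIsPredetermined,
    t.runsWithoutCodingAgent,
    t.needsSuiteHistory,
  ].filter(Boolean).length;
  // A single agent signal is usually enough to outgrow a per-call tool.
  return agentSignals > 0 ? "qa-agent" : "verification-tool";
}
```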

## The Shiplight Model: Both Shapes, One System

Shiplight ships both products on a shared foundation — the same [intent-based test format](/glossary/intent-based-testing), the same self-healing engine, the same test artifacts in your repo:

- **[Shiplight Plugin](/plugins)** is the verification tool. It exposes MCP tools that AI coding agents call inline during PR work. Claude Code, Cursor, Codex, and GitHub Copilot use it the same way they use a typecheck or linter — as a capability inside their loop.
- **[Shiplight SDK](/ai-sdk)** is the dedicated QA agent. It runs as its own worker, plans its own work, and maintains the test suite over time. It can be invoked by CI on a schedule, by an orchestrator agent, or directly by humans who want autonomous QA without writing code.

This isn't two separate codebases stapled together. The plugin and SDK share the [intent-cache-heal pattern](/glossary/intent-cache-heal-pattern), the same [verification agent](/glossary/verification-agent) primitives, and the same git-native test artifacts. A test the plugin generates inside a PR can be picked up and maintained by the SDK in the suite. A flaky test the SDK quarantines is visible to the plugin on the next PR run.
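To make the shared artifact concrete, here is a hypothetical shape for a git-native, intent-based test file that both products could read and write. The fields are assumptions for illustration, not Shiplight's actual format:

```typescript
// Hypothetical test artifact: field names are illustrative
// assumptions, not Shiplight's actual on-disk format.
const checkoutTest = {
  intent: "a signed-in user can complete checkout with a saved card",
  origin: "plugin",   // generated inline during a PR by the Plugin
  stewardedBy: "sdk", // maintained over time by the SDK
  status: "active" as "active" | "quarantined" | "retired",
  // The "cache" in intent-cache-heal: concrete steps that can be
  // re-derived from the intent when the UI drifts.
  cachedSteps: [
    { action: "click", target: "checkout-button" },
    { action: "click", target: "pay-with-saved-card" },
  ],
  lastHealedAt: "2026-04-12",
};
```

Because the intent, not the cached steps, is the source of truth, either product can re-derive the steps when the UI changes, and a quarantine status set by one is visible to the other.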

### Shiplight Plugin vs Shiplight SDK at a glance

| Dimension | Shiplight Plugin (Verification Tool) | Shiplight SDK (QA Agent) |
|-----------|--------------------------------------|--------------------------|
| **Invoked by** | AI coding agent (Claude Code, Cursor, Codex, GitHub Copilot) via MCP | CI scheduler, orchestrator agent, or human |
| **Scope per call** | One bounded verification | Multi-step plan |
| **State** | Stateless | Persistent across runs |
| **Best for** | PR-time verification, inline checks during dev | Suite stewardship, scheduled regressions, coverage audits |
| **Loop position** | Inside the coding agent's loop | Its own loop |
| **Output** | Structured pass/fail + screenshots/traces | Plan, results, suite updates, reports |

### The rule of thumb

Start with the plugin if your bottleneck is *PR-time verification*: the coding agent is fast, and you need it to verify its own work in a real browser before the diff lands. Start with the SDK if your bottleneck is *suite stewardship*: coverage is slipping, flakiness is creeping, and nobody owns the tests. Most teams running AI coding agents at scale need both.

## Common Anti-Patterns

A few traps come up repeatedly when teams try to fit one shape to the other:

**Using a verification tool to manage a suite.** Verification tools are stateless by design. Asking a per-call verifier to also remember which tests are quarantined or to plan next month's coverage stretches it past its scope. The result is a coding agent doing implicit QA-suite management between calls — slow, lossy, and unobservable.

**Using a QA agent for inline PR checks.** Dedicated agents are heavier. Spinning one up for every PR adds latency the coding agent can't absorb. Inline verification is a tool-call problem; an agent is the wrong tool.

**Treating "verifier" and "QA agent" as competing categories.** They're complementary. Anthropic's article emphasizes evolving patterns *as specific limitations emerge* — most teams start with one, hit the limit, and add the other.

## FAQ

### What's the difference between a QA agent and a verification tool?

A verification tool is invoked by another agent for one bounded operation and returns a verdict — like a function call. A QA agent has its own plan, persistent context, and runs independently. Anthropic's [multi-agent coordination patterns](https://claude.com/blog/multi-agent-coordination-patterns) describe these as the verifier role (generator–verifier pattern) and the subagent role (orchestrator–subagent pattern), respectively.

### When should I use a dedicated QA agent instead of a verification tool?

Use a dedicated QA agent when the QA work has its own plan or persistent context — sweeping a suite for flakiness, maintaining coverage across many areas, running scheduled regressions, or retiring tests for deprecated features. Use a verification tool when the coding agent is already orchestrating and just needs a per-PR verdict.

### Does Shiplight have both?

Yes. The [Shiplight Plugin](/plugins) is the verification tool that AI coding agents (Claude Code, Cursor, Codex, GitHub Copilot) call via MCP during development. The [Shiplight SDK](/ai-sdk) is the dedicated QA agent for autonomous test-suite stewardship. They share the same intent format, healing engine, and git-native artifacts.

### How is this related to the generator–verifier pattern?

Shiplight Plugin is the verifier in a generator–verifier setup where the AI coding agent is the generator. The plugin opens a real browser, exercises the change against stated intent, and returns structured pass/fail. The Shiplight SDK is a step beyond — it can play the verifier role *and* drive its own plan when the QA workload exceeds a single call. See [planner, generator, evaluator](/blog/planner-generator-evaluator-multi-agent-qa) for the broader architecture.

### Do I need to choose one to start?

Most teams start with the Plugin because PR-time verification is the loudest bottleneck when AI coding agents are writing code faster than humans can check it. The SDK becomes the natural next step once the suite itself needs an owner — usually after the first quarter of agent-driven shipping.

## Verification Tool or QA Agent: The Decision in One Line

A verification tool and a QA agent solve different coordination problems. The first is for when QA fits in one bounded call inside a coding agent's loop. The second is for when QA has its own plan, its own context, and its own clock. Anthropic's coordination patterns give a clean framework for the choice; Shiplight is built so you can pick either, or both, without changing your test format or healing model.

If your team is shipping with AI coding agents and still piping every change through a human-driven test cycle, start with the [Shiplight Plugin](/plugins) and let the coding agent verify its own work. When the suite starts to drift, add the [Shiplight SDK](/ai-sdk) and give the suite a dedicated agent.
