How to QA Code Written by Claude Code
Shiplight AI Team
Updated on April 13, 2026
Claude Code is fast. Give it a well-formed prompt, and it will write a working implementation, refactor your components, fix a failing test, and open a pull request — all without leaving your terminal. For teams that have adopted it, the productivity gain is measurable within a week.
The gap is verification. Claude Code is optimized for writing code, not for confirming that the code works end-to-end in a real browser across the full feature surface. That step still defaults to a human clicking through the UI manually, or to a test suite that may not exist yet.
This guide covers how to close that gap: giving Claude Code the tools to verify its own work, capture those verifications as regression tests, and ship with confidence.
Claude Code operates within your terminal and editor. It reads files, writes files, runs commands, and navigates your codebase. What it cannot do by default is open a browser, interact with your live application, and observe whether the UI behaves correctly.
This matters more than it might seem. A significant portion of frontend bugs are not logic errors — they are integration failures: a component that renders correctly in isolation but breaks when combined with real data, a form that passes validation in unit tests but submits incorrectly in the browser, an animation that works in Chrome but fails in Safari.
Claude Code will not catch these without a browser. And if you rely on your own manual verification to catch them, you create a quality bottleneck that gets tighter the faster your agent ships.
The solution is to extend Claude Code's toolchain with browser access — so the agent can verify its own work before it asks you to review a pull request.
Shiplight's browser MCP server gives Claude Code a real browser it can control during development. Once configured, Claude Code can open your application, navigate through features it just built, and confirm they work — autonomously.
Add the Shiplight MCP server to your Claude Code configuration:
```json
{
  "mcpServers": {
    "shiplight": {
      "command": "npx",
      "args": ["-y", "@shiplight/mcp"]
    }
  }
}
```

No account is required to get started. The MCP server connects Claude Code to a local browser instance that it can automate using Shiplight's browser tools.
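Your repository may already declare other MCP servers. In that case, a small script can merge in the entry above without overwriting the file. The sketch below assumes the project-scoped `.mcp.json` file at the repository root, which Claude Code reads for servers shared with a team. The helper name `add_mcp_server` is illustrative and is not part of Claude Code or Shiplight.

```python
import json
from pathlib import Path

# The server entry from the configuration shown above.
SHIPLIGHT_SERVER = {"command": "npx", "args": ["-y", "@shiplight/mcp"]}


def add_mcp_server(config_path: Path, name: str, server: dict) -> dict:
    """Add or replace one entry under "mcpServers", leaving other entries untouched.

    Creates the file if it does not exist yet. Returns the merged configuration.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = server
    # Rewrite with stable two-space indentation so the diff stays readable in review.
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Run `add_mcp_server(Path(".mcp.json"), "shiplight", SHIPLIGHT_SERVER)` from the repository root. It adds or updates only the `shiplight` entry, so servers your teammates configured stay in place.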
Once the MCP server is active, you can make browser verification part of the task itself. A typical instruction looks like: "Implement the new onboarding flow, then verify it end-to-end in the browser and save the verification as a test."
Claude Code handles the implementation and the verification. You review the evidence — screenshots, test file, and CI results — rather than clicking through the feature yourself.
Manual browser verification is valuable, but ephemeral. The real leverage is when those verifications become permanent regression tests.
Shiplight uses a YAML test format where each step is expressed as an intent rather than a DOM selector:
```yaml
goal: Verify onboarding flow completes successfully
base_url: https://app.example.com
statements:
  - URL: /signup
  - intent: Enter a valid email address in the signup form
  - intent: Click the "Get Started" button
  - VERIFY: Welcome screen is visible with the user's name
```

Claude Code can generate these files directly after verifying a feature. Instruct it to: "After verifying the onboarding flow, save the browser steps as a Shiplight YAML test in the tests/ directory."
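Viewed as data, such a file has three parts: a goal, a base URL, and an ordered list of single-key steps. The sketch below builds the onboarding example programmatically to make that structure explicit. The schema is inferred only from the example above, specifically the `URL`, `intent`, and `VERIFY` keys, and Shiplight may support more. Treat `IntentTest` and `STEP_KINDS` as hypothetical helpers, not part of any Shiplight library.

```python
import json
from dataclasses import dataclass, field

# Step keys observed in the example above; the real format may support more.
STEP_KINDS = ("URL", "intent", "VERIFY")


@dataclass
class IntentTest:
    goal: str
    base_url: str
    statements: list[tuple[str, str]] = field(default_factory=list)

    def step(self, kind: str, text: str) -> "IntentTest":
        """Append one step and return self so calls can be chained."""
        if kind not in STEP_KINDS:
            raise ValueError(f"unknown step kind {kind!r}; expected one of {STEP_KINDS}")
        self.statements.append((kind, text))
        return self

    def to_yaml(self) -> str:
        # json.dumps yields double-quoted strings, which are valid YAML scalars,
        # so quotes or colons inside step text need no special handling.
        q = json.dumps
        lines = [f"goal: {q(self.goal)}", f"base_url: {q(self.base_url)}", "statements:"]
        lines += [f"  - {kind}: {q(text)}" for kind, text in self.statements]
        return "\n".join(lines) + "\n"


onboarding = (
    IntentTest("Verify onboarding flow completes successfully", "https://app.example.com")
    .step("URL", "/signup")
    .step("intent", "Enter a valid email address in the signup form")
    .step("intent", 'Click the "Get Started" button')
    .step("VERIFY", "Welcome screen is visible with the user's name")
)
print(onboarding.to_yaml())
```

The generated file quotes every value, while the hand-written example uses plain scalars. A YAML parser reads both forms as the same strings. Quoting everything is simply the safer default when step text is generated.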
The tests are written against intent, not implementation details. When Claude Code refactors a component, the tests adapt rather than break — because the intent (what the user is doing) has not changed, only the DOM structure.
This is the key insight behind the intent-cache-heal pattern: tests that survive the pace of AI-driven development.
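The pattern's name suggests a three-step loop. First, resolve an intent to a concrete element. Second, cache that resolution so later runs are fast and deterministic. Third, heal by re-resolving when the cached element no longer exists. The toy sketch below illustrates that loop under heavy simplification and is not Shiplight's implementation. The page is modeled as a dictionary from selector to visible label, and `keyword_resolver` is a hypothetical stand-in for the AI model that would do the resolution in a real tool.

```python
from typing import Callable, Dict, Optional

# Stand-in for the DOM: maps a selector to the visible label of the element it finds.
Page = Dict[str, str]
# Turns an intent into a selector on a page; a real tool would use an AI model here.
Resolver = Callable[[str, Page], Optional[str]]


class IntentRunner:
    """Toy intent-cache-heal loop, not Shiplight's implementation."""

    def __init__(self, resolve: Resolver) -> None:
        self.resolve = resolve
        self.cache: Dict[str, str] = {}  # intent -> last selector that worked
        self.resolver_calls = 0

    def locate(self, intent: str, page: Page) -> str:
        cached = self.cache.get(intent)
        if cached is not None and cached in page:
            return cached  # cache hit: no resolver call, fast and deterministic
        # Cache miss, or the cached selector vanished after a refactor: heal.
        self.resolver_calls += 1
        selector = self.resolve(intent, page)
        if selector is None or selector not in page:
            raise LookupError(f"could not satisfy intent: {intent!r}")
        self.cache[intent] = selector
        return selector


def keyword_resolver(intent: str, page: Page) -> Optional[str]:
    """Hypothetical resolver: pick the element whose label matches the quoted text."""
    parts = intent.split('"')
    label = parts[1] if len(parts) >= 3 else intent
    for selector, text in page.items():
        if text == label:
            return selector
    return None


intent = 'Click the "Get Started" button'
before = {"#cta-start": "Get Started", "#nav-home": "Home"}
after = {"button.onboarding-primary": "Get Started", "#nav-home": "Home"}  # refactored markup

runner = IntentRunner(keyword_resolver)
print(runner.locate(intent, before))  # resolves and caches "#cta-start"
print(runner.locate(intent, before))  # served from the cache
print(runner.locate(intent, after))   # cached selector is gone, so it heals
print(runner.resolver_calls)          # 2
```

The refactor changes the selector but not the intent. Only the third lookup pays for a new resolution, and the cache is then updated for future runs.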
Once Claude Code is generating YAML tests, the next step is running them automatically on every pull request.
Shiplight integrates with GitHub Actions so your test suite runs as a CI check on every PR. If Claude Code's changes break an existing flow, the PR is flagged before merge.
A minimal GitHub Actions configuration:
```yaml
name: E2E Tests
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Shiplight tests
        uses: shiplight-ai/github-action@v1
        with:
          api-token: ${{ secrets.SHIPLIGHT_TOKEN }}
          suite-id: ${{ vars.SUITE_ID }}
```

With this in place, Claude Code's workflow completes a full loop: implement → verify in browser → generate test → CI gates the merge. You get the speed of an AI coding agent with the quality guarantees of a test suite.
Claude Code will verify its work if you ask it to, so make verification an explicit part of every task description, as in the onboarding instruction earlier in this guide. Verification does not happen automatically: the MCP server must be active, and the prompt must ask for it.
Ask Claude Code to test what the user does, not what the code does. Tests tied to user actions survive future refactors; tests tied to specific component names or class names do not.
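When you review generated tests, a quick check can flag steps that reference implementation details. The sketch below is a hypothetical heuristic, not a Shiplight feature. It assumes the one-step-per-line layout of the onboarding example and scans only `intent` and `VERIFY` steps. Its regular expressions look for CSS-style class or ID selectors and common test-id attributes. It will miss some cases and may flag legitimate text, so treat hits as prompts for review rather than hard failures.

```python
import re

# Matches step lines in the layout used above, e.g. "- intent: Click ...".
STEP_LINE = re.compile(r"^\s*-\s*(intent|VERIFY):\s*(.+)$")

# Heuristics for implementation details that tend to change during refactors.
SMELLS = {
    "CSS class": re.compile(r"(?<![\w.])\.[A-Za-z][\w-]*"),
    "element id": re.compile(r"(?<![\w&])#[A-Za-z][\w-]*"),
    "test attribute": re.compile(r"\bdata-(?:testid|test|cy)\b"),
}


def find_selector_smells(yaml_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, smell, matched text) for steps that mention selectors."""
    findings = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        step = STEP_LINE.match(line)
        if not step:
            continue  # not an intent/VERIFY step (e.g. goal, URL)
        for smell, pattern in SMELLS.items():
            for hit in pattern.finditer(step.group(2)):
                findings.append((lineno, smell, hit.group()))
    return findings


sample = """\
statements:
  - URL: /signup
  - intent: Enter a valid email address in the signup form
  - intent: Click the .btn-primary button in #signup-form
  - VERIFY: Welcome screen is visible with the user's name
"""
for finding in find_selector_smells(sample):
    print(finding)
```

Only the fourth line is flagged, for both its class and its ID selector. The user-facing steps pass untouched.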
When Claude Code generates a YAML test, read it. The test is documentation of what was verified and how. If the test only covers the happy path, prompt Claude Code to add edge cases: "Add test cases for validation errors and network failure states."
If a test fails, the Shiplight VS Code extension lets Claude Code step through the test interactively — seeing exactly what the browser shows at each step. Claude Code can diagnose and fix failures without you needing to reproduce them manually.
A QA-enabled Claude Code workflow handles the bulk of verification automatically, but some things still benefit from human judgment:
| Automated by Shiplight | Human review still valuable |
|---|---|
| Feature works end-to-end | Visual design and UX quality |
| Existing flows not regressed | Business logic edge cases you haven't specified |
| Cross-browser behavior | Accessibility beyond automated checks |
| CI gate on PRs | Security-sensitive flows |
The goal is not to eliminate human review — it is to ensure that by the time something reaches human review, the mechanical correctness is already confirmed.
Shiplight extends Claude Code's capabilities rather than replacing them. The MCP server adds browser automation, test generation, and CI integration on top of what Claude Code already does. It is an additional tool in the agent's toolchain.
Claude Code can write unit tests and integration tests without a browser. For E2E tests that verify real user journeys in a live application, a browser MCP server is required.
Shiplight supports persistent browser profiles and authentication flows, including email-based login and OAuth. Tests can be set up to authenticate before running scenarios. See the authentication testing guide for details.
Shiplight also works alongside existing Playwright suites. It runs on top of Playwright, and its YAML tests coexist with standard Playwright test files, so you can adopt YAML tests incrementally without migrating your existing test suite.
After Claude Code generates a test, you can edit the YAML file to add additional steps, or prompt Claude Code: "Add a test case for [specific scenario]." The YAML format is designed to be readable and editable by both humans and AI.