How to Add Automated Testing to Cursor, Copilot, and Codex
Feng
Updated on April 7, 2026
AI coding tools write code faster than any human. But faster code without testing is just faster bugs.
If you're using Cursor, GitHub Copilot, or Codex to generate code, you've probably noticed the pattern: the AI writes something that looks correct, you ship it, and then something breaks in production that a quick E2E test would have caught.
The problem isn't the AI. The problem is that most AI coding workflows have no verification step. The agent writes code, you review it visually, and you merge. There's no automated check that the UI actually works as intended.
This guide shows you how to close that gap by adding automated QA testing directly into your AI coding workflow — regardless of which tool you use.
Human developers build mental models as they code. They know which edge cases matter because they've seen them break before. AI coding tools don't have that context — they generate statistically likely code, not battle-tested code.
The result: code that compiles, passes review, and still breaks on the first real user interaction.
Unit tests catch type errors and logic bugs. But they can't tell you whether the login flow actually works in a browser, whether the checkout page renders correctly, or whether the navigation breaks on mobile. That requires end-to-end testing — and it's exactly what's missing from most AI coding workflows.
MCP is an open standard that lets AI coding agents connect to external tools. Think of it as USB for AI — a universal protocol that lets your coding agent talk to browsers, databases, APIs, and testing platforms.
Without MCP, your AI coding tool operates in a bubble. It can read and write code, but it can't open your app in a browser, click through a user flow, or see what actually renders on the page.
With MCP, the agent gains eyes and hands. It can open your app in a real browser, navigate through flows, verify that UI changes look correct, and capture that verification as a reusable test.
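If you are registering an MCP server by hand rather than through a plugin, the configuration typically follows the standard MCP shape shown below. This is a generic sketch: the server name and package here are placeholders, not Shiplight's actual values.

```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["-y", "some-browser-mcp-server"]
    }
  }
}
```

Each entry tells the coding agent how to launch the server process; the agent then discovers the tools the server exposes.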
The testing loop is the same regardless of which coding tool you use:

1. You describe the feature or fix
2. The agent writes the code
3. The agent opens the app in a real browser and verifies the change
4. The agent saves that verification as a YAML test
5. The test runs on every future change
The key insight: steps 3-5 happen automatically. The agent doesn't just write code — it proves the code works, then turns that proof into a permanent regression test.
Claude Code has the deepest integration with Shiplight. The plugin installs MCP tools and three built-in skills in a single command.
```shell
claude plugin marketplace add ShiplightAI/claude-code-plugin && claude plugin install mcp-plugin@shiplight-plugins
```

This gives your agent browser automation MCP tools plus three skills:

- `/verify`: confirm a frontend change works in a real browser
- `/create_e2e_tests`: set up a test project and generate regression tests
- `/cloud`: sync tests to the cloud platform
After your coding agent implements a frontend change, use /verify to confirm it works:
```
Update the navbar to include "Pricing" and "Blog" links,
then use /verify to confirm they appear correctly on localhost:3000.
```

To create regression tests, use /create_e2e_tests:
```
Use /create_e2e_tests to set up a test project at ./tests
and write a login flow test for localhost:3000.
```

For scheduled runs, team collaboration, and result monitoring, set your API token:
- Add `SHIPLIGHT_API_TOKEN` to your project's `.env` file
- Use `/cloud` to sync tests to the cloud platform

Shiplight's plugin supports Claude Code, Cursor, Codex, and Copilot CLI. The same install command works across all supported platforms:
```shell
claude plugin marketplace add ShiplightAI/claude-code-plugin && claude plugin install mcp-plugin@shiplight-plugins
```

This installs the Shiplight Browser MCP server and skills into your coding agent. For the latest platform-specific setup instructions, see the Shiplight Quick Start guide.
Once installed, the MCP tools and workflow are identical across editors. Here's how to use them in each one.
Open Agent mode (Cmd+L, then select Agent) and ask the agent to verify your changes:
```
I just changed the login page. Open the app at localhost:3000/login,
try logging in with test@example.com / password123,
and verify the dashboard loads correctly.
Save a YAML test for this flow.
```

The agent will launch a real browser, navigate to the login page, fill in credentials, verify the dashboard appears, and save a YAML test file like `tests/login-flow.yaml`.
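The saved file might look something like this. This is a sketch following the YAML statement format shown later in this guide; the `fill` action, `value` key, and the specific locators are illustrative assumptions, and the exact statements the agent emits will vary:

```yaml
goal: Verify login flow
base_url: http://localhost:3000
statements:
  - navigate: /login
  - intent: Fill in the email field
    action: fill
    locator: "getByLabel('Email')"
    value: test@example.com
  - intent: Fill in the password field
    action: fill
    locator: "getByLabel('Password')"
    value: password123
  - intent: Submit the login form
    action: click
    locator: "getByRole('button', { name: 'Log in' })"
  - VERIFY: Dashboard loads after login
```

Because each step carries a natural-language intent, the test stays readable even for teammates who never touch the locators.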
Tips:
OpenAI's Codex CLI is a terminal-based agent, similar to Claude Code. After installing the plugin, prompt Codex directly:
```
Open localhost:3000 in a browser and verify the homepage
loads correctly. Check that the navigation works and the
hero section displays the right content. Save a test.
```

Tips:
Open Copilot Chat (Ctrl+Shift+I), switch to Agent mode using the dropdown, and prompt:
```
Verify that the signup form at localhost:3000/signup works.
Fill in a test user, submit, and confirm the success message appears.
```

Tips:
Once connected via MCP, your AI coding agent can:
| Capability | What It Does | Example |
|---|---|---|
| Navigate | Open any URL in a real browser | Go to localhost:3000/settings |
| Interact | Click buttons, fill forms, scroll | Submit the contact form |
| Verify visually | Check that elements exist and look correct | Confirm the success toast appears |
| Inspect | Read page content, check accessibility | Verify all images have alt text |
| Assert | Validate specific conditions | Confirm the price shows "$49/mo" |
| Generate tests | Save verification as YAML test file | Create tests/settings-page.yaml |
| Run tests | Execute existing test suites | Run all tests in tests/ folder |
Shiplight's MCP server is purpose-built for agent-driven workflows. It supports three connection methods: launching a fresh Chromium instance, attaching to a running browser via CDP, or auto-discovering tabs through a Chrome extension relay.
The generated YAML tests are human-readable and live in your repo:
```yaml
goal: Verify settings page dark mode toggle
base_url: http://localhost:3000
statements:
  - navigate: /settings
  - VERIFY: Settings page heading is visible
  - intent: Toggle dark mode switch
    action: click
    locator: "getByRole('switch', { name: 'Dark mode' })"
  - VERIFY: Page background changes to dark theme
  - VERIFY: Toggle shows enabled state
```

Anyone on the team — engineers, QA, PMs — can read these tests and understand what they check. No Playwright or Cypress expertise required.
Run generated tests locally with a single command:

```shell
npx shiplight test
```

For CI, add them to your pipeline so every PR gets verified:
```yaml
# .github/workflows/e2e.yml
name: E2E Tests
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run build && npm start &
      - run: npx shiplight test --project ./tests
```

Tests that the agent wrote during development now run automatically on every pull request. When the UI changes, intent-based steps self-heal automatically — you don't need to update locators manually.
1. Ask AI to implement a feature
2. Ask AI to verify it works in the browser
3. Ask AI to save the verification as a test
4. Commit code + test together

Best for: Feature development, bug fixes.
1. Write a YAML test spec describing the desired behavior
2. Ask AI to implement code that passes the spec
3. Run the test to confirm
4. Iterate until green

Best for: Well-defined requirements, spec-driven teams.
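A spec-first flow might start from a hand-written YAML file like this. It is a sketch using the statement format from the example above; the signup flow, the `fill`/`value` keys, and the locators are illustrative assumptions:

```yaml
goal: Signup creates an account and shows a welcome message
base_url: http://localhost:3000
statements:
  - navigate: /signup
  - intent: Fill in the signup form with a new test user
    action: fill
    locator: "getByLabel('Email')"
    value: new-user@example.com
  - intent: Submit the form
    action: click
    locator: "getByRole('button', { name: 'Sign up' })"
  - VERIFY: A welcome message confirms the account was created
```

With the spec committed first, you can ask the agent to implement the feature and rerun `npx shiplight test` until it passes.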
1. AI writes code (with or without testing)
2. Before merging, ask AI to review the change
3. AI runs security, accessibility, and visual checks
4. AI generates regression tests for anything it finds

Best for: PR reviews, pre-merge quality gates.
No. The agent handles browser automation through MCP. Tests are saved as YAML files with natural language statements — no framework-specific code needed. The YAML runs on Playwright under the hood, but you never write Playwright code.
Yes. Unlike cloud-only testing tools, MCP-based testing runs a real browser on your machine. It connects to whatever URL you specify — localhost:3000, a staging URL, or production. You can also attach to an existing browser session with real data and authenticated state.
Yes. Generated YAML tests run alongside your existing tests. You don't need to replace Playwright, Cypress, or Jest — just add the YAML tests as an additional layer.
YAML tests use intent-based steps (e.g., "Click the submit button") rather than brittle CSS selectors. When the UI changes, the agent re-resolves the intent to find the right element. If the button moves or gets restyled, the test still passes as long as the behavior is the same.
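To make the difference concrete, here is an intent-based step next to the kind of brittle selector it replaces. This is a sketch: the CSS class string and the commented-out step syntax are hypothetical examples, not Shiplight's actual format.

```yaml
# Brittle: breaks when the class name or DOM position changes
# - click: ".btn.btn-primary.submit-btn"

# Intent-based: the agent re-resolves the intent to the right element
- intent: Click the submit button
  action: click
  locator: "getByRole('button', { name: 'Submit' })"
```

If a restyle changes the button's classes but not its role and label, the intent-based step still resolves correctly.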
Claude Code has the deepest integration with built-in skills (/verify, /create_e2e_tests, /cloud) installable in a single command. Cursor is the most popular choice. All four tools produce the same YAML test output and use the same MCP server under the hood.
No. Browser automation and local testing work without an account. You only need a Shiplight API token if you want cloud features like scheduled runs, team collaboration, and result dashboards.