How to get Shiplight to create end-to-end tests from PR diffs automatically
Updated on April 18, 2026
Pull requests are where risk concentrates. They are also where test coverage usually lags.
Most teams do one of two things:
Shiplight AI takes a different approach: verify changes in a real browser during development, then turn those verified interactions into durable regression tests. The result is a feedback loop where coverage grows as a byproduct of shipping, not as a separate testing project.
This post lays out an implementation blueprint for automating that loop so PR diffs drive both what gets tested and what new tests get created.
PR diffs are a high-signal summary of what changed. On their own, they are not a test plan. What you want is a system that:
Shiplight’s model is built for that workflow. It combines agent-driven browser automation (via an MCP server), skills like `/verify` and `/create_e2e_tests`, and an execution runtime that runs intent-based YAML tests that can self-heal as the UI evolves.
You can absolutely start small, but a few prerequisites make “tests from PR diffs” reliable instead of noisy.
A stable environment to test against.
For PR-level automation, teams typically run against a preview deployment URL (ideal) or a shared staging URL. Shiplight’s GitHub Actions integration supports an `environment-url` override, which makes preview deployments straightforward to wire in.
A Shiplight Cloud environment and suite structure.
To run automatically on PRs, you will need:

- a Shiplight API token (stored as a repository secret)
- the ID of the test suite to run
- the ID of the environment to run against

These are the exact inputs the Shiplight GitHub Action expects.
A baseline of authentication and setup conventions.
If your application requires login or initial state, standardize it early. Shiplight supports reusable setup and teardown through hooks and templates (for example, cookie banners, popups, and consistent starting navigation).
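As an illustrative sketch only (the field names below, such as `hooks`, `before_each`, and `intent`, are assumptions rather than Shiplight’s documented schema), a centralized setup/teardown convention might look like:

```yaml
# Hypothetical hook definition: field names are illustrative assumptions.
# The point is to keep login and state setup in one place instead of
# duplicating it in every generated test.
hooks:
  before_each:
    - navigate: https://staging.example.com/login
    - intent: "Dismiss the cookie banner if it appears"
    - intent: "Log in as the standard test user"
  after_each:
    - intent: "Log out to reset session state"
```

Whatever the exact syntax in your setup, the design goal is the same: generated tests should assume a clean, consistent starting state rather than re-implementing it.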
Even before you generate new tests from diffs, you should ensure every PR can run the right existing coverage and report results back to the PR.
Here is the minimal GitHub Actions pattern: run Shiplight on pull_request, with permissions that allow PR comments.
```yaml
name: Shiplight AI Tests

on:
  pull_request:
    branches:
      - main
      - develop

permissions: write-all

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Shiplight Tests
        uses: ShiplightAI/github-action@v1
        with:
          api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
          test-suite-id: 123
          environment-id: 1
          # environment-url: https://preview.example.com # optional
```
This configuration and the required inputs are documented in Shiplight’s GitHub Actions guide, along with a more advanced example that injects a preview URL from your deployment step.
If you are already deploying previews in CI, the “advanced” variant is worth adopting quickly because it gives you PR-specific fidelity: you test exactly what will merge, not whatever staging happens to be running.
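As a sketch of that wiring (the deploy step and its `url` output are assumptions standing in for your own preview deployment, not Shiplight requirements), the preview URL can be passed through a step output:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # Hypothetical deploy step: replace with your real preview deployment
      # and expose its URL as a step output named `url`.
      - name: Deploy preview
        id: deploy
        run: echo "url=https://pr-${{ github.event.number }}.preview.example.com" >> "$GITHUB_OUTPUT"

      - name: Run Shiplight Tests
        uses: ShiplightAI/github-action@v1
        with:
          api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
          test-suite-id: 123
          environment-id: 1
          environment-url: ${{ steps.deploy.outputs.url }}
```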
Once your PR pipeline can run Shiplight, there are two practical automation patterns for “generate tests from the diff,” depending on how hands-off you want to be.
If your team is using an AI coding agent (Claude Code, Cursor, Codex, or GitHub Copilot), Shiplight’s plugin gives that agent “eyes and hands” in a real browser plus skills that formalize the test workflow.
A strong default loop looks like this:
1. The agent runs `/verify` against the preview environment to confirm the UI behaves as expected.
2. It then runs `/create_e2e_tests` to generate durable regression tests from what it just validated.
3. Generated tests are run locally (`npx shiplight test`) and then optionally synced to Shiplight Cloud for team-wide execution.

This is the cleanest way to ensure that “diff to tests” is grounded in actual UI behavior, not just code-level guesswork.
If you want this to run without a developer prompting an agent, you need two ingredients:
In GitHub-native setups, teams typically obtain diffs using the GitHub API or the GitHub CLI (for example, `gh pr diff`) and treat the diff as one of the inputs to the test-generation workflow.
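As an illustrative sketch of the gating half of that workflow (the file-extension policy and the `UI_CHANGED` variable are assumptions, not Shiplight conventions), you can inspect the diff before deciding whether to trigger generation. A diff is simulated inline here; in CI you would produce `pr.diff` with `gh pr diff`:

```shell
# Simulate a PR diff; in CI you would instead run: gh pr diff "$PR_NUMBER" > pr.diff
cat > pr.diff <<'EOF'
+++ b/src/components/Checkout.tsx
+++ b/docs/README.md
EOF

# Only flag test generation when UI code changed (extension list is illustrative)
if grep -qE '^\+\+\+ .*\.(tsx|jsx|vue|css)$' pr.diff; then
  UI_CHANGED=true
else
  UI_CHANGED=false
fi
echo "ui-changed=$UI_CHANGED"
```

In a GitHub Actions job, the final `echo` would typically write to `$GITHUB_OUTPUT` so a later step can condition on it.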
From there, the key is what Shiplight contributes that traditional frameworks do not:
In practice, teams set a policy such as:
Automatic test generation only pays off if the tests keep running week after week. Two Shiplight-specific practices help here.
Prefer intent-based steps over DOM-level specificity.
Shiplight YAML tests are designed to read like user stories, with explicit intent per step. Locators can be cached for speed, but when the UI changes, the runtime can re-derive the action from intent rather than forcing you into a rewrite cycle.
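To make that concrete, here is a hypothetical sketch of what an intent-based test can read like; the step keys and structure are illustrative assumptions, not Shiplight’s exact YAML schema:

```yaml
# Hypothetical shape of an intent-based test (field names are assumptions)
name: Checkout applies a discount code
steps:
  - intent: "Add the first product on the listing page to the cart"
  - intent: "Open the cart and enter the discount code SAVE10"
  - intent: "Verify the order total reflects a 10% discount"
```

Because each step states the user’s intent rather than a specific selector, a renamed button or restructured DOM does not automatically invalidate the test.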
Standardize setup with hooks.
Flakiness often comes from inconsistent starting state: modals, cookie banners, stale sessions, and half-loaded pages. Hooks let you apply a consistent “before test” and “after test” routine across suites, without duplicating the same steps in every generated test.
When PR diffs are driving test generation, the best success metrics are not “test count” or “green build rate.” Look for:
Shiplight is built to keep that maintenance load close to zero by making tests resilient to UI changes and by anchoring test creation to real browser verification, not brittle scripts.