How to get Shiplight to create end-to-end tests from PR diffs automatically

Updated on April 18, 2026

Pull requests are where risk concentrates. They are also where test coverage usually lags.

Most teams do one of two things:

  • Run an existing E2E suite on every PR, even when 90% of the tests are irrelevant to the change.
  • Skip E2E until after merge, then scramble when “small UI tweaks” blow up a critical flow.

Shiplight AI takes a different approach: verify changes in a real browser during development, then turn those verified interactions into durable regression tests. The result is a feedback loop where coverage grows as a byproduct of shipping, not as a separate testing project.

This post lays out an implementation blueprint for automating that loop so PR diffs drive both what gets tested and what new tests get created.

The core idea: diffs define intent, browsers confirm behavior, Shiplight saves the test

PR diffs are a high-signal summary of what changed. On their own, they are not a test plan. What you want is a system that:

  1. Understands what a diff implies at the product level (the “affected flows”).
  2. Validates those flows in a real browser against a preview or staging environment.
  3. Produces regression tests that are readable, reviewable, and resilient to UI churn.

Shiplight’s model is built for that workflow. It combines agent-driven browser automation (via an MCP server), skills like /verify and /create_e2e_tests, and an execution runtime that runs intent-based YAML tests that can self-heal as the UI evolves.
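Shiplight's exact YAML schema is not reproduced in this post, so treat the following as a purely illustrative sketch of what an intent-based test of this kind could look like; every field name here is an assumption, not Shiplight's documented format.

```yaml
# Hypothetical sketch only: keys are illustrative, not Shiplight's documented schema.
name: Checkout applies a discount code
steps:
  - intent: "Open the cart page"
  - intent: "Enter the discount code SAVE10 and apply it"
  - intent: "Verify the order total reflects a 10% discount"
```

The point of the shape, regardless of exact syntax: each step states what the user is trying to do, so a reviewer can read the test like a user story and the runtime can re-resolve the action when the DOM changes.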

What needs to be in place before you automate

You can absolutely start small, but a few prerequisites make “tests from PR diffs” reliable instead of noisy.

A stable environment to test against.

For PR-level automation, teams typically run against a preview deployment URL (ideal) or a shared staging URL. Shiplight’s GitHub Actions integration supports an environment-url override, which makes preview deployments straightforward to wire in.

A Shiplight Cloud environment and suite structure.

To run automatically on PRs, you will need:

  • A Shiplight API token
  • A test suite ID (or multiple suite IDs)
  • An environment ID

These are the exact inputs the Shiplight GitHub Action expects.

A baseline of authentication and setup conventions.

If your application requires login or initial state, standardize it early. Shiplight supports reusable setup and teardown through hooks and templates (for example, cookie banners, popups, and consistent starting navigation).

Wire Shiplight into your PR pipeline

Even before you generate new tests from diffs, you should ensure every PR can run the right existing coverage and report results back to the PR.

Here is the minimal GitHub Actions pattern: run Shiplight on pull_request, with permissions that allow PR comments.

name: Shiplight AI Tests

on:
  pull_request:
    branches:
      - main
      - develop

permissions: write-all

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Shiplight Tests
        uses: ShiplightAI/github-action@v1
        with:
          api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
          test-suite-id: 123
          environment-id: 1
          # environment-url: https://preview.example.com # optional

This configuration and the required inputs are documented in Shiplight’s GitHub Actions guide, along with a more advanced example that injects a preview URL from your deployment step.

If you are already deploying previews in CI, the “advanced” variant is worth adopting quickly because it gives you PR-specific fidelity: you test exactly what will merge, not whatever staging happens to be running.
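As one way that variant can look, the sketch below assumes your deploy step exposes the preview URL as a step output named `url`; the deploy command and URL pattern are placeholders to adapt to your own deployment action.

```yaml
# Sketch: assumes a prior deploy step exposes the preview URL as an output named `url`.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Deploy preview
        id: deploy
        # Placeholder: replace with your real deploy action/command.
        run: echo "url=https://pr-${{ github.event.number }}.preview.example.com" >> "$GITHUB_OUTPUT"

      - name: Run Shiplight Tests
        uses: ShiplightAI/github-action@v1
        with:
          api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
          test-suite-id: 123
          environment-id: 1
          environment-url: ${{ steps.deploy.outputs.url }}
```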

Turn PR diffs into new tests automatically

Once your PR pipeline can run Shiplight, there are two practical automation patterns for “generate tests from the diff,” depending on how hands-off you want to be.

Diff-driven generation inside the developer loop

If your team is using an AI coding agent (Claude Code, Cursor, Codex, or GitHub Copilot), Shiplight’s plugin gives that agent “eyes and hands” in a real browser plus skills that formalize the test workflow.

A strong default loop looks like this:

  1. Developer opens a PR.
  2. The agent reviews the change intent and then runs /verify against the preview environment to confirm the UI behaves as expected.
  3. The agent runs /create_e2e_tests to generate durable regression tests from what it just validated.
  4. The tests are run locally with the Shiplight runtime (npx shiplight test) and then optionally synced to Shiplight Cloud for team-wide execution.

This is the cleanest way to ensure that “diff to tests” is grounded in actual UI behavior, not just code-level guesswork.

Fully automated PR generation with a review gate

If you want this to run without a developer prompting an agent, you need two ingredients:

  • A reliable way to capture the PR diff and context at PR time
  • A consistent rule for what “done” means (for example: generate or update tests covering the affected user flows, then run them against the PR environment)

In GitHub-native setups, teams typically obtain diffs using the GitHub API or GitHub CLI (for example, gh pr diff) and treat the diff as one of the inputs to the test-generation workflow.
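How you consume that diff is up to your pipeline. As one illustration, a small helper can pull the changed file paths out of a unified diff such as the output of `gh pr diff`; the function name is ours, not part of any Shiplight or GitHub tooling.

```python
import re

def changed_files(unified_diff: str) -> list[str]:
    """Extract the file paths touched by a unified diff (e.g. `gh pr diff` output)."""
    paths = []
    for line in unified_diff.splitlines():
        # Each file section starts with a header like: diff --git a/src/app.ts b/src/app.ts
        match = re.match(r"^diff --git a/(\S+) b/(\S+)$", line)
        if match:
            paths.append(match.group(2))  # use the post-change path
    return paths

diff = """diff --git a/src/checkout/cart.ts b/src/checkout/cart.ts
index 1234567..89abcde 100644
--- a/src/checkout/cart.ts
+++ b/src/checkout/cart.ts
@@ -1 +1 @@
-old
+new
diff --git a/README.md b/README.md
index 1111111..2222222 100644
"""
print(changed_files(diff))  # → ['src/checkout/cart.ts', 'README.md']
```

The extracted paths are what you would feed into a routing or policy step, alongside the raw diff itself.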

From there, the key is what Shiplight contributes that traditional frameworks do not:

  • Tests are written as intent-first YAML, so the output is easy to review and maintain.
  • Execution is resilient by design, because Shiplight resolves actions from intent instead of hardcoding brittle selectors.
  • The same system can validate in a real browser and then save the result as regression coverage, which is the bridge that makes “automatic generation” trustworthy.

In practice, teams set a policy such as:

  • Generate tests for PRs that touch specific directories (checkout, billing, auth) or carry a “needs-e2e” label.
  • Always run a small preflight test (login or homepage) before heavier suites to avoid wasting cycles when the preview is unhealthy.
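A policy like the first one can be encoded as a small gate in the workflow; the directory names and label below mirror the example policy above and are placeholders, not anything Shiplight prescribes.

```python
# Illustrative gate: decide whether a PR should trigger E2E test generation.
# Directories and label mirror the example policy; adjust to your repo layout.
CRITICAL_DIRS = ("src/checkout/", "src/billing/", "src/auth/")
TRIGGER_LABEL = "needs-e2e"

def should_generate_tests(changed_paths: list[str], labels: list[str]) -> bool:
    touches_critical = any(
        path.startswith(prefix) for path in changed_paths for prefix in CRITICAL_DIRS
    )
    return touches_critical or TRIGGER_LABEL in labels

print(should_generate_tests(["src/checkout/cart.ts"], []))       # → True
print(should_generate_tests(["docs/readme.md"], ["needs-e2e"]))  # → True
print(should_generate_tests(["docs/readme.md"], []))             # → False
```

Keeping the gate as a pure function makes the policy easy to unit test and easy to review when someone wants to widen or narrow it.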

Make the generated tests stick with near-zero maintenance

Automatic test generation only pays off if the tests keep running week after week. Two Shiplight-specific practices help here.

Prefer intent-based steps over DOM-level specificity.

Shiplight YAML tests are designed to read like user stories, with explicit intent per step. Locators can be cached for speed, but when the UI changes, the runtime can re-derive the action from intent rather than forcing you into a rewrite cycle.

Standardize setup with hooks.

Flakiness often comes from inconsistent starting state: modals, cookie banners, stale sessions, and half-loaded pages. Hooks let you apply a consistent “before test” and “after test” routine across suites, without duplicating the same steps in every generated test.
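To make the idea concrete, a hook configuration of this kind might be shaped roughly as follows; as with the earlier sketch, the keys are invented for illustration and are not Shiplight's documented hook schema.

```yaml
# Hypothetical sketch: keys are illustrative, not Shiplight's documented hook schema.
hooks:
  before_each:
    - intent: "Dismiss the cookie banner if it appears"
    - intent: "Log in as the standard test user"
    - intent: "Navigate to the dashboard"
  after_each:
    - intent: "Log out and clear session state"
```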

What to measure to know it’s working

When PR diffs are driving test generation, the best success metrics are not “test count” or “green build rate.” Look for:

  • Time to PR confidence: how quickly a PR gets credible UI verification against its preview URL.
  • Coverage growth per merged PR: whether critical flows are gaining durable regression protection as a side effect of normal development.
  • Maintenance load: how much time the team spends fixing tests versus shipping product.

Shiplight is built to keep that maintenance load close to zero by making tests resilient to UI changes and by anchoring test creation to real browser verification, not brittle scripts.