How to Get Shiplight to Auto-Create End-to-End Tests from PR Diffs
Updated on May 1, 2026
Pull requests are where engineering teams make a decision that matters: ship the change, or hold it back. The problem is that most CI setups can only answer “did some tests pass?”, not “did we prove the behavior this diff changed?”
Shiplight AI is built for that gap. In a PR-aware workflow, Shiplight analyzes the diff, identifies which user flows are likely impacted, generates targeted end-to-end tests, and verifies results in real browsers.
This post walks through how to set up that loop so test generation is automatic, reviews stay fast, and the tests you accept become durable regression coverage instead of a maintenance tax.
If you take nothing else from this post, take this: auto-generating tests is not valuable if it just increases test count. Shiplight’s PR-driven philosophy is to map the diff to user impact, then generate only the scenarios that meaningfully reduce merge risk.
A healthy PR-diff test workflow has four properties:
Before you automate anything, ensure the basics are in place:
Shiplight’s agent-first workflow is designed to produce readable YAML tests that live in your repository and show up in PR diffs.
In Shiplight’s Quick Start, you can have your coding agent scaffold a test project and create YAML tests via /create_e2e_tests, then run them locally with:
```shell
npx shiplight test
```

That matters because PR-diff test generation is only useful when the output is something your team can treat like code: review it, refine it, and keep it.
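For context on what reviewers actually see in a PR, a Shiplight test is plain, reviewable YAML. The exact schema is not reproduced here, so treat the following as an illustrative sketch only: the field names (`name`, `steps`, `assert`) and the flow are assumptions for illustration, not Shiplight’s documented format.

```yaml
# Illustrative sketch only — field names are assumed,
# not Shiplight's documented test schema.
name: checkout-happy-path
steps:
  - visit: /cart
  - click: "Proceed to checkout"
  - fill:
      selector: "#email"
      value: "{{ test_account.email }}"
  - assert:
      text: "Order confirmed"
```

The point is the shape, not the syntax: a flat, diffable file that a reviewer can skim in the PR alongside the application change that prompted it.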
Shiplight’s GitHub Actions integration expects you to run suites against a defined Shiplight environment (and optionally override the URL for preview deploys). It also supports centralized test account configuration for authenticated apps.
This is the unglamorous part that determines whether “automatic” stays automatic after week one.
Shiplight’s PR workflow is designed around a simple sequence: analyze the PR diff, generate targeted tests, run them in real browsers, and review the resulting scenarios like code.
To operationalize that, treat PR-generated tests as a two-stage artifact:
Shiplight’s own guidance is explicit here: not every PR test should live forever, but the best ones should be promoted into shared suites and rerun intentionally.
Once you have suites and environments configured in Shiplight Cloud, wire them into PRs with Shiplight’s GitHub Action. The docs provide a basic pull request workflow that runs on PRs to main or develop and can comment results back on the PR.
Here is the core pattern (replace IDs with your own suite and environment IDs):
```yaml
name: Shiplight AI Tests

on:
  pull_request:
    branches:
      - main
      - develop

# Required for commenting on pull requests
permissions: write-all

jobs:
  test:
    name: Run Shiplight Tests
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Shiplight Tests
        uses: ShiplightAI/github-action@v1
        with:
          api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
          test-suite-id: 123
          environment-id: 1
```
This flow depends on a Shiplight API token stored in GitHub Secrets as SHIPLIGHT_API_TOKEN. By default, the action is set up to comment on pull requests (github-comment defaults to true).
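If your PRs deploy to per-branch preview environments, the integration also supports overriding the environment URL. A sketch of that step, where the `url` input name and the earlier deploy step’s `preview-url` output are assumptions to adapt to your setup:

```yaml
# Sketch only — the `url` input name and the deploy step's
# `preview-url` output are assumptions, not documented values.
- name: Run Shiplight Tests against the preview deploy
  uses: ShiplightAI/github-action@v1
  with:
    api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
    test-suite-id: 123
    environment-id: 1
    url: ${{ steps.deploy.outputs.preview-url }}
```

This keeps the suite and environment definitions centralized in Shiplight Cloud while pointing each run at the ephemeral deploy the PR actually produced.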
If your PR checks occasionally fail because the preview URL is down, Shiplight supports a preflight test case that runs before your main suites. This lets you fail fast or skip expensive suites when the environment is not healthy.
This is one of the simplest ways to keep PR feedback tight as you scale automated coverage.
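Shiplight’s built-in preflight test case is the first-class way to do this, but the same fail-fast shape can also be expressed at the workflow level with plain GitHub Actions job dependencies. A sketch, assuming your preview deploy exposes a health endpoint (the `/health` path and `PREVIEW_URL` variable are assumptions):

```yaml
# Plain GitHub Actions gating, not a Shiplight feature.
# Assumes the preview deploy exposes a health endpoint.
jobs:
  preflight:
    runs-on: ubuntu-latest
    steps:
      - name: Check that the preview URL is up
        run: curl --fail --max-time 10 "$PREVIEW_URL/health"
        env:
          PREVIEW_URL: ${{ vars.PREVIEW_URL }}

  test:
    needs: preflight   # expensive suites run only if preflight passed
    runs-on: ubuntu-latest
    steps:
      - uses: ShiplightAI/github-action@v1
        with:
          api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
          test-suite-id: 123
          environment-id: 1
```

Either way, the principle is the same: spend the cheap check first, and let the expensive browser runs depend on it.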
Two patterns work well as your team matures:
If you want scheduled runs, collaboration, and cloud-side locator self-updates, Shiplight supports syncing local YAML tests, templates, and functions to Shiplight Cloud.
The docs describe using the /cloud command to guide an agent through operations like syncing tests and running a specific YAML file against an environment.
Even if you execute tests locally or in your own CI runners, Shiplight’s CLI can upload rich run artifacts (screenshots, video, traces) and automatically attach CI and git metadata, including PR number and title when triggered from a PR.
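For the self-hosted case, a minimal job might pair the `npx shiplight test` command from the Quick Start with a standard artifact upload as a belt-and-suspenders fallback. The `test-results/` path here is an assumption; check where the CLI actually writes its output:

```yaml
# Sketch: runs the documented CLI inside your own runner.
# The test-results/ path is an assumption — adjust to your
# CLI's actual output directory.
- name: Run Shiplight tests in CI
  run: npx shiplight test

- name: Upload screenshots, video, and traces
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: shiplight-run-artifacts
    path: test-results/
```

The `if: always()` matters: failed runs are exactly the ones where reviewers need the screenshots and traces.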
That gives reviewers something better than “it failed” and a wall of logs.
The fastest way to make PR-diff test generation unpopular is to generate too much, too often, with assertions that do not match user impact. Shiplight’s own guardrails are a strong checklist:
PR diffs are the highest-signal artifact in your delivery pipeline. When you attach diff-aware E2E generation to that moment, you stop guessing and start proving.
With Shiplight, the workflow is designed to be simple: the PR opens, Shiplight ties the diff to likely user impact, drafts targeted tests, runs them in real browsers, and gives your team a reviewable path to turn the best scenarios into lasting regression coverage.