How to Get Shiplight to Auto-Create End-to-End Tests from PR Diffs

Updated on May 1, 2026

Pull requests are where engineering teams make a decision that matters: ship the change, or hold it back. The problem is that most CI setups can only answer the question “did some tests pass?”, not “did we prove the behavior this diff changed?”

Shiplight AI is built for that gap. In a PR-aware workflow, Shiplight analyzes the diff, identifies which user flows are likely impacted, generates targeted end-to-end tests, and verifies results in real browsers.

This post walks through how to set up that loop so test generation is automatic, reviews stay fast, and the tests you accept become durable regression coverage instead of a maintenance tax.

What “automatic from PR diffs” should actually mean

If you take nothing else from this post, take this: auto-generating tests is not valuable if it only increases test count. Shiplight’s PR-driven philosophy is to map the diff to user impact, then generate only the scenarios that meaningfully reduce merge risk.

A healthy PR-diff test workflow has four properties:

  • Diff-aware scope: it focuses on the surfaces the PR likely changed, not generic “checkout works” scripts.
  • Intent-based steps: tests read like user actions and stay resilient as UI structure evolves.
  • Reviewable artifacts: generated tests are easy to inspect, edit, and version alongside code.
  • Fast gating: you get a clear signal on the PR without running an hours-long full regression suite.
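To make “diff-aware scope” concrete, here is a minimal Python sketch of the underlying idea: map the files a PR touches to the user flows they can affect, and ignore paths that cannot change behavior on their own. Shiplight performs this analysis itself; the `FLOW_MAP` table, the `IGNORED` patterns, and `impacted_flows` are hypothetical names for illustration, not part of Shiplight’s API.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical ownership map: source path globs -> user flows they can affect.
# Shiplight infers impact on its own; this table only illustrates the idea.
FLOW_MAP: Dict[str, List[str]] = {
    "src/checkout/*": ["checkout", "payment"],
    "src/auth/*": ["login", "signup"],
    "src/components/Button*": ["checkout", "login", "signup"],
}

# Assumption: changes under these paths never alter user-visible behavior alone.
IGNORED: List[str] = ["docs/*", "*.md", "tests/*"]


def impacted_flows(changed_paths: List[str]) -> List[str]:
    """Return the sorted, de-duplicated user flows touched by a PR's changed files."""
    flows = set()
    for path in changed_paths:
        # Skip files that cannot change behavior on their own.
        if any(fnmatch(path, pattern) for pattern in IGNORED):
            continue
        # Collect every flow whose pattern matches this file.
        for pattern, mapped in FLOW_MAP.items():
            if fnmatch(path, pattern):
                flows.update(mapped)
    return sorted(flows)
```

For example, `impacted_flows(["src/auth/session.ts", "README.md"])` returns `['login', 'signup']`: the README change is ignored and the auth change maps to two flows. Note that `fnmatch`’s `*` also matches `/`, so each pattern covers nested directories too.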

Prerequisites that make PR-diff generation work in practice

Before you automate anything, ensure the basics are in place:

A Shiplight test project with YAML tests in-repo

Shiplight’s agent-first workflow is designed to produce readable YAML tests that live in your repository and show up in PR diffs.

In Shiplight’s Quick Start, you can have your coding agent scaffold a test project and create YAML tests via /create_e2e_tests, then run them locally with:

  • npx shiplight test

That matters because PR-diff test generation is only useful when the output is something your team can treat like code: review it, refine it, and keep it.

A reliable environment and test account strategy

Shiplight’s GitHub Actions integration expects you to run suites against a defined Shiplight environment, optionally overriding the URL for preview deploys. It also supports centralized test account configuration for authenticated apps.

This is the unglamorous part that determines whether “automatic” stays automatic after week one.
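One way to keep that part boring is to resolve the run target in a single place. The Python sketch below assumes a hypothetical setup: named environments in a dict (standing in for environments defined in Shiplight), a `PREVIEW_URL` variable set by your preview-deploy step, and test account credentials injected as CI secrets named `E2E_ACCOUNT_EMAIL` and `E2E_ACCOUNT_PASSWORD`. None of these names come from Shiplight’s docs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical named environments; in practice these are defined in Shiplight.
ENVIRONMENTS = {
    "staging": "https://staging.example.com",
    "production": "https://www.example.com",
}


@dataclass(frozen=True)
class RunTarget:
    base_url: str
    account_email: str
    account_password: str


def resolve_target(env_name: str, environ: Optional[Mapping[str, str]] = None) -> RunTarget:
    """Choose the base URL and load test-account credentials from CI secrets."""
    env = os.environ if environ is None else environ
    # A per-PR preview deploy (hypothetical PREVIEW_URL) overrides the named environment.
    base_url = env.get("PREVIEW_URL") or ENVIRONMENTS[env_name]
    # Credentials come from secrets (hypothetical names), never from the test files.
    missing = [k for k in ("E2E_ACCOUNT_EMAIL", "E2E_ACCOUNT_PASSWORD") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing test account secrets: {', '.join(missing)}")
    return RunTarget(base_url, env["E2E_ACCOUNT_EMAIL"], env["E2E_ACCOUNT_PASSWORD"])
```

Failing loudly when a secret is missing is the design choice that matters here: a PR check that errors during setup is far easier to diagnose than one that silently runs unauthenticated and reports every logged-in flow as broken.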

Put Shiplight in the PR loop

Shiplight’s PR workflow is designed around a simple sequence: analyze the PR diff, generate targeted tests, run them in real browsers, and review the resulting scenarios like code.

To operationalize that, treat PR-generated tests as a two-stage artifact:

  1. Draft tests: generated because the diff suggests user-visible risk.
  2. Promoted regressions: drafts the team agrees protect a real path and should keep running in the future.

Shiplight’s own guidance is explicit here: not every PR test should live forever, but the best ones should be promoted into shared suites and rerun intentionally.
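Promotion is a team decision, but it helps to write the criteria down. This Python sketch encodes one possible policy; the `DraftTest` fields, the three-run stability threshold, and the function names are assumptions for illustration, not a Shiplight schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: three consecutive green runs before a draft counts as stable.
MIN_STABLE_RUNS = 3


@dataclass
class DraftTest:
    """Hypothetical bookkeeping for a PR-generated test (not a Shiplight schema)."""
    name: str
    covers_critical_path: bool  # protects a flow users actually depend on
    consecutive_passes: int     # stable green runs since the draft was generated
    reviewer_approved: bool     # a human reviewed it and wants to keep it


def should_promote(test: DraftTest) -> bool:
    """Promote only when impact, stability, and human review all agree."""
    return (
        test.covers_critical_path
        and test.reviewer_approved
        and test.consecutive_passes >= MIN_STABLE_RUNS
    )


def partition(drafts: List[DraftTest]) -> Tuple[List[DraftTest], List[DraftTest]]:
    """Split drafts into (promote to a shared suite, keep PR-only)."""
    promoted = [t for t in drafts if should_promote(t)]
    pr_only = [t for t in drafts if not should_promote(t)]
    return promoted, pr_only
```

Drafts that do not qualify are not necessarily deleted; they can stay attached to the PR that produced them.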

Make it real with GitHub Actions gating

Once you have suites and environments configured in Shiplight Cloud, wire them into PRs with Shiplight’s GitHub Action. The docs provide a basic pull request workflow that runs on PRs to main or develop and can comment results back on the PR.

Here is the core pattern (replace IDs with your own suite and environment IDs):

```yaml
name: Shiplight AI Tests

on:
  pull_request:
    branches:
      - main
      - develop

# Required for commenting on pull requests
permissions: write-all

jobs:
  test:
    name: Run Shiplight Tests
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Shiplight Tests
        uses: ShiplightAI/github-action@v1
        with:
          api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
          test-suite-id: 123
          environment-id: 1
```

This flow depends on a Shiplight API token stored in GitHub Secrets as SHIPLIGHT_API_TOKEN. By default, the action is set up to comment on pull requests (github-comment defaults to true).

Add a preflight gate when preview environments are flaky

If your PR checks occasionally fail because the preview URL is down, Shiplight supports a preflight test case that runs before your main suites. This lets you fail fast or skip expensive suites when the environment is not healthy.

This is one of the simplest ways to keep PR feedback tight as you scale automated coverage.
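Shiplight’s preflight is a test case you configure in Shiplight. If you also want a cheap reachability check as an earlier CI step, a minimal Python sketch might look like this; `http_ok`, `preflight`, and the retry and backoff values are hypothetical choices, and the probe and sleep functions are injectable so the logic can be tested without a network.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL responds with a non-error status (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx), and socket timeouts are all OSError subclasses.
        return False


def preflight(
    url: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe the environment up to `attempts` times with exponential backoff."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            # Wait delay_s, then 2 * delay_s, then 4 * delay_s, ... between tries.
            sleep(delay_s * (2 ** attempt))
    return False
```

In CI you could call `sys.exit(0 if preflight(preview_url) else 1)` in a step before the Shiplight action, so the job fails fast instead of running suites against a dead preview.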

Close the loop by syncing tests and keeping them durable

Two patterns work well as your team matures:

Sync YAML tests to Shiplight Cloud for team-wide operations

If you want scheduled runs, collaboration, and cloud-side locator self-updates, Shiplight supports syncing local YAML tests, templates, and functions to Shiplight Cloud.

The docs describe using the /cloud command to guide an agent through operations like syncing tests and running a specific YAML file against an environment.

Upload CI run artifacts for better PR diagnosis

Even if you execute tests locally or in your own CI runners, Shiplight’s CLI can upload rich run artifacts (screenshots, video, traces) and automatically attach CI and git metadata, including PR number and title when triggered from a PR.

That gives reviewers something better than “it failed” and a wall of logs.
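Shiplight’s CLI attaches this metadata on its own. To show where such information comes from in GitHub Actions, here is a Python sketch that reads the standard `GITHUB_*` environment variables and, for pull request events, the PR number and title from the event payload file at `GITHUB_EVENT_PATH`. The `ci_metadata` function and its output keys are illustrative, not the fields Shiplight actually uploads.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional


def ci_metadata(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Collect commit and run context from GitHub Actions' default environment variables."""
    env = os.environ if environ is None else environ
    meta: Dict[str, Any] = {
        "commit": env.get("GITHUB_SHA"),
        "ref": env.get("GITHUB_REF"),
        "run_id": env.get("GITHUB_RUN_ID"),
        "event": env.get("GITHUB_EVENT_NAME"),
    }
    event_path = env.get("GITHUB_EVENT_PATH")
    # Only PR-triggered runs carry a pull_request object in the event payload.
    if meta["event"] in ("pull_request", "pull_request_target") and event_path:
        with open(event_path, encoding="utf-8") as fh:
            pr = json.load(fh).get("pull_request", {})
        meta["pr_number"] = pr.get("number")
        meta["pr_title"] = pr.get("title")
    return meta
```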

Guardrails that prevent PR-driven generation from becoming noise

The fastest way to make PR-diff test generation unpopular is to generate too much, too often, with assertions that do not match user impact. Shiplight’s own guardrails are a strong checklist:

  • Generate for meaningful deltas, not every refactor. Cosmetic moves should not explode test count.
  • Prefer assertions a user would notice. Pure element-existence checks can pass while the experience is broken.
  • Promote only the best tests into shared suites. Treat promotion as an engineering decision, not an automatic side effect.
  • Lean on self-healing to keep maintenance near zero. Stability is the entire point of making this automatic.
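The first guardrail can be approximated mechanically. This Python sketch treats a file change as cosmetic when, after dropping blank lines and full-line comments and collapsing runs of whitespace, the before and after versions are identical. It is a hypothetical heuristic, not how Shiplight classifies diffs, and the default `//` comment prefix assumes JavaScript or TypeScript sources.

```python
from typing import List, Tuple


def significant_lines(source: str, comment_prefixes: Tuple[str, ...] = ("//",)) -> List[str]:
    """Drop blank lines and full-line comments; collapse runs of whitespace."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        # Blank lines and lines that are entirely comments carry no behavior.
        if not stripped or stripped.startswith(comment_prefixes):
            continue
        kept.append(" ".join(stripped.split()))
    return kept


def is_cosmetic_change(before: str, after: str) -> bool:
    """True when the versions differ only in indentation, blank lines, or full-line comments."""
    return significant_lines(before) == significant_lines(after)
```

The heuristic is deliberately conservative: spacing changes around operators and trailing comments still count as meaningful, so it errs toward generating a test rather than skipping one.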

The outcome: PR confidence that scales with velocity

PR diffs are the highest-signal artifact in your delivery pipeline. When you attach diff-aware E2E generation to that moment, you stop guessing and start proving.

With Shiplight, the workflow is designed to be simple: the PR opens, Shiplight ties the diff to likely user impact, drafts targeted tests, runs them in real browsers, and gives your team a reviewable path to turn the best scenarios into lasting regression coverage.