E2E Testing in GitHub Actions: Setup Guide (2026)
Shiplight AI Team
Updated on April 7, 2026

Running E2E tests in GitHub Actions is one of the highest-leverage investments a team can make in release confidence. Tests that run automatically on every pull request catch regressions before they reach staging — not after a customer reports them.
But E2E tests in CI have a reputation problem: they're slow, flaky, and often the first thing teams skip when deadlines tighten. This guide covers how to set them up properly — fast, reliable, and self-healing — so they become a trusted release gate rather than a noise source. For teams where tests break every time a developer renames a component, Shiplight's intent-based self-healing eliminates that maintenance burden automatically.
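By the end of this guide you'll have a baseline workflow that runs on every pull request, secrets handled safely, a sharded suite with a merged report, a nightly regression run, and fixes for the most common CI failures.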
Create .github/workflows/e2e.yml:

    name: E2E Tests
    on:
      pull_request:
        branches: [main, develop]
      push:
        branches: [main]
    jobs:
      e2e:
        runs-on: ubuntu-latest
        timeout-minutes: 30
        steps:
          - name: Checkout
            uses: actions/checkout@v4
          - name: Setup Node.js
            uses: actions/setup-node@v4
            with:
              node-version: '20'
              cache: 'npm'
          - name: Install dependencies
            run: npm ci
          - name: Install Playwright browsers
            run: npx playwright install --with-deps chromium
          - name: Run E2E tests
            run: npx playwright test
            env:
              BASE_URL: ${{ secrets.STAGING_URL }}
              TEST_USER_EMAIL: ${{ secrets.TEST_USER_EMAIL }}
              TEST_USER_PASSWORD: ${{ secrets.TEST_USER_PASSWORD }}
          - name: Upload test artifacts on failure
            uses: actions/upload-artifact@v4
            if: failure()
            with:
              name: playwright-report
              path: playwright-report/
              retention-days: 7

This is the baseline. It runs on every PR, passes environment secrets safely, and uploads the Playwright HTML report when tests fail.
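Add a CI-aware playwright.config.ts so timeouts and workers scale correctly on GitHub-hosted runners:

    // playwright.config.ts
    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      timeout: process.env.CI ? 45000 : 15000,  // per-test timeout in ms
      retries: process.env.CI ? 1 : 0,          // retry once, in CI only
      workers: process.env.CI ? 2 : undefined,  // undefined keeps Playwright's default
      // In CI, emit PR annotations and also write the HTML report the workflow uploads
      reporter: process.env.CI ? [['github'], ['html', { open: 'never' }]] : 'list',
    });

The github reporter surfaces failures as GitHub Actions annotations on the pull request, so a quick pass/fail check needs no artifact download. The html reporter writes playwright-report/, the folder the workflow uploads when tests fail.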
Never hardcode credentials in your workflow file. Store them in GitHub Secrets (Settings → Secrets and variables → Actions):
| Secret | Purpose |
|---|---|
| STAGING_URL | Base URL of your staging environment |
| TEST_USER_EMAIL | Test account email |
| TEST_USER_PASSWORD | Test account password |
| SHIPLIGHT_API_TOKEN | If using Shiplight Cloud for execution |
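Reference them in your workflow as ${{ secrets.SECRET_NAME }}. GitHub masks their values in logs. Note that secrets are not passed to workflows triggered by pull requests from forks, so outside contributions need a different setup.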
For handling authentication flows specifically — including magic links and email verification codes — see stable auth and email E2E tests.
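Because a secret that was never created reaches the job as an empty string, a missing value tends to surface as a confusing test failure rather than a clear error. A minimal sketch of a fail-fast check follows; the helper names `missingEnv` and `assertEnv` are hypothetical and not part of Playwright or GitHub Actions, and the example values are placeholders.

```typescript
// Hypothetical helper (not part of Playwright or GitHub Actions):
// fail fast when a variable the suite depends on is missing or empty.
type Env = Record<string, string | undefined>;

function missingEnv(required: string[], env: Env): string[] {
  // An unset secret arrives as an empty string, so treat empty and
  // whitespace-only values the same as undefined.
  return required.filter((key) => (env[key] ?? "").trim() === "");
}

function assertEnv(required: string[], env: Env): void {
  const missing = missingEnv(required, env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}

// Example with an in-memory environment instead of process.env.
const exampleEnv: Env = {
  BASE_URL: "https://staging.example.com",
  TEST_USER_EMAIL: "qa@example.com",
  TEST_USER_PASSWORD: "", // simulates a secret that was never configured
};
const missingInExample = missingEnv(
  ["BASE_URL", "TEST_USER_EMAIL", "TEST_USER_PASSWORD"],
  exampleEnv,
);
console.log(missingInExample); // [ 'TEST_USER_PASSWORD' ]
```

In a real suite, a call such as `assertEnv([...], process.env)` could sit at the top of the Playwright config or in a global setup file, so a misconfigured job stops before any browser starts.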
A suite that runs on a single runner gets slower as it grows. Playwright's sharding splits the suite across multiple runners:

    jobs:
      e2e:
        runs-on: ubuntu-latest
        strategy:
          fail-fast: false
          matrix:
            shard: [1, 2, 3, 4]
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: '20'
              cache: 'npm'
          - run: npm ci
          - run: npx playwright install --with-deps chromium
          - name: Run shard
            # The blob reporter writes blob-report/, which merge-reports can combine later
            run: npx playwright test --shard=${{ matrix.shard }}/4 --reporter=blob
            env:
              BASE_URL: ${{ secrets.STAGING_URL }}
          - name: Upload shard report
            uses: actions/upload-artifact@v4
            if: always()
            with:
              name: playwright-report-shard-${{ matrix.shard }}
              path: blob-report/

Four shards typically cut a 20-minute suite down to 5–6 minutes. Scale the matrix to your test count and aim for each shard to finish in under 5 minutes.
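The sizing advice above comes down to simple arithmetic. The sketch below assumes tests split roughly evenly across shards and that every runner pays a fixed setup cost for checkout, dependency install, and browser install. The one-minute setup figure is an assumption, not a measurement.

```typescript
// Back-of-the-envelope shard sizing. Assumes an even split across shards
// and a fixed per-runner setup cost; both are estimates.
function shardsFor(suiteMinutes: number, targetMinutes: number): number {
  if (suiteMinutes <= 0 || targetMinutes <= 0) {
    throw new RangeError("durations must be positive");
  }
  return Math.ceil(suiteMinutes / targetMinutes);
}

function wallClockMinutes(suiteMinutes: number, shards: number, setupMinutes: number): number {
  // Each shard runs its slice of the suite plus its own setup.
  return suiteMinutes / shards + setupMinutes;
}

const shardCount = shardsFor(20, 5); // 20-minute suite, 5-minute target per shard
console.log(shardCount, wallClockMinutes(20, shardCount, 1)); // 4 6
```

Real shards are rarely perfectly balanced, so treat the result as a starting point and adjust the matrix after watching a few runs.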
Add a merge job so you get one consolidated HTML report:

      merge-reports:
        needs: e2e
        runs-on: ubuntu-latest
        if: always()
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: '20'
              cache: 'npm'
          - run: npm ci
          - name: Download shard reports
            uses: actions/download-artifact@v4
            with:
              pattern: playwright-report-shard-*
              path: all-reports/
              merge-multiple: true
          - name: Merge reports
            run: npx playwright merge-reports --reporter html ./all-reports
          - name: Upload merged report
            uses: actions/upload-artifact@v4
            with:
              name: playwright-report-merged
              path: playwright-report/
              retention-days: 14

Make test failures block merges by adding a branch protection rule in GitHub (Settings → Branches → Add rule): enable "Require status checks to pass before merging" and select the e2e job as a required check. Now tests are a real quality gate, not optional feedback.
Separate your fast PR suite (critical paths, ~5 min) from deep regression (full coverage, scheduled):

    name: Nightly Regression
    on:
      schedule:
        - cron: '0 2 * * *' # 2am UTC every day
    jobs:
      regression:
        runs-on: ubuntu-latest
        timeout-minutes: 60
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: '20'
              cache: 'npm'
          - run: npm ci
          - run: npx playwright install --with-deps
          - name: Run full regression suite
            # Requires a project named "regression" in playwright.config.ts
            run: npx playwright test --project=regression
            env:
              BASE_URL: ${{ secrets.PRODUCTION_URL }}
          - name: Notify on failure
            if: failure()
            uses: slackapi/slack-github-action@v1
            with:
              payload: '{"text": "Nightly regression failed: check GitHub Actions"}'
            env:
              SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
              # Assumes the URL above is a Slack incoming webhook
              SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK

This gives you a two-suite strategy: a fast PR gate plus a thorough nightly run. See two-speed E2E strategy for the full approach.
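The nightly job selects tests with --project=regression, which only works if the Playwright config defines a project by that name. Here is a minimal sketch of one way to split the suite, assuming critical-path tests carry an @critical tag in their titles. The tag, the project names, and the locally defined `ProjectSketch` type are all hypothetical; the shape only mirrors the `projects` array of a Playwright config so the snippet can run on its own.

```typescript
// Sketch of a two-project split, typed locally so it runs without Playwright.
// The "@critical" tag and the project names are assumptions for illustration.
interface ProjectSketch {
  name: string;
  grep?: RegExp; // when set, only tests whose title matches are selected
}

const projects: ProjectSketch[] = [
  { name: "pr", grep: /@critical/ }, // fast PR gate
  { name: "regression" },            // no filter: the full suite
];

// Mimics title-based filtering to show which tests each project would pick up.
function selectTests(titles: string[], project: ProjectSketch): string[] {
  const pattern = project.grep;
  return pattern ? titles.filter((title) => pattern.test(title)) : titles;
}

const titles = [
  "checkout completes @critical",
  "login with valid credentials @critical",
  "profile avatar upload",
];
console.log(selectTests(titles, projects[0]).length); // 2
console.log(selectTests(titles, projects[1]).length); // 3
```

In a real playwright.config.ts these entries would live in the `projects` array, and the PR workflow would pass --project=pr so it runs only the tagged subset.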
The biggest CI pain point isn't slow tests — it's tests that break every time a developer renames a CSS class or moves a button. This creates a pattern where engineers start ignoring red CI because "it's probably just the tests."
The root problem: traditional E2E tests bind to implementation details (CSS classes, DOM structure, element IDs). Every UI refactor — even one that changes zero behavior — breaks tests. Teams respond by adding retries, then quarantining tests, then ignoring red CI entirely.
Shiplight solves this with intent-based self-healing. Instead of storing CSS selectors as the source of truth, Shiplight tests store the semantic intent of each step:

    # Shiplight YAML: survives CSS renames, component refactors, layout changes
    goal: Verify user can complete checkout
    statements:
      - intent: Navigate to the product page
      - intent: Add item to cart
      - intent: Proceed to checkout
      - intent: Enter shipping address
      - VERIFY: order confirmation message is visible

When a UI change breaks a locator, Shiplight's AI resolves the correct element from the live DOM using the intent description, not a list of fallback selectors. A developer renaming btn-checkout to btn-place-order doesn't break a single test.
Shiplight GitHub Actions integration runs your suite in Shiplight Cloud and posts results directly back to the PR:

    - name: Run E2E tests with Shiplight
      uses: shiplightai/run-tests@v1
      with:
        api-token: ${{ secrets.SHIPLIGHT_API_TOKEN }}
        suite-id: ${{ vars.E2E_SUITE_ID }}
        environment-id: ${{ vars.STAGING_ENV_ID }}
        post-pr-comment: true

The PR comment includes a pass/fail summary, an AI-generated failure explanation (root cause plus expected vs. actual), and a link to the full Shiplight run with screenshots and a step-by-step trace. Engineers see exactly what failed and why, without digging through raw logs.
See intent-cache-heal pattern and how to make E2E failures actionable for the full picture.
Cause: Timing differences, missing environment variables, or headless browser behavior.
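Fix:

    - name: Run tests
      run: npx playwright test
      env:
        CI: true      # GitHub Actions already sets CI=true; shown here for clarity
        PWDEBUG: 0

Add explicit waits for network requests and avoid page.waitForTimeout(). Use page.waitForSelector() or page.waitForLoadState() instead, or web-first assertions such as expect(locator).toBeVisible().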
Cause: Resource contention, slower CI runners, race conditions — or (most commonly at scale) UI changes breaking locators.
Fix for timing/resource flakiness:

    // playwright.config.ts
    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      retries: process.env.CI ? 2 : 0, // retry only in CI
      workers: process.env.CI ? 2 : 4, // fewer workers in CI
      timeout: 30000,                  // explicit per-test timeout in ms
    });

Fix for UI-change flakiness: If tests break whenever a developer renames a component or restructures the DOM, retries won't help because the locator is genuinely broken. Migrate to semantic selectors (getByRole, getByTestId) as the short-term fix. For a systematic solution, Shiplight's self-healing layer resolves elements by intent rather than cached selectors, so UI changes don't break tests at all.
For the full breakdown, see how to fix flaky tests.
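The difference between the two kinds of flakiness shows up in how retried attempts resolve. Playwright reports a test that fails and then passes on a retry as "flaky", while a test with a broken locator fails on every attempt. The standalone function below only illustrates that rule; it is a sketch, not Playwright code, and the `Outcome` type and `classify` name are hypothetical.

```typescript
// Illustrative only: summarizes a test's attempts the way a retrying runner does.
// Each entry is true if that attempt passed; the runner stops at the first pass.
type Outcome = "passed" | "flaky" | "failed";

function classify(attempts: boolean[]): Outcome {
  if (attempts.length === 0) {
    throw new Error("at least one attempt is required");
  }
  const lastPassed = attempts[attempts.length - 1];
  if (!lastPassed) {
    return "failed"; // every allowed attempt was used and none passed
  }
  // A pass on the first attempt is clean; a pass after failures is flaky.
  return attempts.length === 1 ? "passed" : "flaky";
}

console.log(classify([true]));                // passed
console.log(classify([false, true]));         // flaky
console.log(classify([false, false, false])); // failed
```

A test that lands in "flaky" points to timing or resource issues that retries can mask. One that lands in "failed" across all retries points to a real regression or a broken locator, which is where retries stop helping.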
Fix: Shard (Step 3), scope your PR suite to critical paths only, move deep regression to nightly, and cache Playwright browsers:

    - name: Cache Playwright browsers
      uses: actions/cache@v4
      with:
        path: ~/.cache/ms-playwright
        key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

Fix: Use if: always() instead of if: failure() to ensure artifacts upload even when the job times out:

    - uses: actions/upload-artifact@v4
      if: always()
      with:
        name: test-results
        path: test-results/

A complete workflow that combines sharding, browser caching, and always-on artifact upload:

    name: E2E Tests
    on:
      pull_request:
        branches: [main]
      push:
        branches: [main]
    jobs:
      e2e:
        runs-on: ubuntu-latest
        timeout-minutes: 15
        strategy:
          fail-fast: false
          matrix:
            shard: [1, 2, 3]
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: '20'
              cache: 'npm'
          - run: npm ci
          - name: Cache Playwright browsers
            uses: actions/cache@v4
            with:
              path: ~/.cache/ms-playwright
              key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
          - run: npx playwright install --with-deps chromium
          - name: Run E2E shard
            run: npx playwright test --shard=${{ matrix.shard }}/3
            env:
              CI: true
              BASE_URL: ${{ secrets.STAGING_URL }}
              TEST_USER_EMAIL: ${{ secrets.TEST_USER_EMAIL }}
              TEST_USER_PASSWORD: ${{ secrets.TEST_USER_PASSWORD }}
          - uses: actions/upload-artifact@v4
            if: always()
            with:
              name: report-shard-${{ matrix.shard }}
              path: playwright-report/
              retention-days: 7

Use path filters to skip E2E on documentation-only PRs:

    on:
      pull_request:
        paths:
          - 'src/**'
          - 'tests/**'
          - 'package*.json'

One caveat: if the e2e job is a required check, a PR whose workflow is skipped by path filters leaves that check pending and cannot merge, so combine path filters and required checks with care.

If you use Vercel, Netlify, or similar, wait for the deployment URL before running tests:

    - name: Wait for preview deployment
      uses: patrickedqvist/wait-for-vercel-preview@v1.3.1
      id: vercel-preview
      with:
        token: ${{ secrets.GITHUB_TOKEN }}
        max_timeout: 120
    - name: Run E2E tests
      run: npx playwright test
      env:
        BASE_URL: ${{ steps.vercel-preview.outputs.url }}

A rule of thumb: one runner per 5–10 minutes of tests. GitHub-hosted runners are billed per minute per runner. For most teams, 3–4 shards hit the sweet spot between speed and cost.
E2E tests should block merges, with one condition: your suite must be reliable enough to trust. Flaky tests that block merges create false positives and erode trust. Fix flakiness first (see how to fix flaky tests), then enable branch protection.
GitHub-hosted runners (ubuntu-latest) are free for public repos. For private repos on paid plans, they cost $0.008/minute. A 5-minute sharded suite across 3 runners costs ~$0.12/run — about $6/day at 50 PRs. Caching Playwright browsers (Step 3) saves ~45 seconds per runner. For high-volume teams, self-hosted runners or Shiplight Cloud (managed parallel execution) reduce per-run costs significantly.
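The cost figures above can be reproduced with a few lines of arithmetic. This sketch uses the $0.008 per-minute rate quoted in this guide as an input; that rate is an assumption to verify against GitHub's current pricing, and the function name `costPerRun` is hypothetical.

```typescript
// Cost arithmetic for a sharded run. The per-minute rate is the figure
// quoted in this guide; verify it against current GitHub pricing.
function costPerRun(minutesPerRunner: number, runners: number, ratePerMinute: number): number {
  return minutesPerRunner * runners * ratePerMinute;
}

const perRun = costPerRun(5, 3, 0.008); // 5-minute suite on 3 runners
const perDay = perRun * 50;             // 50 PR runs per day
console.log(perRun.toFixed(2), perDay.toFixed(2)); // 0.12 6.00
```

Swapping in your own shard count, run time, and PR volume gives a quick estimate before deciding between more shards and self-hosted runners.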
The root cause is locator-based tests — tests that use CSS classes, IDs, or DOM position as their anchor. Any refactor that touches those breaks the test, even if behavior is unchanged.
Two approaches: (1) migrate to semantic selectors (getByRole, getByTestId) to reduce coupling, or (2) use Shiplight, which stores the intent of each step and resolves it against the live DOM at runtime. With Shiplight, a developer renaming a CSS class or restructuring a component doesn't break any tests — the AI finds the right element from the intent description, not a cached selector.
---
For the complete CI/CD setup guide beyond GitHub Actions, see E2E testing in CI/CD. For scaling beyond a handful of tests, see TestOps guide.