E2E Testing in CI/CD: A Practical Setup Guide

Shiplight AI Team

Updated on April 1, 2026

End-to-end tests catch the bugs that unit tests miss. They verify that your application works as a real user would experience it — clicking buttons, filling forms, navigating pages. But running E2E tests locally is not enough. If they are not part of your CI/CD pipeline, they are not protecting your production deployments.

This guide walks through adding E2E tests to GitHub Actions and GitLab CI, with practical configurations you can adapt to your own projects. Whether you are running Playwright scripts or YAML-based intent tests, the pipeline setup follows the same principles.

When to Run E2E Tests

Not every pipeline event needs the same test coverage. Running your full E2E suite on every commit wastes resources and slows down feedback. A practical scheduling strategy uses three tiers.

On Pull Request (PR): Run a focused subset of E2E tests that cover the critical user paths. These should complete in under five minutes to keep PR reviews fast. Smoke tests and tests related to changed files are ideal here.

On Merge to Main: Run the full E2E suite. This is your quality gate — nothing ships to production without passing. You have more time budget here since merges happen less frequently than PR pushes.

Nightly (Scheduled): Run extended test suites including cross-browser tests, performance checks, and edge cases. These catch flaky tests and regressions that surface only under specific conditions.

Setting Up GitHub Actions

GitHub Actions is the most common CI/CD platform for teams using GitHub. Here is a complete workflow configuration for E2E tests.

# .github/workflows/e2e-tests.yml
name: E2E Tests

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Nightly at 2 AM UTC

jobs:
  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]

    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium

      - name: Start application
        run: npm run start &
        env:
          NODE_ENV: test

      - name: Wait for app
        run: npx wait-on http://localhost:3000 --timeout 60000

      - name: Run E2E tests (shard ${{ matrix.shard }}/4)
        run: npx shiplight test --shard=${{ matrix.shard }}/4
        env:
          SHIPLIGHT_API_KEY: ${{ secrets.SHIPLIGHT_API_KEY }}

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results-${{ matrix.shard }}
          path: test-results/
          retention-days: 7

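A few things to note in this configuration. The fail-fast: false setting lets every shard finish even if one fails, so a single run shows the full set of failures. The if: always() condition on the artifact upload step saves test results even when tests fail, which is critical for debugging. Each shard uploads under its own artifact name (test-results-1 through test-results-4) because upload-artifact@v4 by default rejects a second upload with the same name in the same run.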
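The workflow above runs the same test command for every trigger. One way to apply the three tiers from the previous section is to branch on github.event_name inside the job. The following is a minimal sketch rather than a definitive setup: the npm script names (test:e2e:smoke, test:e2e, test:e2e:extended) are hypothetical and would need to be defined in your package.json to wrap whatever test command and filters you use. Shard arguments are omitted for brevity.

```yaml
# Fragment of the e2e job's steps. Script names are hypothetical placeholders.
steps:
  # ... checkout, dependency install, and app startup as before ...

  - name: Run smoke tests (pull request)
    if: github.event_name == 'pull_request'
    run: npm run test:e2e:smoke

  - name: Run full suite (merge to main)
    if: github.event_name == 'push'
    run: npm run test:e2e

  - name: Run extended suite (nightly)
    if: github.event_name == 'schedule'
    run: npm run test:e2e:extended
```

Keeping all three tiers in one job avoids duplicating the setup steps. Separate workflows per tier are a reasonable alternative if the tiers need different runners or timeouts.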

Setting Up GitLab CI

For teams on GitLab, the setup follows a similar pattern with GitLab CI syntax.

# .gitlab-ci.yml
stages:
  - build
  - test

e2e-tests:
  stage: test
  image: mcr.microsoft.com/playwright:v1.50.0-noble
  parallel: 4
  variables:
    NODE_ENV: test
  before_script:
    - npm ci
    - npm run build
  script:
    - npm run start &
    - npx wait-on http://localhost:3000 --timeout 60000
    - npx shiplight test --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
  artifacts:
    when: always
    paths:
      - test-results/
    expire_in: 7 days
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_PIPELINE_SOURCE == "schedule"

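GitLab's built-in parallel keyword starts four copies of the job and sets $CI_NODE_INDEX (counting from 1) and $CI_NODE_TOTAL in each copy, so both values can be passed straight to the --shard flag. The when: always setting on artifacts serves the same purpose as if: always() in GitHub Actions.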

Parallelization Strategies

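Running E2E tests sequentially is the biggest bottleneck in most pipelines. Parallelization cuts test execution time roughly in proportion to the number of shards: a 20-minute suite split evenly across four shards runs in about five minutes per shard. Fixed per-job setup (installing dependencies and browsers, starting the app) is not divided, so total wall-clock time is somewhat higher and the gains shrink as you add shards.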

Shard-based splitting divides your test files evenly across runners. This is the simplest approach and works well when test files have roughly equal execution times. GitHub Actions (via a matrix strategy) and GitLab CI (via the parallel keyword) supply the parallel jobs; the test runner's shard option decides which tests each job runs.

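Duration-based splitting assigns tests to shards based on historical execution times, balancing total duration across runners. This reduces the problem of one shard taking significantly longer than the others. Playwright's --shard flag does not do this on its own: it splits by test count (or by file when fullyParallel is off), so duration-based balancing needs a timing report from earlier runs plus a tool or script that assigns tests to shards accordingly.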

For teams using Shiplight's YAML-based tests, parallelization works at the test file level. Each YAML test file is independent by design, making it straightforward to distribute across shards.
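For teams running Playwright scripts, shard-based splitting can be sketched with Playwright's --shard=x/y flag. This fragment assumes tests are run through npx playwright test. It uses strategy.job-total so the shard count is defined only once, in the matrix.

```yaml
# Sketch: shard a Playwright suite across four parallel jobs.
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      # ... checkout, install, and app startup steps omitted ...
      - name: Run Playwright tests for this shard
        # matrix.shard is 1-based, as --shard expects;
        # strategy.job-total is the number of jobs in the matrix (4 here).
        run: npx playwright test --shard=${{ matrix.shard }}/${{ strategy.job-total }}
```

Playwright splits by test file unless fullyParallel is enabled, in which case it splits at the level of individual tests, which tends to balance shards more evenly.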

Handling Failures Gracefully

E2E test failures in CI/CD need more than a red badge. Your pipeline should capture enough context for developers to diagnose and fix the issue without reproducing it locally.

Always save artifacts. Screenshots, videos, and trace files are essential. Configure your test runner to capture these on failure and upload them as pipeline artifacts.

Set meaningful timeouts. A test hanging for 30 minutes wastes runner time and delays feedback. Set both individual test timeouts (30-60 seconds per test) and overall job timeouts (15-30 minutes per shard).

Retry flaky tests carefully. Automatic retries can mask real failures. If you enable retries, limit them to one retry and track which tests needed retrying. Tests that consistently need retries should be investigated, not silenced. Shiplight's intent-based approach reduces flakiness at the source by decoupling test intent from brittle locators.

Report results clearly. Integrate test results into your PR comments or merge request notes. Many CI platforms support JUnit XML reports that surface test failures directly in the PR UI.

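# Add to the steps of your GitHub Actions workflow.
# Assumes the test runner writes JUnit XML to test-results/junit.xml.
# The job needs the `checks: write` permission to publish the report.
- name: Report results
  if: always()
  uses: dorny/test-reporter@v1
  with:
    name: E2E Test Results
    path: test-results/junit.xml
    reporter: java-junit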
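On GitLab, a comparable result could be achieved with the artifacts:reports:junit keyword, which surfaces test results in the merge request, together with a job-level timeout and a retry limited to infrastructure failures. This is a sketch to merge into the e2e-tests job shown earlier. It assumes your test runner writes JUnit XML to test-results/junit.xml, the same path used in the GitHub example.

```yaml
# .gitlab-ci.yml (fragment): merge these keys into the existing e2e-tests job.
e2e-tests:
  timeout: 30 minutes            # job-level limit, so a hung run cannot hold a runner
  retry:
    max: 1
    when: runner_system_failure  # retry infrastructure failures only, not failing tests
  artifacts:
    when: always
    paths:
      - test-results/
    reports:
      junit: test-results/junit.xml  # assumed report path
    expire_in: 7 days
```

Restricting the job-level retry to runner failures keeps it from masking real test failures. Retries of individual tests remain the test runner's responsibility.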

PR-Specific Test Selection

Running your full E2E suite on every PR is wasteful. Instead, run tests that are relevant to the changes in that PR.

Tag-based selection lets you mark tests with categories (e.g., auth, checkout, dashboard) and run only the categories affected by changed files. Shiplight's plugin system supports tagging tests and running filtered subsets from CI.

Changed-path filtering triggers specific test suites based on which files changed. If only documentation files changed, skip E2E tests entirely. If auth-related code changed, run the auth test suite.

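# GitHub Actions path filtering: the workflow runs only when a PR touches these paths.
# Caveat: if this workflow is a required status check, a skipped run leaves the check pending.
on:
  pull_request:
    paths:
      - 'src/**'
      - 'tests/**'
      - 'package.json'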
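The trigger-level filter above decides only whether the whole workflow runs. To run a specific suite when a specific area changes, one option is the third-party dorny/paths-filter action. In this sketch, the src/auth/** path and the test:e2e:auth script are hypothetical names chosen for illustration.

```yaml
# Sketch: run the auth suite only when auth-related files change.
steps:
  - uses: actions/checkout@v4

  - name: Detect changed areas
    id: changes
    uses: dorny/paths-filter@v3
    with:
      filters: |
        auth:
          - 'src/auth/**'

  - name: Run auth E2E tests
    # The action exposes one output per filter, set to the string 'true' or 'false'.
    if: steps.changes.outputs.auth == 'true'
    run: npm run test:e2e:auth
```

Because the workflow itself still runs and reports a status, this approach also works when the E2E job is a required check.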

Putting It All Together

A well-configured E2E pipeline follows a clear pattern: run fast smoke tests on PRs, run the full suite on merge, and run extended tests nightly. Parallelize aggressively. Save artifacts always. Report results where developers already look.

The configuration examples above can be adapted to any E2E testing tool by swapping the test command, but they pair especially well with Shiplight's YAML-based tests. Because each YAML test file is self-contained and declarative, the suite is naturally suited to parallel execution and clear failure reporting.

For a hands-on walkthrough, try the Shiplight demo to see how YAML-based E2E tests integrate into your existing CI/CD pipeline.
