Fast *and* Resilient E2E: Building a Two-Speed Test Suite with Shiplight AI

January 1, 1970

End-to-end testing has always lived in a painful trade-off.

  • Write tests the “traditional” way (strict selectors, rigid scripts), and you get speed, until a minor UI change breaks half the suite.
  • Write tests the “flexible” way (higher-level abstractions, heavier tooling), and you get resilience, until the suite becomes slow, expensive, and hard to trust in CI.

Shiplight AI is built around a different idea: you should not have to choose between speed and adaptability. You can have both, if you treat intent as the source of truth and treat selectors as a cache.

This post explains a practical approach to designing a two-speed E2E suite with Shiplight AI: a workflow that starts with natural language intent, graduates to deterministic replay, and still self-heals when the product changes.

The core concept: intent first, determinism when it matters

Shiplight’s foundation is intent-based execution. Instead of anchoring every step to brittle XPath or CSS selectors, tests are expressed as user intent, written in natural language.

That matters because E2E failures come in two flavors:

  1. Real regressions: the application is broken.
  2. Test brittleness: the app is fine, but selectors are stale, timing is different, or UI structure shifted.

Shiplight’s model reduces brittleness by letting intent drive the interaction, then uses deterministic Playwright actions for fast replay when the test is stable. Shiplight runs on top of Playwright, with a natural-language layer above it.

Two speeds, one suite: Dynamic (AI) mode and Fast mode

In Shiplight Cloud’s Test Editor, each action step can run with AI enabled or disabled:

  • AI mode (dynamic): evaluates the action description against the current browser state and adapts to DOM and UI changes.
  • Fast mode (performance-optimized): uses cached actions and fixed selectors for quick execution.

This is not an “either-or” decision you make once. It is a strategy you apply test-by-test and step-by-step.

A practical rule of thumb

  • Use AI mode where the UI is changing frequently or where exact selectors are inherently unstable.
  • Use Fast mode for steps you want to run quickly and repeatedly in CI once they have proven stable.
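
One way to make this rule of thumb concrete is to write it down as a small decision function. The sketch below is only an illustration: the `StepHistory` shape, the `uiChangedRecently` flag, and the threshold of five green runs are assumptions made for this example, not part of Shiplight's product or API.

```typescript
// Hypothetical types for illustration; not part of any Shiplight API.
type StepMode = "ai" | "fast";

interface StepHistory {
  description: string;        // natural-language intent for the step
  consecutivePasses: number;  // green runs in a row with the current locator
  uiChangedRecently: boolean; // the team's own signal, e.g. the screen is mid-redesign
}

// Assumed threshold: how many green runs count as "proven stable".
const STABLE_AFTER_PASSES = 5;

// Volatile UI always gets AI mode; otherwise a step earns Fast mode
// once it has passed enough times in a row.
function chooseMode(step: StepHistory): StepMode {
  if (step.uiChangedRecently) {
    return "ai";
  }
  return step.consecutivePasses >= STABLE_AFTER_PASSES ? "fast" : "ai";
}
```

The point of the sketch is that the mode is a property of each step's history, not a suite-wide setting, so a single test can mix both speeds.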

Shiplight’s docs are explicit about why this works: locators are treated as a performance cache, not a hard dependency. When the UI changes and a locator becomes stale, Shiplight can fall back to the natural language description to find the correct element.

Start simple: write the test as a human would explain it

Shiplight tests can be written in YAML using natural language statements, which keeps them readable in code review and easy to modify.

Here is the essential shape:

goal: Verify user can create a new project
url: https://app.example.com/projects

statements:
- Click the "New Project" button
- Enter "My Test Project" in the project name field
- Click "Create"
- "VERIFY: Project page shows title 'My Test Project'"

The “why” is as important as the “how.” A plain-English first draft is the highest-leverage artifact you can create because it is:

  • shareable across engineering, QA, and product
  • durable as the UI evolves
  • easy to expand into deeper coverage

Then graduate to deterministic replay, without sacrificing resilience

As tests mature, Shiplight supports enriching steps with explicit locators and structured action entities for deterministic replay.

This is where the two-speed strategy becomes real:

  • Natural language steps are powerful, but slower because the agent must interpret the page.
  • Action steps with locators replay quickly. Shiplight's documentation notes they can run in roughly 1 second per step, versus roughly 10 to 15 seconds for pure natural language steps.
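
To see what those per-step figures mean for a whole test, here is a rough back-of-the-envelope estimate. It takes the quoted timings at face value and ignores setup, navigation, and network time; the 20-step test and the 17/3 split are hypothetical.

```typescript
// Per-step timings quoted above, in seconds (approximate).
const FAST_STEP_SECONDS = 1;
const NL_STEP_SECONDS = { min: 10, max: 15 };

interface Estimate {
  minSeconds: number;
  maxSeconds: number;
}

// Estimate the step-execution time for a test with a mix of
// cached (fast) steps and pure natural-language steps.
function estimateRun(fastSteps: number, nlSteps: number): Estimate {
  const fast = fastSteps * FAST_STEP_SECONDS;
  return {
    minSeconds: fast + nlSteps * NL_STEP_SECONDS.min,
    maxSeconds: fast + nlSteps * NL_STEP_SECONDS.max,
  };
}

// A hypothetical 20-step test under three configurations.
const allNatural = estimateRun(0, 20); // 200 to 300 seconds
const allFast = estimateRun(20, 0);    // 20 seconds
const mixed = estimateRun(17, 3);      // 47 to 62 seconds
```

Under these assumptions, keeping only the three least stable steps in natural language costs well under a minute per run, which is why the mode is worth choosing step by step.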

Crucially, Shiplight positions locators as a cache. If a cached locator becomes stale, the agent can self-heal by using the intent description, and on Shiplight Cloud the cached locator can be updated after a successful self-heal so future runs return to full speed.
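
The locator-as-cache idea can be sketched in a few lines. This is a toy model of the pattern described above, not Shiplight's implementation: the `Step` and `Page` types, the `resolveByIntent` stand-in for the AI path, and the substring matching it uses are all assumptions made so the example is self-contained.

```typescript
// All names here are hypothetical; this models the idea, not Shiplight's internals.
interface Step {
  intent: string;         // source of truth, e.g. 'Click the "Create" button'
  cachedLocator?: string; // performance cache; may be stale
}

// Stand-in for a page: maps a selector to the label of the element it matches.
type Page = Record<string, string>;

interface Resolution {
  selector: string;
  path: "cache" | "healed";
}

// Stand-in for the slow AI path: pick the first selector whose label
// appears in the intent text. A real agent would inspect the live page.
function resolveByIntent(page: Page, intent: string): string | undefined {
  for (const selector of Object.keys(page)) {
    if (intent.indexOf(page[selector]) !== -1) {
      return selector;
    }
  }
  return undefined;
}

function resolveStep(page: Page, step: Step): Resolution {
  // Fast path: the cached locator still matches something on the page.
  if (step.cachedLocator !== undefined && step.cachedLocator in page) {
    return { selector: step.cachedLocator, path: "cache" };
  }
  // Slow path: the cache is missing or stale, so fall back to intent.
  const healed = resolveByIntent(page, step.intent);
  if (healed === undefined) {
    // Neither cache nor intent matched: treat this as a real failure.
    throw new Error(`No element matches intent: ${step.intent}`);
  }
  // Refresh the cache so the next run takes the fast path again.
  step.cachedLocator = healed;
  return { selector: healed, path: "healed" };
}
```

In this model a stale locator costs one slow run: the step heals through its intent, the cache is rewritten, and the following run is back on the fast path. A step fails only when the intent itself can no longer be satisfied, which is the signal you actually want from a regression.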

That means your suite can be both:

  • fast enough to gate PRs
  • resilient enough to survive weekly UI changes

Make verification smarter with AI-powered assertions

A reliable E2E suite is not just “click this, type that.” It needs strong assertions.

Shiplight supports VERIFY statements as natural-language assertions that are evaluated using AI.

This is especially useful for the messy realities of UI validation where correctness is contextual:

  • dynamic layouts
  • state-dependent UI
  • asynchronous rendering

Instead of exploding your tests into a dozen low-level checks, you can assert outcomes in the same language you use to describe requirements.

Bring the workflow closer to developers: VS Code and Desktop

If your goal is reliable coverage, the authoring loop matters as much as the runtime.

Shiplight supports a VS Code Extension to create, run, and debug *.test.yaml files in an interactive visual debugger, including stepping through statements and modifying steps inline.

For teams that want the full Shiplight experience locally, Shiplight also provides a native macOS Desktop app that runs the browser sandbox and AI agent worker on your machine while loading the Shiplight web UI. You can bring your own AI provider keys, which are stored in the macOS Keychain.

Operationalize it: CI, schedules, and actionable summaries

A suite only becomes valuable when it runs continuously and produces evidence your team trusts.

Shiplight supports:

  • GitHub Actions integration for running Shiplight test suites in CI, using a Shiplight API token and suite/environment IDs.
  • Schedules (recurring runs via cron expressions) for continuous monitoring outside the PR lifecycle, with reporting on results and performance metrics.
  • AI Test Summary to generate a failure-focused narrative: what broke, likely root cause, and recommendations, so teams can triage faster.

For enterprises, Shiplight also positions itself as SOC 2 Type II certified, with encryption in transit and at rest, role-based access control, and a 99.99% uptime SLA, plus private cloud and VPC deployment options.

The takeaway: design your suite like a system, not a pile of scripts

The most effective E2E programs are not defined by a tool. They are defined by a design:

  • intent as the contract
  • deterministic replay as the optimization
  • self-healing as the maintenance model
  • CI and schedules as the enforcement mechanism

Shiplight AI is purpose-built for that system, whether you start with YAML tests in-repo, scale into Shiplight Cloud, or extend existing Playwright suites with the Shiplight AI SDK.

If you want E2E coverage that keeps up with modern shipping velocity without becoming a maintenance tax, the two-speed approach is the difference between “we have tests” and “we have a release signal.”