E2E Testing Beyond Clicks: How to Validate Real User Journeys (UI, Auth, and Email) with Shiplight AI
January 1, 1970
End-to-end testing rarely fails because a team forgot to “click the login button.” It fails because modern user journeys are not single-page interactions. They span authentication, dynamic UI states, background jobs, third-party systems, and yes, the inbox.
That complexity is exactly why many teams end up with one of two outcomes: a brittle E2E suite that requires constant babysitting, or a lightweight smoke suite that does not earn the right to gate releases.
Shiplight AI is built for the gap between those extremes. It is an agentic QA platform that generates and runs end-to-end tests expressed in natural language intent, with mechanisms to keep them reliable as the application changes. Under the hood, Shiplight runs on Playwright, with an AI-native layer on top to reduce brittleness and maintenance.
Below is a practical, workflow-first blueprint for testing the hard parts: authenticated flows, data-dependent steps, and email-based verification.
Shiplight tests can be authored in YAML using natural language steps, while still running as Playwright tests with an AI “web agent” to interpret intent at runtime. That combination matters because it keeps tests readable for humans and reviewable in the repo, while staying executable in real browsers.
A minimal structure looks like this:
goal: Verify user can create a new project
url: https://app.example.com/projects
statements:
- Click the "New Project" button
- Enter "My Test Project" in the project name field
- Click "Create"
- "VERIFY: Project page shows title 'My Test Project'"
teardown:
- Delete the created project
As tests evolve, Shiplight can enrich steps with deterministic locators for fast replay. Crucially, those locators are treated as a cache, not a hard dependency. When a cached locator becomes stale, the agentic layer can fall back to the natural language description to recover.
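The cache-then-fallback idea can be sketched in a few lines of TypeScript. This is an illustrative model, not Shiplight's implementation: the `Step` shape, `resolveStep`, and both callback parameters are hypothetical names, and the callbacks are synchronous only to keep the sketch short (real browser lookups in Playwright are asynchronous).

```typescript
// Hypothetical model of a step: the intent is the source of truth,
// the cached locator is an optional speed-up.
interface Step {
  intent: string;          // e.g. 'Click the "New Project" button'
  cachedLocator?: string;  // e.g. a selector recorded on an earlier run
}

interface Resolution {
  locator: string;
  source: "cache" | "intent";
}

// locatorMatches: returns true if the locator currently matches an element.
// resolveFromIntent: stands in for the AI layer; maps intent text to a fresh locator.
function resolveStep(
  step: Step,
  locatorMatches: (locator: string) => boolean,
  resolveFromIntent: (intent: string) => string,
): Resolution {
  // Fast path: the cached locator still points at something on the page.
  if (step.cachedLocator !== undefined && locatorMatches(step.cachedLocator)) {
    return { locator: step.cachedLocator, source: "cache" };
  }
  // Slow path: the cache is missing or stale, so recover from the description
  // and refresh the cache for the next run.
  const fresh = resolveFromIntent(step.intent);
  step.cachedLocator = fresh;
  return { locator: fresh, source: "intent" };
}
```

The design point is that a stale locator costs one slower resolution rather than a failed test, because the natural language intent is never discarded.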
For teams that want to start locally, Shiplight’s local testing flow is explicitly designed to plug into standard Playwright conventions. The docs outline prerequisites like Node.js (>= 22) and an AI API key (Anthropic or Google) for AI-powered actions.
The fastest way to create flaky E2E tests is to hardcode data, duplicate setup steps everywhere, and hope state is consistent across runs. Shiplight gives you several composable primitives to avoid that trap.
Shiplight supports variables for both configured values and values discovered at runtime. The docs also explain how to reference a variable in a natural language step depending on whether it should be substituted at generation time or at runtime.
That matters for common patterns like:
Templates allow you to define common workflows once and reuse them across tests. When the template changes, tests using it are updated automatically.
Hooks extend that idea by running template-based setup and cleanup automatically as “before test” and “after test” behaviors, with clear execution order and override options at the schedule or suite level.
Some operations are better expressed as code: calling internal APIs, shaping test data, or handling complex authentication. Shiplight Functions are designed for that, with a signature that includes page, testContext, and request, enabling browser actions and API calls while storing results back into test context.
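As a rough sketch of what such a function might look like, the TypeScript below seeds a project through an API and records its ID for later steps. Everything beyond the three parameter names is an assumption: the `PageLike` and `RequestLike` stand-in types, the use of a `Map` for test context, the `seedProject` name, and the endpoint and payload are all hypothetical. In a real project, `page` and `request` would presumably be Playwright objects.

```typescript
// Minimal stand-in types so the sketch compiles on its own.
// The exact types Shiplight passes in are an assumption here.
interface PageLike {
  goto(url: string): Promise<unknown>;
}
interface RequestLike {
  post(url: string, options: { data: unknown }): Promise<{ json(): Promise<unknown> }>;
}
type TestContext = Map<string, unknown>;

// Hypothetical function: create a project via an API call, then open it in the browser.
async function seedProject(
  page: PageLike,
  testContext: TestContext,
  request: RequestLike,
): Promise<void> {
  // The endpoint and payload are made up for illustration.
  const response = await request.post("https://app.example.com/api/projects", {
    data: { name: "My Test Project" },
  });
  const body = (await response.json()) as { id: string };

  // Store the result so later steps (or teardown) can reference it.
  testContext.set("projectId", body.id);

  // Continue in the browser with the freshly created data.
  await page.goto(`https://app.example.com/projects/${body.id}`);
}
```

Keeping setup like this in code leaves the natural language steps focused on what the user actually does in the UI.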
If your app uses email for onboarding, password resets, OTP codes, or magic links, you already know the pain: the UI test is fine until you have to fetch and parse the email.
Shiplight’s Email Content Extraction feature is built for exactly this. It lets automated tests read incoming emails and extract content like verification codes or activation links, using an LLM-based extractor so teams do not have to rely on regex-heavy parsing.
The workflow includes creating a forward email configuration with an auto-generated forwarding address (for example, xxxx@forward.shiplight.ai), then adding an EXTRACT_EMAIL_CONTENT step to the test.
Extraction types include:
- OTP or verification code (email_otp_code)
- Magic or activation link (email_magic_link)
- Other extracted content (email_extracted_content)
From there, you can reference the extracted variables directly in later steps (for example, navigating to email_magic_link).
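To make the hand-off from extraction to later steps concrete, here is a small TypeScript sketch of substituting extracted values into a step at runtime. The `{{name}}` placeholder syntax, the `renderStep` helper, and the sample values are assumptions for illustration; only the variable names come from the feature described above.

```typescript
// Replace {{name}} placeholders in a step with values from the test context.
// Throws on unknown names so a missing email fails loudly instead of silently.
function renderStep(step: string, variables: Record<string, string>): string {
  return step.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => {
    const value = variables[name];
    if (value === undefined) {
      throw new Error(`Variable "${name}" has not been extracted yet`);
    }
    return value;
  });
}

// Sample values, as if an email extraction step had already run.
const extracted: Record<string, string> = {
  email_otp_code: "482913",
  email_magic_link: "https://app.example.com/verify?token=abc",
};

const otpStep = renderStep('Enter "{{email_otp_code}}" in the verification code field', extracted);
const linkStep = renderStep("Navigate to {{ email_magic_link }}", extracted);
```

Failing fast on an unresolved variable is a deliberate choice in this sketch: if the email never arrived, the test should report that rather than type a literal placeholder into the page.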
This is one of the highest-leverage upgrades you can make to E2E coverage because it turns “manual inbox checks” into a repeatable, automated contract.
Good tests are only half the system. The other half is how your team runs them, interprets failures, and routes results back into shipping decisions.
Shiplight suites bundle test cases for convenience and tracking, including metrics and latest run results.
Schedules (internally called “Test Plan”) support running tests automatically at regular intervals, including cron-based configuration and reporting on pass rates and performance metrics.
Shiplight provides a GitHub Actions integration guide, including the ShiplightAI/github-action@v1 action, API token handling, and environment configuration.
The docs also show patterns for running multiple suites and wiring Shiplight results into pull request workflows.
For failed tests, Shiplight AI Summaries are generated when you view test details, then cached for subsequent views. They include root cause analysis, expected versus actual behavior, relevant context, and recommendations.
If you want to route signals to internal systems, Shiplight supports webhook endpoints configured in global settings, with signature verification guidance and send conditions like “Failed” and “Pass→Fail.”
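The difference between the two send conditions is easiest to see as a predicate. The TypeScript below is one plausible reading, not Shiplight's documented logic: it assumes "Failed" fires on every failing run and "Pass→Fail" fires only when the previous run passed and the current one failed. The `Outcome` type and `shouldSend` function are hypothetical names.

```typescript
type Outcome = "passed" | "failed";
type SendCondition = "Failed" | "Pass→Fail";

// Decide whether a webhook should fire for the current run.
// `previous` is undefined when the test has no earlier run.
function shouldSend(
  condition: SendCondition,
  previous: Outcome | undefined,
  current: Outcome,
): boolean {
  switch (condition) {
    case "Failed":
      // Every failing run notifies, including repeated failures.
      return current === "failed";
    case "Pass→Fail":
      // Only a regression notifies: the last run passed, this one failed.
      return previous === "passed" && current === "failed";
  }
}
```

Under this reading, "Pass→Fail" suits noisy channels where only new regressions deserve attention, while "Failed" suits dashboards that need every failing run.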
For teams in regulated environments or with strict security standards, Shiplight positions its enterprise offering around SOC 2 Type II certification, encryption in transit and at rest, role-based access control, immutable audit logs, and a 99.99% uptime SLA.
The next generation of E2E testing is not just “AI writes tests.” It is a system where:
If you want to see what this looks like in your own application, Shiplight can get teams started quickly, including a path that begins with just an application URL and a test account.