From Human-First to Agent-First Testing: What a Year of Building Taught Us

Feng

Updated on May 15, 2026

Shiplight Cloud is a fully managed, cloud-based natural language testing platform designed to multiply human productivity. Teams author tests visually, the platform handles execution, and results are managed in the cloud. It continues to serve teams that need managed test authoring and execution.

By late 2025, the landscape around us had shifted in ways that called for a different product:

AI coding agents took off. They generate test scripts fast, but the output is hard to review and expensive to maintain. The volume of tests grows, but confidence does not.
Roles are collapsing. The PM → engineer → QA handoff is dissolving. A single person increasingly defines, builds, and verifies with AI. Quality is no longer a separate phase.
Specs are becoming the source of truth. With AI generating code from intent, the canonical representation of product behavior moves upstream from code to structured natural language.

In addition to Shiplight Cloud, we built Shiplight Plugins as a new product for developers and automation engineers who work with AI agents. The core principle: AI handles test creation, execution, and maintenance, while the system produces clear evidence at every step for humans to understand and trust.

Design Goals

Tight feedback loop for AI agents. AI coding agents produce better results when they get clear, immediate feedback. Verification should happen during development, not after.
Spec-driven. Tests should read like product specs, not implementation code. Anyone on the team can review what is being tested without technical expertise.
Auto-healing. Cosmetic and structural UI changes should not break tests as long as the product behavior is unchanged.
Human-readable evidence. When tests pass or fail, the result should be understandable by anyone on the team without reading code or stack traces.
Performant. Tests should be fast and repeatable by default: deterministic replay where possible, with AI resolution only when needed.
No new platform to learn. Extend the tools and workflows developers already use rather than introducing a new system to adopt.

How Shiplight Plugins Works

Here's how this comes together in practice.

Shiplight Browser MCP Server

Any MCP-compatible coding agent connects to the Shiplight browser MCP server, gaining the ability to open a browser, navigate the app, interact with elements, take screenshots, and observe network activity.

The server does more than launch a fresh browser: it can attach to an existing Chrome DevTools URL to test against a running dev environment with real data and authenticated state. A relay server supports remote and headless setups.

The AI agent navigates the application as a human would, producing a structured test as output.

Tests Are Natural Language, Not Code

We designed Shiplight tests around natural language in YAML format to solve the readability and maintenance problems with AI-generated Playwright scripts:

goal: Verify that a user can log in and create a new project
base_url: https://your-app.com
statements:
  - URL: /login
  - intent: Enter email address
    action: input_text
    locator: "getByPlaceholder('Email')"
    text: "{{TEST_EMAIL}}"
  - intent: Enter the password
    action: input_text
    locator: "getByPlaceholder('Password')"
    text: "{{TEST_PASSWORD}}"
  - intent: Click Sign In
    action: click
    locator: "getByRole('button', { name: 'Sign In' })"
  - VERIFY: The dashboard is visible with a welcome message
  - intent: Click "New Project" in the sidebar
    action: click
    locator: "getByRole('link', { name: 'New Project' })"
  - VERIFY: The project creation form is displayed

Each test describes the flow in human terms, with an emphasis on clarity and maintainability. The same person who specified the feature can review the test without understanding test code. Files live in the repo, are reviewed in PRs, and produce clean diffs. Intent-based steps are either resolved by AI at runtime or replayed deterministically from cached locators. Custom logic (API calls, database queries, setup) can be embedded inline as JavaScript.
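
To make the "cached locator first, AI only when needed" idea concrete, here is a minimal sketch of how such a resolution step could work. It is not Shiplight's implementation: `Step`, `LocatorProbe`, `IntentResolver`, and `resolveStep` are hypothetical names, the page is simulated with a plain array, and the AI call is replaced by a stub. A real resolver would be asynchronous; this one is synchronous to keep the example short.

```typescript
// Hypothetical sketch: none of these names come from Shiplight's actual API.

/** A test step described by intent, with an optional cached locator. */
interface Step {
  intent: string;
  locator?: string; // cached from an earlier run, if any
}

/** Reports whether a locator still matches an element on the current page. */
type LocatorProbe = (locator: string) => boolean;

/** Stand-in for an AI call that maps an intent to a fresh locator. */
type IntentResolver = (intent: string) => string;

interface Resolution {
  locator: string;
  source: "cache" | "ai";
}

function resolveStep(
  step: Step,
  probe: LocatorProbe,
  resolveWithAi: IntentResolver
): Resolution {
  // Fast path: replay the cached locator when it still matches the page.
  if (step.locator !== undefined && probe(step.locator)) {
    return { locator: step.locator, source: "cache" };
  }
  // Slow path: the cache is missing or stale, so ask the resolver and
  // store the result so the next run can take the fast path again.
  const healed = resolveWithAi(step.intent);
  step.locator = healed;
  return { locator: healed, source: "ai" };
}

// Simulated page where the button label changed from "Sign In" to "Log In".
const pageLocators: string[] = ["getByRole('button', { name: 'Log In' })"];
const probe: LocatorProbe = (locator) => pageLocators.indexOf(locator) !== -1;

let aiCalls = 0;
const fakeAi: IntentResolver = (_intent) => {
  aiCalls += 1;
  return "getByRole('button', { name: 'Log In' })";
};

const step: Step = {
  intent: "Click Sign In",
  locator: "getByRole('button', { name: 'Sign In' })",
};

const first = resolveStep(step, probe, fakeAi); // stale cache, falls back to the resolver
const second = resolveStep(step, probe, fakeAi); // healed locator is replayed from cache
```

In this sketch the first call pays for one resolver invocation and the second call does not, which is the trade-off the cached-locator design is aiming for: a cosmetic rename costs one AI resolution, after which replay is deterministic again.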

Run, Debug, and Get Reports with the CLI

The shiplight test command runs tests locally. The shiplight debug command opens an interactive debugger to step through tests one statement at a time, inspect browser state, and edit steps in place.

After a run, Shiplight generates an HTML report. We retained the best of Playwright (video recording, trace data) and addressed what was lacking. Instead of cryptic selectors and programmatic steps, reports show natural language steps paired with screenshots.

On failure, the report shows a screenshot of the actual page state, the expected behavior, and an AI-generated explanation, for example: "Expected a welcome message, but the page displays 'Session Expired'." Anyone on the team can read it without code context.
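
As an illustration of what step-level evidence could look like as data, the sketch below pairs each natural language step with a screenshot and, on failure, the expected behavior and an explanation. The field names (`step`, `screenshotPath`, `expected`, `explanation`), the `summarize` helper, and the file paths are assumptions made for this example, not Shiplight's report schema.

```typescript
// Hypothetical shapes: field names are illustrative, not Shiplight's report schema.
interface StepEvidence {
  step: string; // the natural language step, as written in the test
  passed: boolean;
  screenshotPath: string; // image captured at this step
  expected?: string; // recorded on failure
  explanation?: string; // AI-generated explanation, recorded on failure
}

/** Renders one readable entry per step, with failure details indented below. */
function summarize(evidence: StepEvidence[]): string[] {
  return evidence.map((e, i) => {
    const status = e.passed ? "PASS" : "FAIL";
    const head = `${i + 1}. [${status}] ${e.step} (screenshot: ${e.screenshotPath})`;
    if (e.passed) {
      return head;
    }
    const expected = e.expected ?? "not recorded";
    const why = e.explanation ?? "not recorded";
    return `${head}\n   Expected: ${expected}\n   Why: ${why}`;
  });
}

const run: StepEvidence[] = [
  { step: "Click Sign In", passed: true, screenshotPath: "shots/03.png" },
  {
    step: "VERIFY: The dashboard is visible with a welcome message",
    passed: false,
    screenshotPath: "shots/04.png",
    expected: "The dashboard is visible with a welcome message",
    explanation: "Expected a welcome message, but the page displays 'Session Expired'.",
  },
];

const reportLines = summarize(run);
```

Keeping the expected behavior and the explanation next to the screenshot means a reviewer can judge a failure without opening the test file or a stack trace.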

Drop Into Your Existing Workflow

Tests are YAML files in the repo. The CLI runs anywhere Node.js runs. GitHub Actions, GitLab CI, and CircleCI require minimal configuration: add a step and point it at the test directory.

Shiplight Cloud features (scheduled runs, team dashboards, historical trends, hosted reports) are available when needed. But the core loop works entirely with the CLI and existing CI. No lock-in.

What's Next

A year ago we built a platform to help humans test more productively. Now we are building for a world where one person, operating AI, designs, builds, and verifies a feature in a single session.

The role of testing is not disappearing; it is shifting. The tooling needs to reflect that: verification integrated into the development flow, evidence clear enough to trust without redoing the work, and tests that maintain themselves as the product evolves.

We are building Shiplight to be that layer.

Key Takeaways

Verify in a real browser during development. Shiplight's MCP server lets AI coding agents open a browser and validate UI changes before code review, not after deployment.
Generate stable regression tests automatically. Verifications become YAML test files in your repo, building regression coverage as a byproduct of development.
Reduce maintenance with AI-driven self-healing. Intent-based test steps adapt to UI changes automatically. Cached locators keep execution fast; AI resolves only when needed.
Enterprise-ready security and deployment. SOC 2 Type II certified, encrypted data, role-based access, immutable audit logs, and a 99.99% uptime SLA.