How to Add Automated Testing to Cursor, Copilot, and Codex

AI coding tools write code faster than any human. But faster code without testing is just faster bugs. If you're using Cursor, GitHub Copilot, or Codex to generate code, you've probably noticed the pattern: the AI writes something that looks correct, you ship it, and then something breaks in production that a quick E2E test would have caught.

The problem isn't the AI. The problem is that **most AI coding workflows have no verification step**. The agent writes code, you review it visually, and you merge. There's no automated check that the UI actually works as intended.

This guide shows you how to close that gap by adding automated QA testing directly into your AI coding workflow, regardless of which tool you use.

## Why AI-Generated Code Needs Testing More Than Human Code

Human developers build mental models as they code. They know which edge cases matter because they've seen them break before. AI coding tools don't have that context: they generate statistically likely code, not battle-tested code.

The pattern shows up in practice:

- AI-generated code introduces subtle bugs in authentication flows, state management, and error handling, the areas where context matters most
- Teams shipping AI-generated code without QA testing report higher rates of production incidents in their first 90 days
- The most common failures are **visual and behavioral**: the code compiles, the types check, but the UI doesn't work as expected

Unit tests catch logic bugs, and type checkers catch type errors. But neither can tell you whether the login flow actually works in a browser, whether the checkout page renders correctly, or whether the navigation breaks on mobile. That requires end-to-end testing, and it's exactly what's missing from most AI coding workflows.

## The Missing Piece: MCP (Model Context Protocol)

MCP is an open standard that lets AI coding agents connect to external tools. Think of it as USB for AI: a universal protocol that lets your coding agent talk to browsers, databases, APIs, and testing platforms.

Without MCP, your AI coding tool operates in a bubble. It can read and write code, but it can't:

- Open a browser and see what the UI actually looks like
- Click through a user flow to verify it works
- Run existing test suites and interpret the results
- Generate new tests based on the changes it just made

With MCP, the agent gains **eyes and hands**. It can open your app in a real browser, navigate through flows, verify that UI changes look correct, and capture that verification as a reusable test.

## How It Works: The AI-Native Testing Loop

The testing loop is the same regardless of which coding tool you use:

1. **You describe what you want**: "Add a settings page with dark mode toggle"
2. **The AI writes the code**: components, styles, state management
3. **The agent opens a browser**: it navigates to your running app via MCP
4. **The agent verifies the change**: it checks that the settings page exists, the toggle works, and dark mode activates
5. **The verification becomes a test**: saved as a YAML file in your repo
6. **Tests run in CI/CD**: every future PR runs the same verification automatically

The key insight: **steps 3–5 happen automatically**. The agent doesn't just write code. It proves the code works, then turns that proof into a permanent regression test.

## Setting Up in Claude Code

Claude Code has the deepest integration with Shiplight. The plugin installs MCP tools and three built-in skills in a single command.

```bash
claude plugin marketplace add ShiplightAI/claude-code-plugin && claude plugin install mcp-plugin@shiplight-plugins
```

This gives your agent browser automation MCP tools plus three skills:

- **`/verify`**: Open a browser to inspect pages and validate UI changes
- **`/create_e2e_tests`**: Scaffold a test project and write YAML tests by walking through your app in a real browser
- **`/cloud`**: Sync local tests to Shiplight Cloud for scheduled execution and team collaboration

After your coding agent implements a frontend change, use `/verify` to confirm it works:

```plaintext
Update the navbar to include "Pricing" and "Blog" links, then use /verify to confirm they appear correctly on localhost:3000.
```

## Setting Up in Cursor

Install the plugin the same way, then open Agent mode (Cmd+L → Agent) and prompt:

```plaintext
I just changed the login page. Open the app at localhost:3000/login, try logging in with [email protected] / password123, and verify the dashboard loads correctly. Save a YAML test for this flow.
```

The agent will launch a real browser, navigate to the login page, fill in credentials, verify the dashboard appears, and save a YAML test file like `tests/login-flow.yaml`.

**Tips:**

- Use **Agent mode**, not Ask mode. Agent mode can execute multi-step MCP tool calls
- Keep your **dev server running**. The agent needs a live URL to test against
- **Review the generated YAML**. It's human-readable, so you can tweak assertions before committing

## Setting Up in Codex

OpenAI's Codex CLI is a terminal-based agent, similar to Claude Code. After installing the plugin, prompt:

```plaintext
Open localhost:3000 in a browser and verify the homepage loads correctly. Check that the navigation works and the hero section displays the right content. Save a test.
```

## Setting Up in VS Code (Copilot)

Open Copilot Chat (Ctrl+Shift+I), switch to **Agent mode**, and prompt:

```plaintext
Verify that the signup form at localhost:3000/signup works. Fill in a test user, submit, and confirm the success message appears.
```

Agent mode is required because standard Copilot completions can't use MCP tools.

## What the Agent Actually Tests

Once connected via MCP, your AI coding agent can:

| Capability | What It Does | Example |
|-----------|-------------|----------|
| **Navigate** | Open any URL in a real browser | Go to `localhost:3000/settings` |
| **Interact** | Click buttons, fill forms, scroll | Submit the contact form |
| **Verify** | Check that elements exist and look correct | Confirm the success toast appears |
| **Assert** | Validate specific conditions | Confirm the price shows "$49/mo" |
| **Generate tests** | Save verification as a YAML test file | Create `tests/settings-page.yaml` |
| **Run tests** | Execute existing test suites | Run all tests in the `tests/` folder |

The generated YAML tests are human-readable and live in your repo:

```yaml
goal: Verify settings page dark mode toggle
base_url: http://localhost:3000
statements:
  - navigate: /settings
  - VERIFY: Settings page heading is visible
  - intent: Toggle dark mode switch
    action: click
    locator: "getByRole('switch', { name: 'Dark mode' })"
  - VERIFY: Page background changes to dark theme
  - VERIFY: Toggle shows enabled state
```

Anyone on the team (engineers, QA, PMs) can read these tests and understand what they check. No Playwright or Cypress expertise required.

## Running Tests in CI/CD

Run generated tests locally:

```bash
npx shiplight test
```

Add them to GitHub Actions so every PR gets verified:

```yaml
# .github/workflows/e2e.yml
name: E2E Tests
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      # Build in the foreground so the tests never start against a half-built app
      - run: npm run build
      # Start the server in the background; it may need a moment before it accepts requests
      - run: npm start &
      - run: npx shiplight test --project ./tests
```

When the UI changes, intent-based steps self-heal automatically, so you don't need to update locators manually.

## Common Patterns

**Pattern 1: Write and Verify** (most common)

```plaintext
1. Ask AI to implement a feature
2. Ask AI to verify it works in the browser
3. Ask AI to save the verification as a test
4. Commit code + test together
```

**Pattern 2: Test-First with AI**

```plaintext
1. Write YAML test spec describing desired behavior
2. Ask AI to implement code that passes the spec
3. Run the test to confirm
4. Iterate until green
```

**Pattern 3: Review and Harden**

```plaintext
1. AI writes code
2. Before merging, ask AI to review the change in the browser
3. AI runs visual and behavioral checks
4. AI generates regression tests for anything it finds
```

## FAQ

**Do I need to know Playwright or Cypress?**

No. Tests are saved as YAML files with natural language statements. The YAML runs on Playwright under the hood, but you never write Playwright code.

**Can I test against localhost?**

Yes. MCP-based testing runs a real browser on your machine. It connects to whatever URL you specify: localhost, staging, or production.

**What happens when the UI changes?**

YAML tests use intent-based steps (e.g., "Click the submit button") rather than brittle CSS selectors. When the UI changes, the agent re-resolves the intent automatically.

**Do I need a Shiplight account?**

No. Browser automation and local testing work without an account. You only need an API token for cloud features like scheduled runs and team dashboards.

---

*[Shiplight Quick Start guide](https://docs.shiplight.ai/getting-started/quick-start.html) · [YAML Test Language Spec](https://github.com/ShiplightAI/examples/blob/main/yaml-examples/YAML-TEST-LANGUAGE-SPEC.md) · [Shiplight Plugins overview](https://www.shiplight.ai/plugins)*
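
**A note on server readiness in CI.** The GitHub Actions workflow in the CI/CD section starts the app in the background and runs the tests immediately afterwards, so a slow-booting server can cause the first test to hit a closed port. One way to guard against that is a small polling helper, sketched below. It assumes `curl` is available on the runner and that the app serves on `http://localhost:3000` as in the examples in this guide; `wait_for_url` is a hypothetical helper name, not part of Shiplight.

```bash
# Hypothetical helper: poll a URL until it answers, or give up after N attempts.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"   # default: 30 tries
  delay="${3:-2}"       # default: 2 seconds between tries
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o: discard the body
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for $url" >&2
  return 1
}
```

In a workflow step this could be used as `wait_for_url http://localhost:3000 && npx shiplight test --project ./tests`, so the tests only start once the server responds.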
