Your Browser Has a Remote Control — And Nobody Told You — DeepSeek Blog

# Your Browser Has a Remote Control — And Nobody Told You **3 tools that let AI agents drive Chrome. I tested all three. Here's what actually happens.** --- Every Chrome browser ships with a hidden feature that almost nobody talks about: a remote control API called the **Chrome DevTools Protocol (CDP)**. It's the same protocol that powers Chrome DevTools — the thing you open when you press F12. But here's the part that changes everything: any program can use it. Including AI agents. I've been using CDP daily for over a year to let AI agents browse the web using my real browser — with all my logged-in sessions intact. No passwords shared. No API keys. No OAuth flows. The AI just uses my browser like I would. Three major tools have emerged to give AI agents this superpower. I tested all three on the same task, with the same browser, and discovered something that most tutorials and docs don't tell you. --- ## The Three Contenders | | **Playwright MCP** | **Chrome DevTools Protocol** | **agent-browser** | |---|---|---|---| | **Made by** | Microsoft | Google (built into Chrome) | Vercel Labs | | **GitHub Stars** | 29,000+ | Built-in (no repo needed) | 23,500+ | | **Language** | TypeScript | Any (HTTP + WebSocket) | Rust | | **Latest Version** | v0.0.68 | Ships with Chrome | v0.21.2 | | **Install** | `npx @playwright/mcp` | Already in your browser | `npm i -g agent-browser` | They all use CDP under the hood. But they use it very differently — and that difference matters more than you'd think. --- ## Setting It Up (60 Seconds) Before we compare the tools, let's enable Chrome's remote control. It takes one command. **Start Chrome with remote debugging:** ```bash # macOS "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 # Linux google-chrome --remote-debugging-port=9222 # Windows chrome.exe --remote-debugging-port=9222 ``` **Verify it's working:** ```bash curl http://localhost:9222/json/version ``` ![One curl command confirms Chrome is ready for AI agent control](https://raw.githubusercontent.com/Timtech4u/browser-remote-control/main/screenshots/step1-verify.png) If you see a JSON response with your Chrome version, you're ready. That's the entire setup. > **What just happened?** You told Chrome to listen on port 9222 for remote control commands. Any program on your machine can now send instructions to your browser — open tabs, read pages, click buttons, fill forms, take screenshots. > **Coming soon:** Chrome 146+ is adding a native settings toggle for remote debugging — no command line needed. Go to **DevTools (F12) → Settings → Experiments** and search for "MCP". Once enabled, AI agents can connect without restarting Chrome. This is rolling out gradually in 2025/2026. --- ## Seeing It in Action: agent-browser Let's start with the most beginner-friendly tool. **agent-browser** by Vercel Labs gives you 108+ simple commands to control Chrome from your terminal. ### Install it: ```bash npm install -g agent-browser agent-browser install # Downloads a browser (first time only) ``` ### Fill a form in 4 commands: ![agent-browser fills a pizza order form in 4 simple steps](https://raw.githubusercontent.com/Timtech4u/browser-remote-control/main/screenshots/step2-agent-browser-demo.png) Here's what happens at each step: **Step 1 — Open a page.** Just like clicking a link, but from the command line. **Step 2 — Snapshot.** This is the magic for AI agents. Instead of raw HTML, you get a clean list of interactive elements with reference IDs like `[ref=e2]`. An AI agent reads this and knows exactly what's on the page. **Step 3 — Interact.** Use those ref IDs to fill fields, click buttons, check boxes. `agent-browser fill @e2 "John Doe"` fills the customer name field. Simple. **Step 4 — Screenshot.** Take a picture of the result. With `--annotate`, every interactive element gets a numbered label — perfect for AI vision models. ### Before and after: ![The form before: empty fields with numbered annotations showing each interactive element](https://raw.githubusercontent.com/Timtech4u/browser-remote-control/main/screenshots/form-annotated-empty.png) ![The form after: filled by agent-browser — name, phone, pizza size, toppings all selected](https://raw.githubusercontent.com/Timtech4u/browser-remote-control/main/screenshots/form-annotated-filled.png) The numbered red labels are `--annotate` mode. Each number maps to an element the AI can interact with. This is how vision-based AI agents understand web pages. --- ## The Discovery: Session Sharing Here's the thing nobody tells you. I tested all three tools against the same page — an authenticated dashboard behind a login. Same browser, same URL, same Chrome instance. **Two tools saw the full dashboard.** One saw a login page. ![Same browser, same URL — but different results depending on which tool you use](https://raw.githubusercontent.com/Timtech4u/browser-remote-control/main/screenshots/step3-session-compare.png) ### What's happening? When you connect via **raw CDP** or **agent-browser --cdp 9222**, you're using Chrome's **default browser context**. This means the AI agent sees exactly what you'd see — all your cookies, all your logged-in sessions, everything. When you use **Playwright MCP** or **agent-browser in standalone mode**, they create an **isolated browser context**. Think of it like an incognito window. No cookies, no sessions, no logins. A clean slate. ### The session sharing table: | Tool | Mode | Sees Your Logins? | Why | |---|---|---|---| | **Raw CDP** | Default context | **Yes** | Uses Chrome's real cookie jar | | **agent-browser** | `--cdp 9222` | **Yes** | Connects to Chrome's default context | | **agent-browser** | Standalone | **No** | Launches its own browser | | **Playwright MCP** | Default | **No** | Creates an isolated browserContext | **If you want your AI agent to use your existing logins** — to read your email, check your dashboards, manage your accounts — you need raw CDP or agent-browser connected via `--cdp`. **If you want isolation** — for testing, scraping, or running untrusted automations — Playwright MCP or agent-browser standalone gives you that by default. Neither is "better." They're for different jobs. But most people don't know the difference exists. --- ## How CDP Actually Works (The 2-Minute Version) Chrome's remote control has three layers. That's it. **1. List open tabs** (HTTP GET) ```bash curl http://localhost:9222/json ``` Returns a JSON list of every tab with its title, URL, and WebSocket address. **2. Open a new tab** (HTTP PUT — changed in Chrome 145+) ```bash curl -X PUT "http://localhost:9222/json/new?https://example.com" ``` **3. Send commands** (WebSocket) ```javascript // Connect to a tab's WebSocket URL, then send: { "method": "Page.navigate", "params": { "url": "https://example.com" } } { "method": "Runtime.evaluate", "params": { "expression": "document.title" } } { "method": "Page.captureScreenshot" } ``` That's the entire protocol. HTTP for tab management, WebSocket for commands. You can read, click, type, screenshot, and intercept network requests on any page. --- ## Head-to-Head: Features Compared ### Playwright MCP (Microsoft) **Best for:** AI agents that need structured browser automation with safety guarantees. ```bash # Add to Claude Code, Cursor, or any MCP-compatible AI tool npx @playwright/mcp@latest ``` What it gives you: - **Snapshot mode** — Returns an accessibility tree. AI agents reference elements by `ref` IDs instead of fragile CSS selectors - **Console + network** — Capture console logs and network requests - **Form filling** — Dedicated tools for clicks, fills, selects - **Screenshots** — PNG/JPEG with element-level targeting - **Session isolation** — Each connection gets its own clean context - **Extension mode** — `--extension` flag creates a bridge that CAN share sessions (shipped recently) **29,000+ stars.** Isolated by default (intentional). Very active development. ### agent-browser (Vercel Labs) **Best for:** Fast, native CLI automation. AI agents that need speed and flexibility. ```bash npm install -g agent-browser agent-browser install # Downloads Chrome for Testing (first time) ``` What it gives you: - **108+ commands** — `open`, `click`, `fill`, `snapshot`, `screenshot`, `eval`, `get text`, `find role`, `mouse`, `network`, `har`, and more - **Annotated screenshots** — `--annotate` labels interactive elements with numbered boxes for vision models - **Daemon architecture** — Browser persists between commands, chain with `&&` - **CDP connection** — `--cdp 9222` connects to your real browser with all sessions - **Session persistence** — `--session-name myapp` auto-saves and restores browser state - **Auto-connect** — `--auto-connect` finds your running Chrome automatically - **iOS Simulator support** — Test on iPhone simulators via Appium - **HAR recording** — Capture full HTTP archive of all requests ```bash # Chain commands — browser stays alive between them agent-browser open example.com && \ agent-browser wait --load networkidle && \ agent-browser snapshot -i ``` **Built in Rust.** 78 releases in ~3 months. Created by Malte Ubl (Vercel CTO) and team. **23,500+ stars.** ### Raw CDP (DIY) **Best for:** Maximum control, authenticated workflows, custom integrations. No install needed — just talk to Chrome's HTTP/WebSocket API directly: ```bash # Check what's running curl http://localhost:9222/json/version # List your real open tabs curl http://localhost:9222/json # Open a new tab (preserves all cookies and sessions) curl -X PUT "http://localhost:9222/json/new?https://example.com" ``` **No framework. No dependencies.** Just HTTP requests and WebSocket messages. You can build a full browser automation tool in a few hundred lines of code. This is the lowest-level option — maximum power, maximum flexibility. --- ## Using These Tools with AI Coding Agents These tools really shine when connected to AI coding agents like **Claude Code**, **Cursor**, **OpenCode**, or **Windsurf**. Here's how: ### With Claude Code (MCP) ```json // Add to .claude/mcp.json { "mcpServers": { "playwright": { "command": "npx", "args": ["@playwright/mcp@latest"] } } } ``` Now Claude Code can browse the web, fill forms, take screenshots, and read pages directly. ### With agent-browser (CLI) Any AI agent that can run shell commands can use agent-browser: ```bash # AI agent runs these commands to research a topic agent-browser open "en.wikipedia.org/wiki/Chrome_DevTools" agent-browser snapshot -i # Read the page content agent-browser screenshot # See what it looks like ``` ### With agent-browser --cdp (Authenticated) ```bash # Connect to YOUR Chrome — AI reads pages as you agent-browser --cdp 9222 snapshot -i # Now the AI can see your dashboards, email, authenticated content ``` ### Connecting to Electron Apps Since Electron apps (VS Code, Slack, Discord) are built on Chromium, you can control them too: ```bash # Launch any Electron app with CDP debugging "/Applications/Visual Studio Code.app/Contents/MacOS/Electron" --remote-debugging-port=9333 # Connect agent-browser to VS Code agent-browser --cdp 9333 snapshot ``` --- ## The Landscape Is Moving Fast This isn't a niche topic anymore. The browser automation for AI agents space is exploding: - **browser-use** (78,000 stars) — Originally built on Playwright, switched to raw CDP in 2025 for speed - **chrome-devtools-mcp** (30,000+ stars) — Google's Chrome team released an official MCP server for CDP - **Chrome 146** — Google is adding a native settings toggle for AI agent access via MCP, built right into Chrome The trend is clear: **CDP is becoming the standard interface between AI agents and web browsers.** Google endorses it. Microsoft builds on it. Vercel bets on it. --- ## When to Use What ### Use Playwright MCP when: - You need **test isolation** — each run starts clean - You're building **automated testing pipelines** - You want **structured accessibility snapshots** for AI - Security matters — you don't want the AI accessing your real sessions ### Use agent-browser when: - You want **speed** — native Rust, daemon architecture - You need **rich CLI commands** — 108+ built-in operations - You want **flexibility** — standalone OR connected to your browser - You're working in **cloud sandboxes** with parallel agent sessions - You want the easiest path — just type commands and things happen ### Use raw CDP when: - You need **authenticated sessions** — access your real logins - You want **zero dependencies** — just HTTP and WebSocket - You're building **custom integrations** specific to your workflow - You need to **intercept network requests** or capture auth tokens ### Or combine them: The tools aren't mutually exclusive. You can: 1. Use **raw CDP** or **agent-browser --cdp** for authenticated workflows (read your email, manage dashboards) 2. Use **agent-browser standalone** for fast scripted automation (fill forms, scrape data) 3. Use **Playwright MCP** for isolated testing (run tests in clean contexts) All talking to the same Chrome, via the same protocol. --- ## Get Started Now **Step 1:** Start Chrome with debugging enabled: ```bash "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 ``` **Step 2:** Verify it works: ```bash curl http://localhost:9222/json/version ``` **Step 3:** Pick your tool and try it: ```bash # Option A: Playwright MCP (isolated, structured) npx @playwright/mcp@latest --cdp-endpoint http://localhost:9222 # Option B: agent-browser (fast, flexible) npm i -g agent-browser agent-browser open example.com && agent-browser snapshot -i # Option C: agent-browser connected to YOUR browser (authenticated) agent-browser --cdp 9222 snapshot -i # Option D: Raw CDP (minimal, no dependencies) curl http://localhost:9222/json # List your real tabs ``` You now have AI-ready browser control. Your logged-in sessions, your tabs, your data — all accessible to AI agents through the protocol that was hiding in plain sight. --- ## The Bottom Line Every Chrome browser ships with a remote control. Three major tools let AI agents use it. They all speak the same protocol, but they make fundamentally different choices about **session isolation** — and that one choice determines whether your AI agent sees a login page or your actual dashboard. Now you know the difference. Build accordingly. --- ## Give Your AI Agent the Instructions Want your AI agent to already know how to use these tools? Each project publishes **skills** — ready-made instruction files you can pass to Claude Code, Cursor, OpenCode, or any AI coding agent. ### agent-browser skills ```bash npx skills add github:vercel-labs/agent-browser ``` [5 skills available](https://github.com/vercel-labs/agent-browser/tree/main/skills): general browser automation, QA/dogfood testing, **Electron app control** (VS Code, Slack, Discord, Figma), Slack workspace automation, and Vercel Sandbox cloud sessions. The **Electron skill** is especially powerful — it teaches your AI agent how to launch and control desktop apps like VS Code, Slack, or Figma through CDP. Your AI agent can read Slack messages, navigate VS Code, or interact with any Chromium-based desktop app. ### chrome-devtools-mcp skills ```bash npx skills add github:anthropics/anthropic-cookbook chrome-devtools-mcp ``` [5 skills available](https://github.com/anthropics/anthropic-cookbook/tree/main/anthropic-skills/chrome-devtools-mcp/skills): core Chrome DevTools automation, CLI scripting, accessibility debugging, LCP performance optimization, and connection troubleshooting. ### Raw CDP skill ```bash npx skills add github:anthropics/anthropic-cookbook chrome-cdp ``` [Skill available](https://github.com/anthropics/anthropic-cookbook/tree/main/anthropic-skills/chrome-cdp/skills): Lightweight CDP CLI for live Chrome session control — connects to your real tabs with all cookies preserved. 13 commands, per-tab daemon architecture. > **What are skills?** Skills are instruction files that teach AI agents how to use specific tools. Instead of explaining everything yourself, you install a skill and your agent instantly knows the commands, best practices, and common patterns. Think of them like a manual the AI reads before it starts working. --- *Find me at [timtech4u.dev](https://timtech4u.dev) or [@timtech4u](https://x.com/timtech4u).*

Your Browser Has a Remote Control — And Nobody Told You

Tags

Comments

More Blog

How I'm using ASTs and Gemini to solve the "Codebase Onboarding" problem 🧠

Local AI Will Save Us All (The Math Says So, Trust Me)

Lost in the AI Hype, I Started Small

Building a Replay-Tested Interactive Brokers Client in Go

Playwright in Pictures: Fully Parallel Mode

Designing a CLI for Both Humans and Agents