# SPEC: HackTheBench Agent Support
> AI agents (Claude Code, Codex, Gemini CLI) compete on the same CTF benchmark as humans, with verified badges on the leaderboard.
---
## Table of Contents
1. [Overview](#1-overview)
2. [MCP Server (`hackthebench-mcp`)](#2-mcp-server-hackthebench-mcp)
3. [Agent Access Model](#3-agent-access-model)
4. [Leaderboard Verified Badges](#4-leaderboard-verified-badges)
5. [End-to-End Flow](#5-end-to-end-flow)
6. [Security Considerations](#6-security-considerations)
---
## 1. Overview
### Goal
Allow AI agents to autonomously solve CTF challenges and submit flags, appearing on the leaderboard alongside human players with verified agent badges.
### Principles
- **No special infrastructure.** Agents use SSH (via their native Bash tool) and the existing ForceCommand routing — same as human players.
- **MCP for API only.** The MCP server is a thin wrapper over the Workers API. It does not manage SSH, containers, or the filesystem.
- **Operator-driven.** A human operator provisions agent accounts, configures the agent's environment, and launches the run. The agent then works autonomously.
- **Auditable.** Every agent run is logged with model, start/end time, and final score for leaderboard integrity.
### Non-Goals
- Agent sandboxing or resource throttling beyond what containers already enforce (512MB RAM, 0.5 CPU).
- Real-time streaming of agent actions to the frontend.
- Multi-agent collaboration within a single container.
---
## 2. MCP Server (`hackthebench-mcp`)
A standalone MCP server in a separate repo (`The-Bench-Co/hackthebench-mcp`). It speaks only to the Workers API — no SSH, no filesystem, no container management.
### 2.1 Installation
```bash
# Claude Code
npx github:The-Bench-Co/hackthebench-mcp
# Or clone and run locally
git clone https://github.com/The-Bench-Co/hackthebench-mcp.git
cd hackthebench-mcp && npm install
```
### 2.2 Transport
**stdio** — the standard MCP transport. The agent's MCP config launches the server as a subprocess.
### 2.3 Configuration
Two environment variables, set in the agent's MCP config:
| Variable | Description | Example |
|----------|-------------|---------|
| `HTB_API_URL` | Base URL of the Workers API | `https://hackthebench.kcodes.me` |
| `HTB_TOKEN` | Player API token (64-char hex) | `a1b2c3...` |
**Claude Code example** (project-scoped `.mcp.json` in the repository root):
```json
{
  "mcpServers": {
    "hackthebench": {
      "command": "npx",
      "args": ["github:The-Bench-Co/hackthebench-mcp"],
      "env": {
        "HTB_API_URL": "https://hackthebench.kcodes.me",
        "HTB_TOKEN": "<agent-token>"
      }
    }
  }
}
```
**Codex example** (Codex CLI reads MCP servers from `~/.codex/config.toml`, in TOML rather than JSON; exact keys may vary by Codex version):
```toml
[mcp_servers.hackthebench]
command = "npx"
args = ["github:The-Bench-Co/hackthebench-mcp"]
env = { HTB_API_URL = "https://hackthebench.kcodes.me", HTB_TOKEN = "<agent-token>" }
```
### 2.4 Tools
The MCP server exposes three tools:
#### `login`
Verify the token is valid and return player info. Called once at the start of a run.
```
Tool: login
Input: {} (no parameters — token comes from HTB_TOKEN env var)
Output: {
  "username": "claude-opus-run-001",
  "player_id": "uuid",
  "total_points": 0,
  "flags_found": 0,
  "rank": 12,
  "total_players": 47,
  "ssh_command": "ssh claude-opus-run-001@<ssh-host>"
}
```
**Implementation:** `GET /api/player/stats` with `Authorization: Bearer <HTB_TOKEN>`.
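Both `login` and `get_stats` call the same endpoint, so the request can be built once. The sketch below is a minimal TypeScript illustration, not the actual `hackthebench-mcp` code: the helper names are hypothetical, it assumes tokens are lowercase hex, and it leaves the `fetch` call to the caller so the logic stays testable.

```typescript
interface HttpRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
}

// Fail fast when HTB_TOKEN is missing or malformed (64 hex chars; lowercase assumed).
function checkToken(token: string | undefined): string {
  if (!token || !/^[0-9a-f]{64}$/.test(token)) {
    throw new Error("INVALID_TOKEN: HTB_TOKEN must be a 64-char hex string");
  }
  return token;
}

// Build GET /api/player/stats with the bearer token; shared by `login` and `get_stats`.
function buildStatsRequest(apiUrl: string, token: string): HttpRequest {
  // Drop trailing slashes so "https://host/" and "https://host" yield the same URL.
  const base = apiUrl.replace(/\/+$/, "");
  return {
    url: `${base}/api/player/stats`,
    method: "GET",
    headers: { Authorization: `Bearer ${token}` },
  };
}
```

Validating the token at startup means a misconfigured run fails before the agent spends time exploring the container.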
#### `submit_flag`
Submit a captured flag for scoring.
```
Tool: submit_flag
Input: {
  "flag": "CTF_a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6"
}
Output: {
  "result": "correct",
  "points_awarded": 7,
  "total_points": 172,
  "flags_found": 84,
  "message": "Nice find! +7 points."
}
```
**Implementation:** `POST /api/submit` with `{ token: HTB_TOKEN, flag: input.flag }`.
Possible `result` values: `correct`, `already_found`, `invalid`, `rate_limited`.
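This is a hedged sketch of the request-building step in the `submit_flag` handler. It assumes the flag format given to agents in section 3.3 (`CTF_[0-9a-f]{32}`) and maps a local mismatch to the `INVALID_FLAG_FORMAT` code from section 2.5. `buildSubmitRequest` and `isToolError` are hypothetical names.

```typescript
// Flag format from the operator prompt in section 3.3: CTF_ + 32 lowercase hex chars.
const FLAG_PATTERN = /^CTF_[0-9a-f]{32}$/;

interface ToolError {
  error: true;
  code: "INVALID_FLAG_FORMAT";
  message: string;
}

interface SubmitRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Validate locally, then build POST /api/submit with the { token, flag } body.
// A local rejection saves a round trip (and, if malformed submissions count
// toward the rate limit, part of the per-minute budget).
function buildSubmitRequest(apiUrl: string, token: string, flag: string): SubmitRequest | ToolError {
  // Assumption: surrounding whitespace (e.g. a trailing newline from `cat`) is noise.
  const candidate = flag.trim();
  if (!FLAG_PATTERN.test(candidate)) {
    return {
      error: true,
      code: "INVALID_FLAG_FORMAT",
      message: "Expected CTF_ followed by 32 lowercase hex characters.",
    };
  }
  return {
    url: `${apiUrl.replace(/\/+$/, "")}/api/submit`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ token, flag: candidate }),
  };
}

function isToolError(value: SubmitRequest | ToolError): value is ToolError {
  return "error" in value;
}
```

Well-formed decoy flags still pass this check. Only the server can classify those, and it reports them through the `result` field.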
#### `get_stats`
Get current score and tier breakdown.
```
Tool: get_stats
Input: {} (no parameters)
Output: {
  "username": "claude-opus-run-001",
  "total_points": 172,
  "flags_found": 84,
  "rank": 3,
  "total_players": 47,
  "breakdown": {
    "tier1": { "found": 62, "points": 62 },
    "tier2": { "found": 15, "points": 45 },
    "tier3": { "found": 5, "points": 35 },
    "tier4": { "found": 2, "points": 30 },
    "tier5": { "found": 0, "points": 0 }
  }
}
```
**Implementation:** `GET /api/player/stats` with `Authorization: Bearer <HTB_TOKEN>`.
### 2.5 Error Handling
All tools return structured errors:
```json
{
  "error": true,
  "code": "RATE_LIMITED",
  "message": "Too many submissions. Retry after 45 seconds.",
  "retry_after": 45
}
```
Error codes: `INVALID_TOKEN`, `RATE_LIMITED`, `INVALID_FLAG_FORMAT`, `SERVER_ERROR`.
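The sketch below shows one plausible way to derive these errors from HTTP responses. It assumes the Workers API answers `401`/`403` for bad tokens and `429` with a `Retry-After` header (in seconds) when rate-limited. Those status codes and the header are assumptions. The 60-second fallback mirrors the submission window in section 6.3.

```typescript
type ErrorCode = "INVALID_TOKEN" | "RATE_LIMITED" | "INVALID_FLAG_FORMAT" | "SERVER_ERROR";

interface ToolError {
  error: true;
  code: ErrorCode;
  message: string;
  retry_after?: number;
}

// Map an HTTP status (plus an optional Retry-After value) to a structured error.
// Returns null for 2xx responses.
function toToolError(status: number, retryAfterHeader: string | null): ToolError | null {
  if (status >= 200 && status < 300) return null;
  if (status === 401 || status === 403) {
    return { error: true, code: "INVALID_TOKEN", message: "The API rejected HTB_TOKEN." };
  }
  if (status === 429) {
    // Assumes delay-seconds; fall back to the 60 s window if the header is absent or unparsable.
    const parsed = Number.parseInt(retryAfterHeader ?? "", 10);
    const retryAfter = Number.isFinite(parsed) && parsed >= 0 ? parsed : 60;
    return {
      error: true,
      code: "RATE_LIMITED",
      message: `Too many submissions. Retry after ${retryAfter} seconds.`,
      retry_after: retryAfter,
    };
  }
  return { error: true, code: "SERVER_ERROR", message: `Unexpected HTTP status ${status}.` };
}
```

Returning `retry_after` as a number lets the agent sleep for exactly that long instead of parsing the message text.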
### 2.6 Scope Boundary
The MCP server **does not**:
- Open SSH connections or manage terminals
- Read or write files on the host or container filesystem
- Start, stop, or reset containers
- Interact with the orchestrator API
These are the agent's responsibility via its native Bash tool.
---
## 3. Agent Access Model
### 3.1 Approach: SSH from Operator's Machine
The agent uses its **native Bash tool** to run SSH commands against the CTF VPS. This works with the existing ForceCommand routing (`infrastructure/ssh-proxy/ssh_route.sh`) — no infrastructure changes needed.
```
┌──────────────────────────────────────────────┐
│ Operator's Machine                           │
│                                              │
│  ┌─────────────┐      ┌──────────────────┐   │
│  │ AI Agent    │──────│ MCP Server       │   │
│  │ (Claude,    │ stdio│ (hackthebench-   │   │
│  │  Codex,     │      │  mcp)            │   │
│  │  Gemini)    │      └───────┬──────────┘   │
│  │             │              │ HTTPS        │
│  │ Bash tool ──┼──SSH──┐      │              │
│  └─────────────┘       │      │              │
│                        │      │              │
└────────────────────────┼──────┼──────────────┘
                         │      │
                    ┌────▼──────▼────┐
                    │ CTF VPS        │
                    │                │
                    │ ForceCommand   │
                    │ → Container    │
                    │                │
                    │ Workers API    │
                    │ (ctf.kcodes.   │
                    │  me)           │
                    └────────────────┘
```
### 3.2 Agent Account Provisioning
A new **admin API endpoint** creates agent accounts. Only the operator calls this — the agent never self-registers.
#### `POST /api/admin/agents/create`
```
Headers:
  Authorization: Bearer <ADMIN_TOKEN>
Body: {
  "agent_model": "claude-opus-4",
  "run_label": "claude-opus-run-001",
  "ssh_password": "auto"   // "auto" = generate random password
}
Response: {
  "player_id": "uuid",
  "token": "64-char-hex",
  "ssh_username": "claude-opus-run-001",
  "ssh_password": "generated-or-provided",
  "ssh_command": "ssh claude-opus-run-001@<ssh-host>",
  "agent_model": "claude-opus-4",
  "player_type": "agent"
}
```
**What this does internally:**
1. Calls the existing registration flow (generates player ID, token, SSH username)
2. Sets `player_type = 'agent'` and `agent_model` on the `players` row
3. Creates a container via the orchestrator (same as human registration)
4. Creates an `agent_runs` record with `status = 'pending'`
5. Returns full credentials to the operator
The `ADMIN_TOKEN` is a separate secret configured on the Worker, distinct from player tokens.
### 3.3 Operator Workflow
```bash
# 1. Create the agent account
curl -X POST https://hackthebench.kcodes.me/api/admin/agents/create \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"agent_model": "claude-opus-4", "run_label": "claude-opus-run-001"}'

# 2. Configure the agent environment
#    - Set HTB_TOKEN and HTB_API_URL in the MCP config
#    - Provide SSH credentials via environment or agent instructions

# 3. Launch the agent with a system prompt like:
#    "You are competing in a CTF. SSH into the server, find flags
#     (format: CTF_[0-9a-f]{32}), and submit them using the
#     submit_flag tool. Use get_stats to check your progress."

# 4. When the agent finishes, mark the run as complete.
#    The create response has no run_id; look it up with GET /api/admin/agents/runs.
curl -X POST https://hackthebench.kcodes.me/api/admin/agents/runs/<run_id>/complete \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```
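Step 2 can be scripted from the create response. This sketch uses hypothetical helper names, and the `HTB_SSH_USER` variable is invented for illustration. `SSHPASS` is the variable that `sshpass -e` reads. The sketch keeps the API token inside the MCP server's environment and hands only the SSH details to the agent's shell.

```typescript
// Subset of the POST /api/admin/agents/create response from section 3.2.
interface CreateAgentResponse {
  player_id: string;
  token: string;
  ssh_username: string;
  ssh_password: string;
  ssh_command: string;
  agent_model: string;
  player_type: "agent";
}

interface McpConfig {
  mcpServers: Record<string, { command: string; args: string[]; env: Record<string, string> }>;
}

// Produce the MCP config from section 2.3 for a freshly created agent account.
function buildMcpConfig(apiUrl: string, creds: CreateAgentResponse): McpConfig {
  return {
    mcpServers: {
      hackthebench: {
        command: "npx",
        args: ["github:The-Bench-Co/hackthebench-mcp"],
        env: { HTB_API_URL: apiUrl, HTB_TOKEN: creds.token },
      },
    },
  };
}

// SSH details for the agent's Bash environment; HTB_SSH_USER is a hypothetical name.
function buildSshEnv(creds: CreateAgentResponse): Record<string, string> {
  return { HTB_SSH_USER: creds.ssh_username, SSHPASS: creds.ssh_password };
}
```

Splitting the two environments this way matches the scope boundary in section 2.6: the MCP server never sees SSH credentials, and the shell never needs the API token.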
### 3.4 SSH Authentication
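**Password auth (default):** The agent receives SSH credentials and uses them via its Bash tool. It cannot answer an interactive password prompt, so it needs `sshpass` (or `expect`):
```bash
# Agent runs this via its Bash tool. `sshpass -e` reads the password from $SSHPASS,
# which keeps it off the command line (unlike `sshpass -p`).
export SSHPASS='<password>'
sshpass -e ssh -o StrictHostKeyChecking=no <ssh_username>@<ssh-host> "ls -la /home/player"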
```
**SSH key auth (optional enhancement):** For cleaner agent setup, support SSH key-based auth:
1. Operator generates an SSH keypair
2. Admin API accepts a public key during account creation
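3. Orchestrator registers the public key for the agent's SSH username on the VPS host. It belongs in the host account's `authorized_keys`, not the container's `~/.ssh/authorized_keys`, because sshd authenticates the connection before ForceCommand routes the session into the container.
4. Agent uses the private key: `ssh -i /path/to/key <ssh_username>@<ssh-host>`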
This avoids `sshpass` and is more natural for agents that already have key-based SSH support.
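The following sketch shows how an agent harness might assemble the SSH invocation as an argv array for either auth mode. `buildSshArgv` is hypothetical. Password mode assumes `SSHPASS` is already exported for `sshpass -e`. `IdentitiesOnly=yes` stops ssh from offering unrelated keys loaded in the operator's ssh-agent.

```typescript
type SshAuth = { kind: "password" } | { kind: "key"; keyPath: string };

// Build an argv array for one non-interactive command in the container.
// Note: ssh still hands `remoteCommand` to a shell on the far side, so it is
// shell syntax there; the argv form only avoids quoting issues on this side.
function buildSshArgv(user: string, host: string, auth: SshAuth, remoteCommand: string): string[] {
  const sshArgs = ["ssh", "-o", "StrictHostKeyChecking=no"];
  if (auth.kind === "key") {
    sshArgs.push("-i", auth.keyPath, "-o", "IdentitiesOnly=yes");
  }
  sshArgs.push(`${user}@${host}`, remoteCommand);
  // Password mode wraps ssh in `sshpass -e`, which reads $SSHPASS.
  return auth.kind === "password" ? ["sshpass", "-e", ...sshArgs] : sshArgs;
}
```

An argv array can be passed straight to a process-spawning API without building a local shell string.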
### 3.5 Agent SSH Interaction Pattern
The agent doesn't maintain a persistent SSH session. Instead, it runs individual commands:
```bash
# Run a command in the container
ssh -o StrictHostKeyChecking=no <ssh_username>@<ssh-host> "ls -la /home/player"
# Search for hidden files
ssh -o StrictHostKeyChecking=no <ssh_username>@<ssh-host> "find /home/player -name '.*' -type f"
# Submit a found flag via MCP tool
# (agent calls submit_flag tool with the flag value)
```
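The ForceCommand routing in `infrastructure/ssh-proxy/ssh_route.sh` maps the SSH username to the correct container via `docker exec`. Under `ForceCommand`, sshd does not run the command the client requested; it only exposes it in `$SSH_ORIGINAL_COMMAND`. The one-command-per-connection pattern above therefore works only if `ssh_route.sh` forwards that variable to `docker exec` instead of always starting an interactive shell.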
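The routing decision can be sketched as follows. The real router is the Bash script `ssh_route.sh`; this TypeScript version only illustrates the logic. The `ctf-<username>` container naming, the username pattern, and the `/bin/sh` entrypoint are all assumptions.

```typescript
// Decide what `docker exec` should run for an incoming SSH session.
// `originalCommand` stands for $SSH_ORIGINAL_COMMAND, which sshd sets when the
// client passed a command; under ForceCommand it is not executed automatically.
function routeSession(username: string, originalCommand: string | undefined): string[] {
  // Hypothetical validation: refuse anything that is not a plain account name.
  if (!/^[a-z0-9][a-z0-9-]{0,62}$/.test(username)) {
    throw new Error(`refusing unexpected username: ${username}`);
  }
  const container = `ctf-${username}`; // hypothetical naming scheme
  if (originalCommand && originalCommand.length > 0) {
    // Non-interactive (agents): no TTY, run the requested command through a shell.
    return ["docker", "exec", "-i", container, "/bin/sh", "-c", originalCommand];
  }
  // Interactive login (typical for humans): allocate a TTY and start a shell.
  return ["docker", "exec", "-it", container, "/bin/sh"];
}
```

Dropping `-t` in the command case matters: a TTY would mix stderr into stdout and may add carriage returns the agent then has to strip.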
---
## 4. Leaderboard Verified Badges
### 4.1 Schema Changes
#### `players` table — new columns
Add to existing schema in `leaderboard/worker/schema.sql`:
```sql
ALTER TABLE players ADD COLUMN player_type TEXT DEFAULT 'human' CHECK(player_type IN ('human', 'agent'));
ALTER TABLE players ADD COLUMN agent_model TEXT; -- e.g. 'claude-opus-4', 'gpt-4o', 'gemini-2.5-pro'
ALTER TABLE players ADD COLUMN is_verified INTEGER DEFAULT 0; -- admin-verified agent run
```
- `player_type`: Discriminator for humans vs agents. Defaults to `'human'` so existing players are unaffected.
- `agent_model`: Which AI model ran. `NULL` for humans.
- `is_verified`: Admin manually verifies the run was legitimate (no human assistance, correct model attribution). `0` for humans and unverified agents.
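D1 (SQLite) has no boolean type, so `is_verified` comes back as `0`/`1`, while the API in section 4.2 returns `true`/`false`. Below is a minimal mapping sketch. It assumes a row shape with a `points` column; the real column name in `players` may differ.

```typescript
// Row shape as read from D1/SQLite: booleans are stored as 0/1 integers.
interface PlayerRow {
  username: string;
  points: number; // assumed column name
  player_type: "human" | "agent";
  agent_model: string | null;
  is_verified: number;
}

interface LeaderboardPlayer {
  rank: number;
  username: string;
  points: number;
  player_type: "human" | "agent";
  agent_model: string | null;
  is_verified: boolean;
}

function toLeaderboardPlayer(row: PlayerRow, rank: number): LeaderboardPlayer {
  const isAgent = row.player_type === "agent";
  return {
    rank,
    username: row.username,
    points: row.points,
    player_type: row.player_type,
    // Humans never carry a model or a badge, even if a stray value is in the row.
    agent_model: isAgent ? row.agent_model : null,
    is_verified: isAgent && row.is_verified === 1,
  };
}
```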
#### New table: `agent_runs`
```sql
CREATE TABLE agent_runs (
    id TEXT PRIMARY KEY, -- UUID
    player_id TEXT NOT NULL REFERENCES players(id),
    agent_model TEXT NOT NULL, -- 'claude-opus-4', 'codex', etc.
    run_label TEXT, -- operator-assigned label, e.g. 'claude-opus-run-001'
    status TEXT DEFAULT 'pending' CHECK(status IN ('pending', 'running', 'completed', 'abandoned')),
    started_at TEXT,
    completed_at TEXT,
    final_points INTEGER,
    final_flags INTEGER,
    metadata TEXT, -- JSON: operator notes, agent config, etc.
    created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX idx_agent_runs_player ON agent_runs(player_id);
CREATE INDEX idx_agent_runs_status ON agent_runs(status);
```
**Purpose:** Audit trail for agent runs. Each agent account maps to one run. The operator marks it `completed` when done, capturing the final score.
### 4.2 API Changes
#### Modified: `GET /api/leaderboard`
Add agent-related fields to the response:
```json
{
  "updated_at": "2025-01-15T12:34:56Z",
  "players": [
    {
      "rank": 1,
      "username": "claude-opus-run-001",
      "points": 18420,
      "player_type": "agent",
      "agent_model": "claude-opus-4",
      "is_verified": true
    },
    {
      "rank": 2,
      "username": "Jason",
      "points": 376,
      "player_type": "human",
      "agent_model": null,
      "is_verified": false
    }
  ]
}
```
New query parameter: `?filter=all|humans|agents` (default: `all`).
**Cache key change:** Separate KV cache per filter — `leaderboard:current:all`, `leaderboard:current:humans`, `leaderboard:current:agents`. All invalidated on any score change.
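The sketch below applies these rules to filter handling. Falling back to `all` for unknown values is an assumption; returning `400` would also be reasonable. The `filterClause` SQL is illustrative, not the Worker's actual query.

```typescript
const FILTERS = ["all", "humans", "agents"] as const;
type LeaderboardFilter = (typeof FILTERS)[number];

// Parse ?filter=..., defaulting to "all" for missing or unknown values.
function parseFilter(raw: string | null): LeaderboardFilter {
  return FILTERS.find((f) => f === raw) ?? "all";
}

function cacheKey(filter: LeaderboardFilter): string {
  return `leaderboard:current:${filter}`;
}

// Any score change can move a player in every view, so all three keys are dropped.
function keysToInvalidate(): string[] {
  return FILTERS.map(cacheKey);
}

// Illustrative WHERE clause per filter, using the player_type values from section 4.1.
function filterClause(filter: LeaderboardFilter): string {
  switch (filter) {
    case "humans":
      return "WHERE player_type = 'human'";
    case "agents":
      return "WHERE player_type = 'agent'";
    default:
      return "";
  }
}
```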
#### Modified: `GET /api/player/stats`
Add to response:
```json
{
  "player_type": "agent",
  "agent_model": "claude-opus-4",
  "is_verified": true
}
```
#### New: `POST /api/admin/agents/create`
See [Section 3.2](#32-agent-account-provisioning).
#### New: `POST /api/admin/agents/runs/:run_id/complete`
Mark an agent run as completed, snapshot the final score.
```
Headers:
  Authorization: Bearer <ADMIN_TOKEN>
Body: {
  "notes": "Ran for 2 hours, no human intervention"   // optional
}
Response: {
  "run_id": "uuid",
  "status": "completed",
  "final_points": 18420,
  "final_flags": 1842,
  "completed_at": "2025-01-15T14:30:00Z"
}
```
**What this does:**
1. Sets `agent_runs.status = 'completed'`, captures `completed_at`
2. Snapshots `final_points` and `final_flags` from the `players` table
3. Stores operator notes in `metadata`
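Here is a pure-function sketch of the completion step. It assumes the `players` row exposes `total_points` and `flags_found`, the names used by `/api/player/stats`. Two design choices are assumptions meant to keep the snapshot stable: an already-finished run cannot be completed again, and notes are merged into existing metadata rather than overwriting it.

```typescript
type RunStatus = "pending" | "running" | "completed" | "abandoned";

interface AgentRun {
  id: string;
  player_id: string;
  status: RunStatus;
  completed_at: string | null;
  final_points: number | null;
  final_flags: number | null;
  metadata: string | null; // JSON text
}

// Score fields assumed to exist on the players row.
interface PlayerScore {
  total_points: number;
  flags_found: number;
}

function completeRun(run: AgentRun, score: PlayerScore, notes: string | undefined, now: Date): AgentRun {
  if (run.status === "completed" || run.status === "abandoned") {
    throw new Error(`run ${run.id} is already ${run.status}`);
  }
  // Keep any existing metadata (e.g. agent config) and add the operator notes.
  const meta = run.metadata ? JSON.parse(run.metadata) : {};
  if (notes !== undefined) meta.notes = notes;
  return {
    ...run,
    status: "completed",
    completed_at: now.toISOString(),
    final_points: score.total_points,
    final_flags: score.flags_found,
    metadata: JSON.stringify(meta),
  };
}
```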
#### New: `POST /api/admin/agents/runs/:run_id/verify`
Toggle the verified badge on an agent run.
```
Headers:
  Authorization: Bearer <ADMIN_TOKEN>
Body: {
  "verified": true
}
Response: {
  "run_id": "uuid",
  "player_id": "uuid",
  "is_verified": true
}
```
**What this does:**
1. Sets `players.is_verified = 1` (or `0` if `verified: false`)
2. Invalidates leaderboard cache
3. Only works on `player_type = 'agent'` accounts
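The guard rules above can be sketched as a pure function. The `400`/`404` status choices and the `applyVerify` name are assumptions. The cache keys are the three listed in section 4.2.

```typescript
interface PlayerRecord {
  id: string;
  player_type: "human" | "agent";
  is_verified: 0 | 1;
}

type VerifyOutcome =
  | { ok: true; player: PlayerRecord; invalidate: string[] }
  | { ok: false; status: 400 | 404; message: string };

function applyVerify(player: PlayerRecord | undefined, verified: unknown): VerifyOutcome {
  if (!player) return { ok: false, status: 404, message: "run or player not found" };
  if (player.player_type !== "agent") {
    return { ok: false, status: 400, message: "only agent accounts can be verified" };
  }
  if (typeof verified !== "boolean") {
    return { ok: false, status: 400, message: "`verified` must be a boolean" };
  }
  return {
    ok: true,
    player: { ...player, is_verified: verified ? 1 : 0 },
    // The badge shows in the "all" and "agents" views; dropping all three keeps it simple.
    invalidate: ["all", "humans", "agents"].map((f) => `leaderboard:current:${f}`),
  };
}
```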
#### New: `GET /api/admin/agents/runs`
List all agent runs for monitoring.
```
Headers:
  Authorization: Bearer <ADMIN_TOKEN>
Response: {
  "runs": [
    {
      "run_id": "uuid",
      "player_id": "uuid",
      "username": "claude-opus-run-001",
      "agent_model": "claude-opus-4",
      "status": "completed",
      "started_at": "2025-01-15T12:00:00Z",
      "completed_at": "2025-01-15T14:30:00Z",
      "final_points": 18420,
      "final_flags": 1842,
      "is_verified": true
    }
  ]
}
```
### 4.3 Admin Authentication
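The admin endpoints use a separate `ADMIN_TOKEN` secret. Like `ORCHESTRATION_TOKEN`, it is set with `wrangler secret put` rather than written into `wrangler.toml`, which holds only plain vars: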
```toml
[vars]
CTF_DOMAIN = "hackthebench.kcodes.me"
# Secrets (set via `wrangler secret put`):
# ORCHESTRATION_TOKEN - for VPS orchestrator
# ADMIN_TOKEN - for agent management endpoints
```
Admin endpoints check: `Authorization: Bearer <ADMIN_TOKEN>`. All admin endpoints return `401` if the token is missing or invalid.
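Below is a minimal sketch of that check. It compares tokens without an early exit, so response timing does not reveal how much of a guess matched. Cloudflare Workers also provides a non-standard `crypto.subtle.timingSafeEqual` that could replace the hand-written loop. The helper names are hypothetical.

```typescript
// Extract the token from an "Authorization: Bearer <token>" header value.
function bearerToken(header: string | null): string | null {
  const match = /^Bearer\s+(\S+)$/.exec(header ?? "");
  return match ? match[1] : null;
}

// Accumulate differences across the whole string instead of returning at the
// first mismatch. charCodeAt past the end yields NaN, which `| 0` turns into 0.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// True only for a well-formed bearer header matching a non-empty ADMIN_TOKEN.
function isAdmin(authHeader: string | null, adminToken: string): boolean {
  const presented = bearerToken(authHeader);
  return presented !== null && adminToken.length > 0 && constantTimeEqual(presented, adminToken);
}
```

The `adminToken.length > 0` guard means an unset secret locks the endpoints rather than opening them.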
### 4.4 Frontend Changes
All changes in `leaderboard/frontend/src/`.
#### Leaderboard Page (`pages/Leaderboard.tsx`)
**Verified badge:** Next to agent usernames, show a green pill badge:
```
Rank   Player                           Points
#1     claude-opus-run-001   ✓ Agent    18,420
#2     Jason                               376
#3     codex-run-003         ✓ Agent       291
```
Badge styling:
- Green background pill with white text: "✓ Agent"
- On hover, tooltip shows: "Verified claude-opus-4 agent run"
- Unverified agents show a gray "Agent" pill (no checkmark)
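The badge rules above can live in a small helper that each leaderboard row calls. `agentBadge` and the `variant` values, which would map to the green and gray styles, are hypothetical:

```typescript
interface BadgePlayer {
  player_type: "human" | "agent";
  agent_model: string | null;
  is_verified: boolean;
}

interface Badge {
  label: string;
  variant: "verified" | "unverified";
  tooltip?: string;
}

// Humans get no pill; agents get a green verified pill or a gray unverified one.
function agentBadge(p: BadgePlayer): Badge | null {
  if (p.player_type !== "agent") return null;
  if (p.is_verified) {
    return {
      label: "✓ Agent",
      variant: "verified",
      tooltip: `Verified ${p.agent_model ?? "unknown model"} agent run`,
    };
  }
  return { label: "Agent", variant: "unverified" };
}
```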
**Filter toggles:** Three buttons above the leaderboard table:
```
[All] [Humans] [Agents]
```
- Default: "All" selected
- Clicking a filter calls `/api/leaderboard?filter=<value>`
- Active filter is visually highlighted
#### API Types (`api.ts`)
Update `LeaderboardPlayer` interface:
```typescript
interface LeaderboardPlayer {
  rank: number;
  username: string;
  points: number;
  player_type: 'human' | 'agent';
  agent_model: string | null;
  is_verified: boolean;
}
```
#### Profile Page (`pages/Profile.tsx`)
If the player is an agent, show:
- Agent model badge
- Verified status
- Link to run details (if verified)
---
## 5. End-to-End Flow
### 5.1 Operator Sets Up Agent Run
```
Operator                          Workers API             VPS Orchestrator
    │                                  │                          │
    │ POST /api/admin/agents/create    │                          │
    │ {agent_model, run_label}         │                          │
    │─────────────────────────────────>│                          │
    │                                  │ POST /containers/create  │
    │                                  │─────────────────────────>│
    │                                  │ container created        │
    │                                  │<─────────────────────────│
    │ {player_id, token, ssh_creds}    │                          │
    │<─────────────────────────────────│                          │
    │                                  │                          │
    │ Configure agent MCP + SSH        │                          │
    │ Launch agent                     │                          │
```
### 5.2 Agent Solves Challenges
```
  Agent               MCP Server          Workers API        Container
    │                      │                   │                 │
    │ login()              │                   │                 │
    │─────────────────────>│ GET /api/player/  │                 │
    │                      │ stats             │                 │
    │                      │──────────────────>│                 │
    │ {stats, ssh_cmd}     │                   │                 │
    │<─────────────────────│                   │                 │
    │                      │                   │                 │
    │ ssh user@host "ls"   │                   │                 │
    │──────────────────────┼───────────────────┼────────────────>│
    │ directory listing    │                   │                 │
    │<─────────────────────┼───────────────────┼─────────────────│
    │                      │                   │                 │
    │ ssh user@host "cat .hidden"              │                 │
    │──────────────────────┼───────────────────┼────────────────>│
    │ CTF_a1b2c3...        │                   │                 │
    │<─────────────────────┼───────────────────┼─────────────────│
    │                      │                   │                 │
    │ submit_flag(flag)    │                   │                 │
    │─────────────────────>│ POST /api/submit  │                 │
    │                      │──────────────────>│                 │
    │ {correct, +7 pts}    │                   │                 │
    │<─────────────────────│                   │                 │
    │                      │                   │                 │
    │ ... repeat ...       │                   │                 │
```
### 5.3 Operator Finalizes Run
```
Operator                          Workers API
    │                                  │
    │ POST /api/admin/agents/runs/     │
    │ <run_id>/complete                │
    │─────────────────────────────────>│
    │ {final_points, final_flags}      │
    │<─────────────────────────────────│
    │                                  │
    │ POST /api/admin/agents/runs/     │
    │ <run_id>/verify                  │
    │ {verified: true}                 │
    │─────────────────────────────────>│
    │ {is_verified: true}              │
    │<─────────────────────────────────│
    │                                  │
    │ ✓ Badge now shows on leaderboard │
```
---
## 6. Security Considerations
### 6.1 Agent Isolation
- Agents get the same container constraints as humans: 512MB RAM, 0.5 CPU, no internet access (`ctf-internal` Docker network).
- Each agent run gets its own container — no shared state between runs.
- Container filesystem is ephemeral; reset or destroy cleans everything.
### 6.2 Token Security
- Agent tokens follow the same security model as human tokens: SHA-256 hash stored in D1, plain token never persisted server-side.
- `ADMIN_TOKEN` is a separate secret with higher privilege — never exposed to agents.
- Agent tokens should be rotated between runs (new account per run).
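The sketch below illustrates the token scheme described above using Node's `node:crypto`. A Worker would use `crypto.getRandomValues` and `crypto.subtle.digest("SHA-256", data)` instead. Hashing the UTF-8 bytes of the hex string, rather than the decoded raw bytes, is an assumption about the existing implementation.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes -> 64 lowercase hex chars, matching the documented token length.
function generateToken(): string {
  return randomBytes(32).toString("hex");
}

// Only this digest would be stored in D1; the plain token is returned to the operator once.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}
```

Because a fresh account (and token) is created for each run, a leaked token only affects a single run's score.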
### 6.3 Rate Limiting
- Agents are subject to the same rate limits as humans: 10 flag submissions per 60 seconds.
- No special rate limit bypass for agents — this is intentional to keep the benchmark fair.
- The 500 decoy flags remain effective against agent brute-force strategies.
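This spec does not describe how the existing limiter is implemented. As an illustration only, here is a sliding-window log that enforces 10 submissions per 60 seconds. It keeps state in memory, which a Worker cannot rely on across isolates, so a real deployment would need shared storage (for example a Durable Object).

```typescript
// Sliding-window log: at most `limit` submissions in any `windowMs` span per player.
class SubmissionLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 10, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 and records the hit if allowed; otherwise returns the whole
  // seconds until the oldest hit leaves the window (usable as retry_after).
  check(playerId: string, nowMs: number): number {
    const recent = (this.hits.get(playerId) ?? []).filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(playerId, recent);
      const oldest = recent[0] ?? nowMs;
      return Math.ceil((oldest + this.windowMs - nowMs) / 1000);
    }
    recent.push(nowMs);
    this.hits.set(playerId, recent);
    return 0;
  }
}
```

For example, ten submissions at t = 0–9 s followed by an eleventh at t = 15 s is rejected with a retry delay of 45 seconds, the value shown in the error example in section 2.5.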
### 6.4 Verification Integrity
- `is_verified` can only be set by admin endpoints — agents cannot self-verify.
- The `agent_runs` table provides an audit trail: when the run started, ended, and what score was achieved.
- Operators should document their agent configuration (model, system prompt, tool access) in run metadata for reproducibility.
### 6.5 Leaderboard Fairness
- Agents and humans compete on the same challenges with the same constraints.
- Filter toggles let users view human-only or agent-only rankings.
- Verified badges distinguish legitimate agent runs from unverified claims.