Lint, benchmark, and score your AI coding instructions. Stop guessing, start measuring.
# agenteval
Your CLAUDE.md is untested. So is your AGENTS.md, your cursor-instructions.md, and your .cursorrules.
agenteval is a linter, benchmarker, and CI gate for AI coding instructions. It finds dead references, token bloat, contradictions, and stale instructions before your agent does. Then it scores agent performance so you can measure whether your instruction changes actually help.

## Install
```bash
npm install -g agenteval-cli
```
Or pick your preferred method:
```bash
brew tap lukasmetzler/agenteval && brew install agenteval # Homebrew
curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash # Shell
```
No Bun, no Node at runtime. The binary is self-contained.
## Quick Start
```bash
agenteval lint # Find problems in your instruction files
agenteval lint --explain # Same, with explanations for each rule
agenteval harvest --dry-run # Preview which AI commits are in your history
agenteval ci # Run all tasks, fail on regressions
```
## What It Catches
- Dead references to files, paths, and headings that don't exist
- Filler phrases that waste context tokens ("make sure to", "it is important that")
- Contradictions between instruction files ("always use X" and "never use X")
- Content overlap and duplication across files
- Token budget overruns
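
To make the filler-phrase check concrete, here is a rough shell approximation of the idea. It is not how agenteval implements the rule: the `count_filler` helper and its two-phrase pattern are hypothetical, taken only from the examples quoted in the list above.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; agenteval's real rule set is assumed to be broader.

# count_filler FILE
# Prints the number of lines in FILE that contain a known filler phrase.
# The phrase list is hypothetical and limited to the two examples above.
count_filler() {
  # -c counts matching lines, -i ignores case, -E enables alternation.
  # grep exits non-zero when nothing matches, so "|| true" keeps the
  # helper usable under "set -e" while still printing 0.
  grep -ciE 'make sure to|it is important that' "$1" || true
}
```

Running `count_filler CLAUDE.md` prints a single number, which gives a quick feel for how much of a file is padding before you run the full `agenteval lint`.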