agenteval

Name: agenteval
Author: lukasmetzler

lukasmetzler April 1, 2026

Lint, benchmark, and score your AI coding instructions. Stop guessing, start measuring.

agenteval

Your CLAUDE.md is untested. So is your AGENTS.md, your cursor-instructions.md, and your .cursorrules.

agenteval is a linter, benchmarker, and CI gate for AI coding instructions. It finds dead references, token bloat, contradictions, and stale instructions before your agent does. Then it scores agent performance so you can measure whether your instruction changes actually help.
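To make "dead references" concrete, here is a minimal Python sketch of the idea. It is not agenteval's implementation. It assumes file references appear as backtick-quoted relative paths and that a reference is dead when the path does not exist under the repository root. The name `find_dead_references` and the regex are hypothetical, chosen for illustration only.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern: backtick-quoted tokens that look like relative file paths.
PATH_PATTERN = re.compile(r"`([\w./-]+\.\w+)`")


def find_dead_references(instruction_text: str, repo_root: Path) -> list[str]:
    """Return referenced paths that do not exist under repo_root."""
    referenced = PATH_PATTERN.findall(instruction_text)
    return [ref for ref in referenced if not (repo_root / ref).exists()]


if __name__ == "__main__":
    # Build a throwaway repo with one real file and one missing file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "app.ts").write_text("export {};\n")
        text = "Entry point is `src/app.ts`. Run `scripts/setup.sh` first."
        print(find_dead_references(text, root))  # ['scripts/setup.sh']
```

A real linter would also need to resolve heading anchors and paths that are not wrapped in backticks; this sketch covers only the simplest case.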

Install

npm install -g agenteval-cli

Or pick your preferred method:

brew tap lukasmetzler/agenteval && brew install agenteval   # Homebrew
curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash  # Shell

No Bun, no Node at runtime. The binary is self-contained.

Quick Start

agenteval lint                    # Find problems in your instruction files
agenteval lint --explain          # Same, with explanations for each rule
agenteval harvest --dry-run       # Preview which AI commits are in your history
agenteval ci                      # Run all tasks, fail on regressions

What It Catches

Dead references to files, paths, and headings that don't exist
Filler phrases that waste context tokens ("make sure to", "it is important that")
Contradictions between instruction files ("always use X" and "never use X")
Content overlap and duplication across files
Token budget overruns
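As a rough illustration of two of the checks above (filler phrases and cross-file contradictions), here is a small Python sketch. It is an assumption about how such rules could work, not agenteval's actual rule engine. The phrase list contains only the two examples quoted above, and the "always use X" / "never use X" pattern and both function names are hypothetical.

```python
import re

# Example filler phrases taken from the list above; a real rule set would be larger.
FILLER_PHRASES = ("make sure to", "it is important that")

# Hypothetical directive pattern: "always use X" / "never use X".
DIRECTIVE = re.compile(r"\b(always|never) use (\w[\w-]*)", re.IGNORECASE)


def find_filler(text: str) -> list[str]:
    """Return the known filler phrases that occur in text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in FILLER_PHRASES if phrase in lowered]


def find_contradictions(files: dict[str, str]) -> list[str]:
    """Return subjects that are marked both 'always' and 'never' across files."""
    stances: dict[str, set[str]] = {}
    for text in files.values():
        for verb, subject in DIRECTIVE.findall(text):
            stances.setdefault(subject.lower(), set()).add(verb.lower())
    # A subject is contradictory when both verbs were seen for it.
    return sorted(s for s, verbs in stances.items() if len(verbs) == 2)


if __name__ == "__main__":
    files = {
        "CLAUDE.md": "Make sure to run tests. Always use pnpm.",
        "AGENTS.md": "Never use pnpm in CI.",
    }
    print(find_filler(files["CLAUDE.md"]))  # ['make sure to']
    print(find_contradictions(files))       # ['pnpm']
```

Plain string matching like this will miss paraphrased contradictions ("prefer X" versus "avoid X"), so treat it as a sketch of the concept rather than a complete check.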
