skillprobe

Name: skillprobe
Author: Anyesh

Anyesh March 31, 2026


Automated end-to-end testing for AI agent skills (agentskills.io). Launches Claude Code and Cursor as subprocesses, runs scenarios in real workspaces, and asserts what the model actually does.

skillprobe

Release notes: see CHANGELOG.md or the GitHub Releases page.


Automated testing for LLM skills. Launches Claude Code or Cursor as subprocesses, runs scenarios in isolated workspaces, and reports what passed and what didn't.

Skills are just text injected into the LLM context, and LLMs are probabilistic, so a skill will get ignored some percentage of the time no matter how carefully you word it. If you want hard enforcement, hooks are the right tool since they run deterministically every time. But hooks can only check things after the fact (linting, file restrictions, blocked commands). They can't guide the model toward better architectural decisions, teach it your team's domain conventions, set the tone of code review feedback, or help it reason through a multi-step workflow. Skills handle that side, and skillprobe measures how reliably they do it.
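To make "how reliably" concrete, here is a minimal sketch of the general idea: run the same scenario several times, each in a fresh temporary workspace, and report the fraction of runs that satisfy an assertion. This is not skillprobe's actual implementation or API. The names `run_scenario`, `fake_agent`, and `wrote_review_heading` are hypothetical, and a Python one-liner stands in for a real agent CLI so the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_scenario(command, check, runs=5, timeout=60):
    """Run `command` in a fresh temp workspace `runs` times; return the pass rate."""
    passed = 0
    for _ in range(runs):
        # Each run gets its own empty directory, so runs cannot affect each other.
        with tempfile.TemporaryDirectory() as workspace:
            try:
                result = subprocess.run(
                    command,
                    cwd=workspace,
                    capture_output=True,
                    text=True,
                    timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                continue  # a hung run counts as a failure
            if check(Path(workspace), result):
                passed += 1
    return passed / runs


# Hypothetical stand-in for an agent CLI: a Python one-liner that writes a file.
fake_agent = [sys.executable, "-c", "open('notes.md', 'w').write('# Review\\n')"]


def wrote_review_heading(workspace, result):
    """Assertion: the process exited cleanly and left a notes.md with the heading."""
    notes = workspace / "notes.md"
    return (
        result.returncode == 0
        and notes.exists()
        and notes.read_text().startswith("# Review")
    )


rate = run_scenario(fake_agent, wrote_review_heading, runs=3)
print(f"pass rate: {rate:.0%}")
```

The stand-in command is deterministic, so it passes every time. With a real agent the same scenario can pass on some runs and fail on others, which is why a pass rate over repeated runs says more than a single spot check.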

When you need this

If you write a few personal skills and tweak them by feel, you probably don't need this. That loop is fast and good enough for individual use.

Where it breaks down:

Model updates break skills silently. Anthropic ships a new Sonnet, Cursor updates their agent, and a skill that worked last week now produces different output. Nobody notices because nobody retested.
Teams sharing skills. When 20 engineers share a "code review" skill, one person's gut check isn't repre
