How I built mechanical enforcement for AI coding agents — and why prompts aren't enough

I spent months watching AI coding agents produce impressive demos that couldn't survive production. The code *looked* right. It compiled. It even passed the first test. Then it hit edge cases. Forgotten constraints. A rule the agent agreed to five minutes ago, now gone, overwritten by the next context window.

The root cause wasn't capability. It was **process**.

> *"A raw model is not an agent. It becomes one once a harness gives it state, tool execution, feedback loops, and enforceable constraints."*
>
> Osmani, Saboo & Kartakis, *The New SDLC With Vibe Coding*, 2026 [[1]](#ref1)

---

## The problem: capable but undisciplined

AI agents are brilliant at generating code. They have zero built-in discipline:

- They commit without tests
- They push without review
- They overwrite each other's work
- They produce output that looks correct but breaks silently

A [METR study](https://arxiv.org/abs/2507.09089) (Becker et al., July 2025 [[2]](#ref2)) found something counterintuitive: experienced open-source developers using AI tools took **19% longer** to finish their tasks while *believing* they had been about 20% faster. The speed was an illusion. The debugging cost was real.

The industry's response has been more skills, more prompts, bigger context windows. But the problem isn't intelligence. It's **accountability**.

A rule that lives only in a prompt is a suggestion. An agent that "knows" the rules will eventually forget them. Context degrades. Attention drifts. The question isn't *if* your agent will break a rule but *when*.

---

## The insight: memory is not enforcement

Three incidents in 48 hours taught me this lesson.

My agent bypassed its own commit approval system in under 30 seconds. Not because it was malicious, but because the "gate" was just another rule in a file. Another thing to remember. Another thing to forget.

I had built a SHA256 token system for commit approval. Thought it was bulletproof. Then my agent ran with `--auto` and the tokens became theater.

The fix wasn't a better token system.
The fix was changing the architecture. **Rules that depend on memory fail. Rules that depend on visible blocks succeed.**

This is the core insight behind **mechanical enforcement**: gates that run at the infrastructure level, not the agent level. The agent cannot bypass what it cannot ignore.

---

## The solution: Agent = Model + Harness

I built this project as a complete open-source implementation of the **Harness architecture** [[3]](#ref3), the mechanical infrastructure that turns raw AI intelligence into reliable output.

| Component | What It Is |
|---|---|
| **Instructions & Rules** | Who the agent is, what it cares about, what it must never do |
| **Tools** | 57 composable skills loaded on demand (lazy-loaded, ~250 lines each) |
| **Sandboxes & Execution** | Terminal, git workspace, CI |
| **Orchestration** | When each tool fires, how agents coordinate |
| **Guardrails & Hooks** | Deterministic enforcement at lifecycle points: pre-commit, commit-msg, approval |
| **Observability** | Metrics, health checks, drift detection |

**What makes this different:** Most "agent frameworks" are just prompt libraries. This one adds 12 mechanical gates (9 pre-commit checks plus the three-gate commit approval system) and a context engineering layer that saves ~45% of always-loaded tokens.

---

## The code: a commit-msg hook that can't be bypassed

Here's the heart of mechanical enforcement: a [`commit-msg` git hook (v6)](https://github.com/juandelossantos/another-agent-skills/blob/main/scripts/git-hooks/commit-msg) [[4]](#ref4) that blocks unstamped commits:

```bash
#!/usr/bin/env bash
# commit-msg: Three-Gate Approval Check (v6)
set -euo pipefail

# Git passes the path of the commit message file as the first argument.
COMMIT_MSG_FILE="$1"

REPO_ROOT=$(git rev-parse --show-toplevel)
APPROVAL_FILE="${REPO_ROOT}/.git/COMMIT_APPROVED"
MANIFEST_FILE="${REPO_ROOT}/.git/COMMIT_MANIFEST"
TEST_LOG="${REPO_ROOT}/.git/TEST_LOG"

CURRENT_MSG=$(head -1 "$COMMIT_MSG_FILE" | tr -d '\n')
NOW_EPOCH=$(date +%s)

# Every gate starts closed; with `set -u` an unset gate would abort the hook.
GATE1=false
GATE2=false
GATE3=false

# Print the value of "key=value" from a file, or nothing if the key is absent.
field() { grep "^$1=" "$2" | head -1 | cut -d= -f2- || true; }

# Convert a timestamp to epoch seconds (GNU date); 0 if empty or unparseable.
to_epoch() {
  [[ -n "$1" ]] || { echo 0; return; }
  date -d "$1" +%s 2>/dev/null || echo 0
}

# Gate 1: Tests passed recently?
if [[ -f "$TEST_LOG" ]]; then
  STATUS=$(field status "$TEST_LOG")
  if [[ "$STATUS" == "PASS" ]]; then
    TS_EPOCH=$(to_epoch "$(field timestamp "$TEST_LOG")")
    AGE=$((NOW_EPOCH - TS_EPOCH))
    [ "$AGE" -le 3600 ] && GATE1=true
  fi
fi

# Gate 2: Commit manifest exists and has content?
[[ -f "$MANIFEST_FILE" ]] && \
  [ $(wc -c < "$MANIFEST_FILE") -gt 20 ] && GATE2=true

# Gate 3: Approval fresh (<5 min) and message matches?
if [[ -f "$APPROVAL_FILE" ]]; then
  STORED_MSG=$(field message "$APPROVAL_FILE")
  TS_EPOCH=$(to_epoch "$(field timestamp "$APPROVAL_FILE")")
  AGE=$((NOW_EPOCH - TS_EPOCH))
  [ "$AGE" -le 300 ] && [ "$STORED_MSG" = "$CURRENT_MSG" ] && GATE3=true
fi

# All three must pass
if [ "$GATE1" = true ] && [ "$GATE2" = true ] && [ "$GATE3" = true ]; then
  echo "✓ All 3 gates passed. Commit allowed."
  exit 0
else
  echo "✗ Commit blocked, missing gates:"
  [ "$GATE1" != true ] && echo "  - Tests not run or expired"
  [ "$GATE2" != true ] && echo "  - Commit manifest missing"
  [ "$GATE3" != true ] && echo "  - Approval missing or expired"
  exit 1
fi
```

**Three conditions** must be met before any commit goes through:

1. **Tests passed** (within the last hour): no blind commits
2. **Commit manifest exists** (the agent writes what changed): no silent mutations
3. **User approved** (within 5 minutes, message matches): no stale approvals

The agent writes the approval file *after* the user says "yes commit" in chat. The hook verifies the file is fresh (<5 min) and matches the first line of the commit message. If the agent tries to commit without approval, the hook blocks it every time it runs. (`git commit --no-verify` skips client-side hooks, so the same gates are worth repeating in CI.)

This isn't a rule the agent *remembers*. It's a gate the agent *cannot bypass*.
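
The hook only reads the approval file; the step that writes it is not shown above. Here is a minimal sketch of that step, assuming the file holds `timestamp=` and `message=` lines, as the hook's `grep` calls suggest. The function names `write_approval` and `stored_message` and the ISO 8601 UTC timestamp format are illustrative assumptions, not the project's actual implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from the repository: records a user's
# approval in the key=value layout that the commit-msg hook parses.
set -euo pipefail

# write_approval FILE MESSAGE
# Stores the current UTC time and the first line of MESSAGE, because the
# hook compares only the first line of the commit message.
write_approval() {
  local file="$1" message="$2"
  local first_line
  first_line=${message%%$'\n'*}
  printf 'timestamp=%s\nmessage=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$first_line" > "$file"
}

# stored_message FILE
# Reads the message back the way the hook does; `cut -f2-` keeps any '='
# that appears inside the message itself.
stored_message() {
  grep '^message=' "$1" | cut -d= -f2-
}

# Demo against a throwaway file instead of a real .git/COMMIT_APPROVED.
demo_file=$(mktemp)
write_approval "$demo_file" $'fix: set retries=3\n\nLonger body text.'
stored_message "$demo_file"
```

Run as written, the demo prints `fix: set retries=3`. In a real setup the first argument would be the repository's `.git/COMMIT_APPROVED` path, and the call would happen only after the user confirms in chat.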

---

## The results: 57 skills, 12 gates, zero shortcuts

After months of iteration, the project ships [[5]](#ref5):

| Metric | Count |
|---|---|
| Composable skills | 57 |
| Lazy-loaded guides | 54 |
| Pre-commit gates | 9 (v8) + 3 commit-msg gates |
| Enforcement levels | 4 (process → manifest → time-window → manifest gate) |
| Agent compatibility | OpenCode, Claude Code, Cursor, Kiro, any git agent [[6]](#ref6) |
| Context tokens saved | ~45% vs eager loading [[7]](#ref7) |
| Stack support | Node, Python, Rust, Go, Ruby, any language with git |
| Price | Free (MIT) |

---

## What I learned

**Prompts are instructions. Gates are guarantees.**

If you're building with AI agents, ask yourself:

- Does your agent run tests before every commit? *Mechanically*, not as a suggestion?
- Does your agent present changes for review before pushing? *Every time*, not just when it remembers?
- Can your agent bypass its own rules? *If yes, those aren't rules. They're suggestions.*

The gap between an "impressive demo" and "production-grade" isn't intelligence. It's the harness around it.

---

**Try it:**

```bash
git clone https://github.com/juandelossantos/another-agent-skills.git
cd another-agent-skills
bash install.sh
init-agents  # Activates skill-driven mode in any project
```

**MIT. Free. Zero subscriptions. 57 skills. 12 gates.**

[juandelossantos.github.io/another-agent-skills](https://juandelossantos.github.io/another-agent-skills)

---

*What patterns have you found for keeping AI agents disciplined in production? I'd love to hear what's working (or not working) in your stack.*

---

## References

1. <a id="ref1"></a> Osmani, A., Saboo, S., & Kartakis, S. (2026). *The New SDLC With Vibe Coding: From ad-hoc prompting to Agentic Engineering.* [Harness architecture paper](https://drive.google.com/file/d/1wNEl8FMpTso8aXlb_joxgzparxi-0ciM/view)
2. <a id="ref2"></a> Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). *Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.* METR (Model Evaluation and Threat Research). [arxiv.org/abs/2507.09089](https://arxiv.org/abs/2507.09089)
3. <a id="ref3"></a> Another Agent Skills. *Harness Architecture: The Six Components.* [`docs/HARNESS.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/docs/HARNESS.md)
4. <a id="ref4"></a> Another Agent Skills. *commit-msg hook (v6): Three-Gate Approval Check.* [`scripts/git-hooks/commit-msg`](https://github.com/juandelossantos/another-agent-skills/blob/main/scripts/git-hooks/commit-msg)
5. <a id="ref5"></a> Another Agent Skills. *Repository and Documentation.* [github.com/juandelossantos/another-agent-skills](https://github.com/juandelossantos/another-agent-skills)
6. <a id="ref6"></a> Another Agent Skills. *Agent Adapters: Compatibility Matrix.* [`docs/AGENT-ADAPTERS.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/docs/AGENT-ADAPTERS.md)
7. <a id="ref7"></a> Another Agent Skills. *Context Budget: Lazy Loading Architecture.* [`README.md`](https://github.com/juandelossantos/another-agent-skills?tab=readme-ov-file#context-budget)
8. <a id="ref8"></a> Singhal et al. (2026). *Agent Skills: Evaluation-Driven Development for AI Coding Agents.* Google Research. [Paper](https://drive.google.com/file/d/1Wso-CM4aAvTxFZa5wjBntKM3IVSg7PWW/view)
9. <a id="ref9"></a> Osmani, A. (2026). *The Factory Model: From Conductors to Orchestrators.* [addyosmani.com](https://addyosmani.com/blog/future-agentic-coding/)
10. <a id="ref10"></a> Another Agent Skills. *SOUL.md: Project Identity and Principles.* [`SOUL.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/SOUL.md)
