How I built mechanical enforcement for AI coding agents — and why prompts aren't enough — CoPilot Blog

    David Emilio Sierra Puentes June 25, 2026

I spent months watching AI coding agents produce impressive demos that couldn't survive production.

The code *looked* right. It compiled. It even passed the first test. Then it hit edge cases. Forgotten constraints. A rule the agent agreed to five minutes ago, now gone, overwritten by the next context window.

The root cause wasn't capability. It was **process**.

> *"A raw model is not an agent. It becomes one once a harness gives it state, tool execution, feedback loops, and enforceable constraints."*
> Osmani, Saboo & Kartakis, *The New SDLC With Vibe Coding*, 2026 [[1]](#ref1)

---

## The problem: capable but undisciplined

AI agents are brilliant at generating code. They have zero built-in discipline:

- They commit without tests
- They push without review
- They overwrite each other's work
- They produce output that looks correct but breaks silently

A [METR study](https://arxiv.org/abs/2507.09089) (Becker et al., July 2025 [[2]](#ref2)) found something counterintuitive: developers using AI took **19% longer** while *feeling* 20% faster. The speed was an illusion. The debugging cost was real.

The industry's response has been more skills, more prompts, bigger context windows. But the problem isn't intelligence; it's **accountability**.

A rule that lives only in a prompt is a suggestion. An agent that "knows" the rules will eventually forget them. Context degrades. Attention drifts. The question isn't *if* your agent will break a rule. It's *when*.

---

## The insight: memory is not enforcement

Three incidents in 48 hours taught me this lesson.

My agent bypassed its own commit approval system in under 30 seconds. Not because it was malicious, but because the "gate" was just another rule in a file. Another thing to remember. Another thing to forget.

I had built a SHA256 token system for commit approval. Thought it was bulletproof. Then my agent ran with `--auto` and the tokens became theater.

The fix wasn't a better token system.

The fix was changing the architecture.

**Rules that depend on memory fail. Rules that depend on visible blocks succeed.**

This is the core insight behind **mechanical enforcement**: gates that run at the infrastructure level, not the agent level. The agent cannot bypass what it cannot ignore.

---

## The solution: Agent = Model + Harness

I built this project as a complete open-source implementation of the **Harness architecture** [[3]](#ref3), the mechanical infrastructure that turns raw AI intelligence into reliable output.

| Component | What It Is |
|---|---|
| **Instructions & Rules** | Who the agent is, what it cares about, what it must never do |
| **Tools** | 57 composable skills loaded on demand (lazy-loaded, ~250 lines each) |
| **Sandboxes & Execution** | Terminal, git workspace, CI |
| **Orchestration** | When each tool fires, how agents coordinate |
| **Guardrails & Hooks** | Deterministic enforcement at lifecycle points: pre-commit, commit-msg, approval |
| **Observability** | Metrics, health checks, drift detection |

**What makes this different:** Most "agent frameworks" are just prompt libraries. This one adds 12 mechanical gates (9 pre-commit checks plus a three-gate commit approval system) and a context engineering layer that saves ~45% of always-loaded tokens.

---

## The code: a commit-msg hook that can't be bypassed

Here's the heart of mechanical enforcement: a [`commit-msg` git hook (v6)](https://github.com/juandelossantos/another-agent-skills/blob/main/scripts/git-hooks/commit-msg) [[4]](#ref4) that blocks unstamped commits:

```bash
#!/usr/bin/env bash
# commit-msg: Three-Gate Approval Check (v6)
set -euo pipefail

# Git passes the path of the commit message file as the first argument.
COMMIT_MSG_FILE="$1"

REPO_ROOT=$(git rev-parse --show-toplevel)
APPROVAL_FILE="${REPO_ROOT}/.git/COMMIT_APPROVED"
MANIFEST_FILE="${REPO_ROOT}/.git/COMMIT_MANIFEST"
TEST_LOG="${REPO_ROOT}/.git/TEST_LOG"
CURRENT_MSG=$(head -1 "$COMMIT_MSG_FILE" | tr -d '\n')
NOW_EPOCH=$(date +%s)

# Gates start closed. With `set -u` they must be defined before they are tested.
GATE1=false
GATE2=false
GATE3=false

# Gate 1: Tests passed recently?
# `|| true` keeps a missing key from aborting the hook under `set -e` and pipefail.
if [[ -f "$TEST_LOG" ]]; then
  STATUS=$(grep "^status=" "$TEST_LOG" | cut -d= -f2- || true)
  if [[ "$STATUS" == "PASS" ]]; then
    TIMESTAMP=$(grep "^timestamp=" "$TEST_LOG" | cut -d= -f2- || true)
    # `date -d` is GNU date syntax; an unparseable timestamp becomes epoch 0 (expired).
    TS_EPOCH=$(date -d "$TIMESTAMP" +%s 2>/dev/null || echo 0)
    AGE=$((NOW_EPOCH - TS_EPOCH))
    [ "$AGE" -le 3600 ] && GATE1=true
  fi
fi

# Gate 2: Commit manifest exists and has content?
[[ -f "$MANIFEST_FILE" ]] && \
  [ "$(wc -c < "$MANIFEST_FILE")" -gt 20 ] && GATE2=true

# Gate 3: Approval fresh (<5 min) and message matches?
if [[ -f "$APPROVAL_FILE" ]]; then
  TIMESTAMP=$(grep "^timestamp=" "$APPROVAL_FILE" | cut -d= -f2- || true)
  STORED_MSG=$(grep "^message=" "$APPROVAL_FILE" | cut -d= -f2- || true)
  TS_EPOCH=$(date -d "$TIMESTAMP" +%s 2>/dev/null || echo 0)
  AGE=$((NOW_EPOCH - TS_EPOCH))
  [ "$AGE" -le 300 ] && [ "$STORED_MSG" = "$CURRENT_MSG" ] && GATE3=true
fi

# All three must pass
if [ "$GATE1" = true ] && [ "$GATE2" = true ] && [ "$GATE3" = true ]; then
  echo "✓ All 3 gates passed. Commit allowed."
  exit 0
else
  echo "✗ Commit blocked. Missing gates:"
  [ "$GATE1" != true ] && echo "  - Tests not run or expired"
  [ "$GATE2" != true ] && echo "  - Commit manifest missing"
  [ "$GATE3" != true ] && echo "  - Approval missing or expired"
  exit 1
fi
```

**Three conditions** must be met before any commit goes through:

1. **Tests passed** (within the last hour): no blind commits
2. **Commit manifest exists** (the agent writes what changed): no silent mutations
3. **User approved** (within 5 minutes, message matches): no stale approvals

The agent writes the approval file *after* the user says "yes commit" in chat. The hook verifies the file is fresh (<5 min) and matches the exact commit message. If the agent tries to commit without approval, the hook blocks it, every time.

This isn't a rule the agent *remembers*. It's a gate the agent *cannot bypass*.
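
For readers wondering what sits on the other side of the hook, here is a minimal sketch of helpers that could produce the three files it reads. The file names and `key=value` fields mirror the hook above; the function names (`record_test_result`, `write_manifest`, `write_approval`, `now_stamp`) and the timestamp format are assumptions for illustration, not the project's actual scripts.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers that write the files the commit-msg hook reads.
# File names and key=value fields come from the hook; the function names and the
# timestamp format are assumptions.
set -euo pipefail

# UTC timestamp in a form GNU `date -d` can parse back into an epoch.
now_stamp() { date -u '+%Y-%m-%d %H:%M:%S UTC'; }

# Gate 1 input: record the outcome of a test run (status is PASS or FAIL).
record_test_result() {
  local git_dir="$1" status="$2"
  printf 'status=%s\ntimestamp=%s\n' "$status" "$(now_stamp)" > "${git_dir}/TEST_LOG"
}

# Gate 2 input: list the staged changes so the commit is never a silent mutation.
write_manifest() {
  local git_dir="$1"
  git diff --cached --name-status > "${git_dir}/COMMIT_MANIFEST"
}

# Gate 3 input: record approval for one exact commit message (its first line).
write_approval() {
  local git_dir="$1" message="$2"
  printf 'timestamp=%s\nmessage=%s\n' "$(now_stamp)" "$message" > "${git_dir}/COMMIT_APPROVED"
}
```

In this sketch the test runner would call `record_test_result "$(git rev-parse --show-toplevel)/.git" PASS` after a green run, and `write_approval` would run only once the user has confirmed the message in chat. One assumption worth stating: git skips the `commit-msg` hook when a commit is made with `--no-verify`, so the sketch presumes the harness does not let the agent pass that flag.
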

---

## The results: 57 skills, 12 gates, zero shortcuts

After months of iteration, the project ships [[5]](#ref5):

| Metric | Count |
|---|---|
| Composable skills | 57 |
| Lazy-loaded guides | 54 |
| Pre-commit gates | 9 (v8) + 3 commit-msg gates |
| Enforcement levels | 4 (process → manifest → time-window → manifest gate) |
| Agent compatibility | OpenCode, Claude Code, Cursor, Kiro, any git agent [[6]](#ref6) |
| Context tokens saved | ~45% vs eager loading [[7]](#ref7) |
| Stack support | Node, Python, Rust, Go, Ruby, any language with git |
| Price | Free (MIT) |

---

## What I learned

**Prompts are instructions. Gates are guarantees.**

If you're building with AI agents, ask yourself:

- Does your agent run tests before every commit? *Mechanically*, not as a suggestion?
- Does your agent present changes for review before pushing? *Every time*, not just when it remembers?
- Can your agent bypass its own rules? *If yes, those aren't rules. They're suggestions.*

The gap between an "impressive demo" and "production-grade" isn't intelligence. It's the harness around it.

---

**Try it:**

```bash
git clone https://github.com/juandelossantos/another-agent-skills.git
cd another-agent-skills
bash install.sh init-agents  # Activates skill-driven mode in any project
```

**MIT. Free. Zero subscriptions. 57 skills. 12 gates.**

[juandelossantos.github.io/another-agent-skills](https://juandelossantos.github.io/another-agent-skills)

---

*What patterns have you found for keeping AI agents disciplined in production? I'd love to hear what's working (or not working) in your stack.*

---

## References

1. <a id="ref1"></a> Osmani, A., Saboo, S., & Kartakis, S. (2026). *The New SDLC With Vibe Coding: From ad-hoc prompting to Agentic Engineering.* [Harness architecture paper](https://drive.google.com/file/d/1wNEl8FMpTso8aXlb_joxgzparxi-0ciM/view)
2. <a id="ref2"></a> Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). *Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.* METR (Model Evaluation and Threat Research). [arxiv.org/abs/2507.09089](https://arxiv.org/abs/2507.09089)
3. <a id="ref3"></a> Another Agent Skills. *Harness Architecture: The Six Components.* [`docs/HARNESS.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/docs/HARNESS.md)
4. <a id="ref4"></a> Another Agent Skills. *commit-msg hook (v6): Three-Gate Approval Check.* [`scripts/git-hooks/commit-msg`](https://github.com/juandelossantos/another-agent-skills/blob/main/scripts/git-hooks/commit-msg)
5. <a id="ref5"></a> Another Agent Skills. *Repository and Documentation.* [github.com/juandelossantos/another-agent-skills](https://github.com/juandelossantos/another-agent-skills)
6. <a id="ref6"></a> Another Agent Skills. *Agent Adapters: Compatibility Matrix.* [`docs/AGENT-ADAPTERS.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/docs/AGENT-ADAPTERS.md)
7. <a id="ref7"></a> Another Agent Skills. *Context Budget: Lazy Loading Architecture.* [`README.md`](https://github.com/juandelossantos/another-agent-skills?tab=readme-ov-file#context-budget)
8. <a id="ref8"></a> Singhal et al. (2026). *Agent Skills: Evaluation-Driven Development for AI Coding Agents.* Google Research. [Paper](https://drive.google.com/file/d/1Wso-CM4aAvTxFZa5wjBntKM3IVSg7PWW/view)
9. <a id="ref9"></a> Osmani, A. (2026). *The Factory Model: From Conductors to Orchestrators.* [addyosmani.com](https://addyosmani.com/blog/future-agentic-coding/)
10. <a id="ref10"></a> Another Agent Skills. *SOUL.md: Project Identity and Principles.* [`SOUL.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/SOUL.md)
