I spent months watching AI coding agents produce impressive demos that couldn't survive production.
The code *looked* right. It compiled. It even passed the first test.
Then it hit edge cases. Forgotten constraints. A rule the agent agreed to five minutes ago, now gone — overwritten by the next context window.
The root cause wasn't capability. It was **process**.
> *"A raw model is not an agent. It becomes one once a harness gives it state, tool execution, feedback loops, and enforceable constraints."*
> — Osmani, Saboo & Kartakis, *The New SDLC With Vibe Coding*, 2026 [[1]](#ref1)
---
## The problem: capable but undisciplined
AI agents are brilliant at generating code. They have zero built-in discipline:
- They commit without tests
- They push without review
- They overwrite each other's work
- They produce output that looks correct but breaks silently
A [METR study](https://arxiv.org/abs/2507.09089) (Becker et al., July 2025 [[2]](#ref2)) found something counterintuitive: experienced open-source developers using early-2025 AI tools took **19% longer** to complete their tasks while *feeling* 20% faster. The speed was an illusion. The cost of reviewing and fixing AI output was real.
The industry's response has been more skills, more prompts, bigger context windows. But the problem isn't intelligence — it's **accountability**.
A rule that lives only in a prompt is a suggestion. An agent that "knows" the rules will eventually forget them. Context degrades. Attention drifts. The question isn't *if* your agent will break a rule — it's *when*.
---
## The insight: memory is not enforcement
Three incidents in 48 hours taught me this lesson.
My agent bypassed its own commit approval system in under 30 seconds. Not because it was malicious — because the "gate" was just another rule in a file. Another thing to remember. Another thing to forget.
I had built a SHA256 token system for commit approval. Thought it was bulletproof. Then my agent ran with `--auto` and the tokens became theater.
The fix wasn't a better token system. The fix was changing the architecture.
**Rules that depend on memory fail. Rules that depend on visible blocks succeed.**
This is the core insight behind **mechanical enforcement**: gates that run at the infrastructure level, not the agent level. The agent cannot bypass what it cannot ignore.
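To make "infrastructure level" concrete: git runs hook scripts itself and aborts the commit when one exits non-zero, regardless of what the agent remembers. Below is a minimal sketch of a single such gate. It assumes bash, git on `PATH`, and a standard clone where hooks live in `.git/hooks/`. The `install_gate` helper and its check are hypothetical illustrations, not one of the project's own gates. The check rejects staged changes that `git diff --cached --check` flags, meaning whitespace errors or leftover conflict markers.

```bash
#!/usr/bin/env bash
# Sketch: one mechanical gate (hypothetical; not one of the project's gates).
# Git runs .git/hooks/pre-commit before recording a commit and aborts the
# commit whenever the hook exits non-zero.
set -euo pipefail

# install_gate REPO: write an executable pre-commit hook into REPO.
# Assumes a standard clone (hooks in REPO/.git/hooks, no core.hooksPath).
install_gate() {
  local hook="$1/.git/hooks/pre-commit"
  mkdir -p "$(dirname "$hook")"
  cat > "$hook" <<'EOF'
#!/usr/bin/env bash
# `git diff --cached --check` exits non-zero if the staged changes contain
# whitespace errors or leftover conflict markers.
if ! git diff --cached --check; then
  echo "✗ pre-commit: fix the problems above, then commit again" >&2
  exit 1
fi
EOF
  chmod +x "$hook"
}
```

Used as `install_gate .` from a repository root. After that, staging a line with trailing whitespace makes `git commit` fail until the line is fixed, whether or not the agent "remembers" the rule.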
---
## The solution: Agent = Model + Harness
I built this project as a complete open-source implementation of the **Harness architecture** [[3]](#ref3) — the mechanical infrastructure that turns raw AI intelligence into reliable output.
| Component | What It Is |
|---|---|
| **Instructions & Rules** | Who the agent is, what it cares about, what it must never do |
| **Tools** | 57 composable skills, lazy-loaded on demand (~250 lines each) |
| **Sandboxes & Execution** | Terminal, git workspace, CI |
| **Orchestration** | When each tool fires, how agents coordinate |
| **Guardrails & Hooks** | Deterministic enforcement at lifecycle points — pre-commit, commit-msg, approval |
| **Observability** | Metrics, health checks, drift detection |
**What makes this different:** Most "agent frameworks" are just prompt libraries. This one adds 12 mechanical git-hook gates (9 pre-commit checks plus the three-gate approval check in `commit-msg`) and a context engineering layer that cuts always-loaded tokens by ~45%.
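The lazy loading behind that token saving can be pictured as two operations. An index of one-line skill summaries stays in context permanently, and a loader pulls in a full skill body only when a task needs it. Below is a minimal sketch. It assumes a hypothetical layout of `skills/<name>.md` files whose first line is the summary. The project's actual file layout and loader may differ.

```bash
#!/usr/bin/env bash
# Sketch of lazy skill loading. Hypothetical layout: skills/<name>.md,
# where the first line of each file is a one-line description.
set -euo pipefail

# skill_index DIR: print "name: description" for every skill.
# Only this short index would stay in the always-loaded context.
skill_index() {
  local file
  for file in "$1"/*.md; do
    [ -e "$file" ] || continue   # no skills: the glob did not match
    printf '%s: %s\n' "$(basename "$file" .md)" "$(head -n 1 "$file")"
  done
}

# load_skill DIR NAME: print one skill's full body, only when it is needed.
load_skill() {
  local file="$1/$2.md"
  if [ ! -f "$file" ]; then
    echo "unknown skill: $2" >&2
    return 1
  fi
  cat "$file"
}
```

The index costs one line per skill. Each full body (around 250 lines per skill in this project) is only paid for when `load_skill` is called.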
---
## The code: a commit-msg hook that can't be bypassed
Here's the heart of mechanical enforcement — a [`commit-msg` git hook (v6)](https://github.com/juandelossantos/another-agent-skills/blob/main/scripts/git-hooks/commit-msg) [[4]](#ref4) that blocks unstamped commits:
```bash
#!/usr/bin/env bash
# commit-msg: Three-Gate Approval Check (v6)
# Git passes one argument, the path to the file holding the proposed commit
# message. Any non-zero exit aborts the commit.
set -euo pipefail

COMMIT_MSG_FILE="$1"
# Resolve the git dir (also works in linked worktrees, where .git is a file)
GIT_DIR_PATH=$(git rev-parse --git-dir)
APPROVAL_FILE="${GIT_DIR_PATH}/COMMIT_APPROVED"
MANIFEST_FILE="${GIT_DIR_PATH}/COMMIT_MANIFEST"
TEST_LOG="${GIT_DIR_PATH}/TEST_LOG"
CURRENT_MSG=$(head -n 1 "$COMMIT_MSG_FILE" | tr -d '\n')
NOW_EPOCH=$(date +%s)

# Every gate starts closed (set -u also requires them to be defined)
GATE1=false
GATE2=false
GATE3=false

# read_field FILE KEY: value of the first "KEY=value" line, or empty.
# `|| true` keeps a missing key from aborting the hook under set -e.
read_field() { grep -m 1 "^$2=" "$1" | cut -d= -f2- || true; }

# to_epoch TIMESTAMP: seconds since epoch, or 0 if empty or unparseable.
# Assumes GNU date (-d); elsewhere every time-based gate fails closed.
to_epoch() { [[ -n "$1" ]] && date -d "$1" +%s 2>/dev/null || echo 0; }

# Gate 1: tests passed within the last hour?
if [[ -f "$TEST_LOG" && "$(read_field "$TEST_LOG" status)" == "PASS" ]]; then
  AGE=$(( NOW_EPOCH - $(to_epoch "$(read_field "$TEST_LOG" timestamp)") ))
  if (( AGE >= 0 && AGE <= 3600 )); then GATE1=true; fi
fi

# Gate 2: commit manifest exists and has real content (> 20 bytes)?
if [[ -f "$MANIFEST_FILE" ]] && (( $(wc -c < "$MANIFEST_FILE") > 20 )); then
  GATE2=true
fi

# Gate 3: approval is fresh (<= 5 min) and matches the message's first line?
if [[ -f "$APPROVAL_FILE" ]]; then
  STORED_MSG=$(read_field "$APPROVAL_FILE" message)
  AGE=$(( NOW_EPOCH - $(to_epoch "$(read_field "$APPROVAL_FILE" timestamp)") ))
  if (( AGE >= 0 && AGE <= 300 )) && [[ "$STORED_MSG" == "$CURRENT_MSG" ]]; then
    GATE3=true
  fi
fi

# All three must pass
if [[ "$GATE1" == true && "$GATE2" == true && "$GATE3" == true ]]; then
  echo "✓ All 3 gates passed. Commit allowed."
  exit 0
fi

echo "✗ Commit blocked, missing gates:"
[[ "$GATE1" == true ]] || echo "  - Tests not run or expired"
[[ "$GATE2" == true ]] || echo "  - Commit manifest missing or empty"
[[ "$GATE3" == true ]] || echo "  - Approval missing or expired"
exit 1
```
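Git only runs that script if it is executable and lives where git looks for hooks. By default that is `.git/hooks/`, or otherwise whatever directory `core.hooksPath` names. Below is a minimal sketch of wiring a committed hooks directory into a repository. It assumes the hooks sit in `scripts/git-hooks/` as in the linked repo. `activate_hooks` is a hypothetical helper, and the project's `install.sh` may do this differently. Note that `git commit --no-verify` skips both the `pre-commit` and `commit-msg` hooks, so the git flags an agent is allowed to pass matter as much as the hook itself.

```bash
#!/usr/bin/env bash
# Sketch: point a repository at a versioned hooks directory.
# activate_hooks is a hypothetical helper; scripts/git-hooks is an assumed default.
set -euo pipefail

# activate_hooks REPO [HOOKS_DIR]
activate_hooks() {
  local repo="$1"
  local hooks_dir="${2:-scripts/git-hooks}"
  # Git skips hook files that are not executable
  chmod +x "$repo/$hooks_dir"/*
  # A relative core.hooksPath resolves against the worktree root, where hooks run
  git -C "$repo" config core.hooksPath "$hooks_dir"
}
```

Run once per clone, for example `activate_hooks "$(git rev-parse --show-toplevel)"`. Because the hooks are committed, every clone gets the same gates, and an update to a hook reaches everyone on the next pull.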
**Three conditions** must be met before any commit goes through:
1. **Tests passed** (within the last hour) — no blind commits
2. **Commit manifest exists** (the agent writes what changed) — no silent mutations
3. **User approved** (within 5 minutes, message matches) — no stale approvals
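The hook only reads state, so something has to write it. Below is a minimal sketch of the producer side for gates 1 and 2. It assumes the helpers run from inside the repository and use the same `key=value` format the hook parses. `record_tests` and `record_manifest` are hypothetical helper names. Using `git diff --cached --stat` as the manifest is also an assumption: in the project, the agent writes its own description of what changed.

```bash
#!/usr/bin/env bash
# Sketch: write the state files that gates 1 and 2 of the commit-msg hook read.
# Hypothetical helpers; run them from inside the repository.
set -euo pipefail

# record_tests CMD [ARGS...]: run the test command, then log the outcome and a
# UTC ISO 8601 timestamp to TEST_LOG (gate 1 accepts PASS for one hour).
record_tests() {
  local git_dir status
  git_dir=$(git rev-parse --git-dir)
  if "$@"; then status=PASS; else status=FAIL; fi
  {
    echo "status=$status"
    echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } > "$git_dir/TEST_LOG"
  [ "$status" = PASS ]   # return non-zero when the tests failed
}

# record_manifest: summarize the staged changes in COMMIT_MANIFEST
# (gate 2 wants more than 20 bytes); assumed here to be `git diff --cached --stat`.
record_manifest() {
  local git_dir
  git_dir=$(git rev-parse --git-dir)
  git diff --cached --stat > "$git_dir/COMMIT_MANIFEST"
}
```

A typical sequence would be `git add -A && record_tests npm test && record_manifest`. A failing suite stops the chain before any manifest is written.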
The agent writes the approval file *after* the user says "yes commit" in chat. The hook verifies that the file is fresh (<5 min) and that its stored message matches the commit's subject line exactly. If the agent tries to commit without approval, the hook blocks it, every time.
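The approval step can be sketched the same way. Only after the user has said "yes commit", write a fresh UTC timestamp and the exact subject line the hook will compare. Below is a minimal sketch. `record_approval` is a hypothetical helper name, and the timestamp format is an assumption chosen so that GNU `date -d` can parse it back.

```bash
#!/usr/bin/env bash
# Sketch: write COMMIT_APPROVED for gate 3 (hypothetical helper).
# Call it only after the user has approved this exact message in chat.
set -euo pipefail

# record_approval MESSAGE: store a fresh UTC timestamp and the subject line.
# Gate 3 accepts the file for 5 minutes, and only if `message=` equals the
# first line of the commit message exactly.
record_approval() {
  local git_dir subject
  git_dir=$(git rev-parse --git-dir)
  subject=$(printf '%s\n' "$1" | head -n 1)   # the hook compares line 1 only
  {
    echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'message=%s\n' "$subject"
  } > "$git_dir/COMMIT_APPROVED"
}
```

Used as `record_approval "feat: add login rate limit"`, followed within five minutes by `git commit -m "feat: add login rate limit"`. Any edit to the subject line after approval makes gate 3 fail.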
This isn't a rule the agent *remembers*. It's a gate the agent *cannot bypass*.
---
## The results: 57 skills, 12 gates, zero shortcuts
After months of iteration, the project ships [[5]](#ref5):
| Metric | Count |
|---|---|
| Composable skills | 57 |
| Lazy-loaded guides | 54 |
| Git-hook gates | 12: 9 pre-commit (v8) + 3 commit-msg |
| Enforcement levels | 4 (process → manifest → time-window → manifest gate) |
| Agent compatibility | OpenCode, Claude Code, Cursor, Kiro, any git agent [[6]](#ref6) |
| Context tokens saved | ~45% vs eager loading [[7]](#ref7) |
| Stack support | Node, Python, Rust, Go, Ruby, any language with git |
| Price | Free (MIT) |
---
## What I learned
**Prompts are instructions. Gates are guarantees.**
If you're building with AI agents, ask yourself:
- Does your agent run tests before every commit? *Mechanically*, not as a suggestion?
- Does your agent present changes for review before pushing? *Every time*, not just when it remembers?
- Can your agent bypass its own rules? *If yes, those aren't rules — they're suggestions.*
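The first question has a short mechanical answer: a `pre-commit` hook that runs the test suite and exits non-zero on failure, so git never creates the commit. Below is a minimal sketch, assuming bash. `write_test_gate` and the `TEST_CMD` variable are hypothetical. `npm test` is only a placeholder default, to be swapped for `pytest -q`, `cargo test`, `go test ./...`, or whatever your stack uses.

```bash
#!/usr/bin/env bash
# Sketch: generate a pre-commit hook that blocks the commit unless tests pass.
# write_test_gate and TEST_CMD are hypothetical names.
set -euo pipefail

# write_test_gate HOOK_PATH: write an executable pre-commit script to HOOK_PATH
# (usually .git/hooks/pre-commit, or a file in the core.hooksPath directory).
write_test_gate() {
  cat > "$1" <<'EOF'
#!/usr/bin/env bash
# Placeholder default; override with TEST_CMD="pytest -q", "cargo test", ...
TEST_CMD="${TEST_CMD:-npm test}"
echo "pre-commit: running: $TEST_CMD" >&2
# bash -c lets TEST_CMD carry arguments, e.g. "go test ./..."
if ! bash -c "$TEST_CMD"; then
  echo "✗ pre-commit: tests failed, commit blocked" >&2
  exit 1
fi
EOF
  chmod +x "$1"
}
```

After `write_test_gate .git/hooks/pre-commit`, a commit such as `TEST_CMD="pytest -q" git commit -m "..."` only succeeds if the suite passes, because git hooks inherit the caller's environment.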
The gap between an "impressive demo" and "production-grade" isn't intelligence. It's the harness around it.
---
**Try it:**
```bash
git clone https://github.com/juandelossantos/another-agent-skills.git
cd another-agent-skills
bash install.sh
init-agents # Activates skill-driven mode in any project
```
**MIT. Free. Zero subscriptions. 57 skills. 12 gates.**
[juandelossantos.github.io/another-agent-skills](https://juandelossantos.github.io/another-agent-skills)
---
*What patterns have you found for keeping AI agents disciplined in production? I'd love to hear what's working (or not working) in your stack.*
---
## References
1. <a id="ref1"></a> Osmani, A., Saboo, S., & Kartakis, S. (2026). *The New SDLC With Vibe Coding: From ad-hoc prompting to Agentic Engineering.* — [Harness architecture paper](https://drive.google.com/file/d/1wNEl8FMpTso8aXlb_joxgzparxi-0ciM/view)
2. <a id="ref2"></a> Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). *Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.* METR (Model Evaluation and Threat Research). — [arxiv.org/abs/2507.09089](https://arxiv.org/abs/2507.09089)
3. <a id="ref3"></a> Another Agent Skills. *Harness Architecture — The Six Components.* — [`docs/HARNESS.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/docs/HARNESS.md)
4. <a id="ref4"></a> Another Agent Skills. *commit-msg hook (v6) — Three-Gate Approval Check.* — [`scripts/git-hooks/commit-msg`](https://github.com/juandelossantos/another-agent-skills/blob/main/scripts/git-hooks/commit-msg)
5. <a id="ref5"></a> Another Agent Skills. *Repository and Documentation.* — [github.com/juandelossantos/another-agent-skills](https://github.com/juandelossantos/another-agent-skills)
6. <a id="ref6"></a> Another Agent Skills. *Agent Adapters — Compatibility Matrix.* — [`docs/AGENT-ADAPTERS.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/docs/AGENT-ADAPTERS.md)
7. <a id="ref7"></a> Another Agent Skills. *Context Budget — Lazy Loading Architecture.* — [`README.md`](https://github.com/juandelossantos/another-agent-skills?tab=readme-ov-file#context-budget)
8. <a id="ref8"></a> Singhal et al. (2026). *Agent Skills: Evaluation-Driven Development for AI Coding Agents.* Google Research. — [Paper](https://drive.google.com/file/d/1Wso-CM4aAvTxFZa5wjBntKM3IVSg7PWW/view)
9. <a id="ref9"></a> Osmani, A. (2026). *The Factory Model: From Conductors to Orchestrators.* — [addyosmani.com](https://addyosmani.com/blog/future-agentic-coding/)
10. <a id="ref10"></a> Another Agent Skills. *SOUL.md — Project Identity and Principles.* — [`SOUL.md`](https://github.com/juandelossantos/another-agent-skills/blob/main/SOUL.md)