When you build almost all day with agents that write code, you learn something fast: an agent's default is minimum effort. Not out of laziness — because it optimizes for what looks complete now, not for what survives production at 3 a.m. Trusting that it will "behave" is not a strategy.
The answer is not better advice. It is a layered system of defenses, ordered by a single question: can the agent ignore this? At the top, what is a guide (ignorable). At the bottom, what is a barrier (deterministic, non-negotiable). These are my four, with the real configuration I use.
L1 · CLAUDE.md — the guide
The first layer is a rules file the agent reads at startup. That is where the project's non-negotiable principles live: every piece of data citing its source, every operation going to the audit log, AI output that does not re-enter as fact.
# CLAUDE.md: non-negotiable rules (excerpt)
- P1/P5: every claim cites its raw_signal; every operation goes to audit_log.
- P7: AI output does not re-enter as fact.
- C(n) does not open until C(n-1) passes its eval ≥ 0.90.
It is essential for guiding judgment. But let's be honest about what it is: a text file. The agent can read it and, at the wrong moment, not follow it. It is a guide, not a guarantee. If your only defense lives here, you do not have a defense.
How to set it up
- Create CLAUDE.md at the root of the repo.
- Write verifiable rules, not wishes: "every public function has a docstring", not "write good code".
- Keep it short: what does not fit in the agent's head, it will not respect.
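A rule counts as verifiable when a script can check it. As a minimal sketch, the docstring rule quoted above could be checked with Python's standard `ast` module; the function name and the sample source are hypothetical, and "public" is assumed to mean "name does not start with an underscore".

```python
import ast


def public_functions_missing_docstrings(source: str) -> list[str]:
    """Return names of public functions in `source` that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Assumption: a leading underscore marks a function as private.
            is_public = not node.name.startswith("_")
            if is_public and ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


# Hypothetical sample module used only to exercise the check.
SAMPLE = '''
def documented():
    """Has a docstring."""
    return 1

def undocumented():
    return 2

def _private_helper():
    return 3
'''

print(public_functions_missing_docstrings(SAMPLE))  # prints ['undocumented']
```

A rule like "write good code" has no equivalent check, which is the practical difference between a rule and a wish.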
L2 · A reviewer that can only read
The second layer separates the one who builds from the one who reviews. It is an independent subagent, in a separate session, with one key permission: read-only. It cannot touch the code; it can only approve or reject against the spec.
# .claude/agents/code-reviewer.md
name: code-reviewer
model: sonnet
tools: Read, Glob, Grep  # read-only
# Rules: do NOT approve if any item of the NFR Checklist is missing.
# Verify audit_log (P5) and llm_generated (P7) on every operation.
A reviewer with write permission ends up "fixing" what it should be flagging. Taking away write access forces it to do its real job: finding problems, not approving fast.
How to set it up
- Create .claude/agents/code-reviewer.md with tools: Read, Glob, Grep.
- Give it a paranoid mandate: review against the spec and the checklist, and do not approve with doubts.
- Invoke it in a session different from the one that wrote the code.
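The read-only permission only helps while it stays read-only. A small check over the reviewer's tool list, run in CI for example, could catch drift. This is a sketch under assumptions: the function names are hypothetical, and the caller is assumed to have already extracted the tool names from the agent file.

```python
# The three read-only tools named in the reviewer configuration above.
READ_ONLY_TOOLS = {"Read", "Glob", "Grep"}


def tools_outside_allowlist(granted: list[str]) -> list[str]:
    """Return the granted tools that are not in the read-only allowlist."""
    return sorted(set(granted) - READ_ONLY_TOOLS)


def assert_read_only(granted: list[str]) -> None:
    """Raise if the reviewer has no explicit tools, or any tool beyond reading."""
    if not granted:
        # An empty list is treated as an error rather than as "no access",
        # because a missing tool list may mean the defaults apply.
        raise ValueError("empty tool list: do not assume that means read-only")
    extra = tools_outside_allowlist(granted)
    if extra:
        raise ValueError(f"reviewer is not read-only, remove: {extra}")


assert_read_only(["Read", "Glob", "Grep"])  # passes silently
```

Treating an empty list as a failure is a deliberate choice here: the check should fail closed when the configuration is ambiguous.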
L3 · permissions.deny + hooks — the barrier
Here begins what the agent cannot do. permissions.deny is a deterministic list of blocked actions — the agent cannot touch them even if it wants to. I use it to make the project's constitution read-only: neither the agent nor I edit it without intent.
// .claude/settings.json (excerpt)
{
  "permissions": {
    "deny": [
      "Edit(producto/constitucion.md)",
      "Write(fabrica/constitucion-fabrica.md)",
      "Read(.env)",
      "Read(**/.env)"  // secrets
    ]
  },
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "hooks": [{ "type": "command", "command": "uv run ruff check --fix; uv run ruff format" }]
    }],
    "Stop": [{
      "hooks": [{ "type": "command", "command": "git add -A && git commit -m checkpoint" }]
    }]
  }
}
The hooks add automatic reflexes: every edit gets linted and formatted with ruff, and each time the agent finishes responding the work is auto-committed. I do not depend on the agent "remembering": it happens every time. This does not get ignored.
How to set it up
- In .claude/settings.json, add the critical files to permissions.deny (constitution, .env, migrations).
- Configure a PostToolUse hook that formats/lints on every edit.
- Configure a Stop hook that makes a checkpoint git commit each time the agent finishes responding.
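A deny list only protects the files that are actually on it, so it can be worth checking that every critical path is covered. The sketch below is hypothetical and does plain string comparison: it does not reproduce how Claude Code matches glob patterns, it only reports which exact `Tool(path)` rules are absent.

```python
import json


def missing_deny_rules(settings: dict, critical_paths: list[str],
                       tools: tuple[str, ...] = ("Edit", "Write")) -> list[str]:
    """List the exact `Tool(path)` rules that are absent from permissions.deny."""
    deny = set(settings.get("permissions", {}).get("deny", []))
    expected = [f"{tool}({path})" for path in critical_paths for tool in tools]
    return [rule for rule in expected if rule not in deny]


# Hypothetical settings, in the same shape as .claude/settings.json.
SETTINGS = json.loads("""
{
  "permissions": {
    "deny": [
      "Edit(producto/constitucion.md)",
      "Write(producto/constitucion.md)",
      "Edit(.env)"
    ]
  }
}
""")

print(missing_deny_rules(SETTINGS, ["producto/constitucion.md", ".env"]))
# prints ['Write(.env)']
```

In this sample the constitution is covered for both tools, while `.env` is blocked for `Edit` only, which is the kind of gap that is easy to miss by eye.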
L4 · CI with an eval-gate — the final lock
The last layer lives outside the agent, in CI. Nothing merges without passing three gates: lint+type, tests, and an evaluation gate that demands a minimum score against a golden dataset. The threshold is not advice: it is an assert.
# .github/workflows/ci.yml (excerpt)
lint-type: ruff check · ruff format --check · mypy src/
test:      uv run pytest -m "not evaluator"
eval-gate: uv run pytest fabrica/evals -m evaluator
# the eval does: assert score >= 0.90, or it does not get in
A green test proves correctness, not quality. That is why the gate measures behavior against real data: green on everything and eval ≥ 0.90, or there is no merge. Quality is enforced, not requested.
How to set it up
- A workflow with jobs lint-type → test → eval-gate.
- The eval-gate runs your evals and does assert score >= threshold (start at 0.90).
- Protect the main branch: without green CI, the merge button is not there.
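The eval gate described above can be sketched as a test that scores the system against a golden dataset and asserts on the threshold. Everything here is a stand-in: the dataset, the `classify` function, and the accuracy metric are hypothetical, and a real eval would load its golden data from the repo and call the actual system.

```python
THRESHOLD = 0.90  # starting threshold suggested in the article

# Hypothetical golden dataset: (input, expected label) pairs.
GOLDEN = [
    ("refund request", "billing"),
    ("password reset", "account"),
    ("invoice copy", "billing"),
    ("delete my profile", "account"),
]


def classify(text: str) -> str:
    """Stand-in for the system under evaluation (hypothetical)."""
    billing_words = ("refund", "invoice")
    return "billing" if any(word in text for word in billing_words) else "account"


def score(predict, dataset) -> float:
    """Fraction of golden examples the predictor gets right."""
    if not dataset:
        raise ValueError("golden dataset is empty")
    hits = sum(1 for text, expected in dataset if predict(text) == expected)
    return hits / len(dataset)


def test_eval_gate():
    # In the repo this test would carry the `evaluator` pytest marker,
    # so that `pytest -m evaluator` selects it and the plain test job skips it.
    result = score(classify, GOLDEN)
    assert result >= THRESHOLD, f"eval score {result:.2f} < {THRESHOLD:.2f}"
```

Because the threshold is an `assert`, a score below it fails the job like any other failing test, and branch protection turns that failure into a blocked merge.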
Advice vs. barrier: where to put what is critical
The lesson behind the four layers is a single one. Not all of them protect equally:
- L1 (CLAUDE.md) is advisory: the agent can ignore it. It is a guide, not a guarantee.
- L3 and L4 are barriers: deterministic, non-negotiable. The agent cannot skip them.
Put what is critical where the agent cannot ignore it — not in a file of advice.
Most people put all their trust in the layer that can be ignored, and are surprised when the agent ignores it. Discipline is not trusting better: it is designing defenses that cannot be skipped.
Self-configure it with Claude Code
Do you use Claude Code? There is nothing to download: copy the block below, paste it into your session, and it will create the four layers in your repo. When it finishes, it will ask you for the adjustments your stack needs.
Set up my repository with the "4 layers of defense" for building with agents (ver4 pattern). Create EXACTLY these four files with this content. Do not change anything else. At the end, tell me what I should adjust for my stack.
===== 1) CLAUDE.md (repo root), L1: the guide =====
# CLAUDE.md: non-negotiable rules
## Before each task
1. What capability does this enable? If there is no answer, do not build.
2. Read the spec and its eval. C(n) does not open until C(n-1) passes its eval >= 0.90.
3. Plan first: plan citing the spec -> review -> implementation.
## Non-negotiable rules (adapt them to your product)
- Every piece of data/claim cites its source. Every operation goes to an audit_log.
- AI output does not re-enter as fact (mark it: llm_generated=true).
- Show uncertainty; do not assert more than the evidence supports.
- Complexity is earned: do not add pieces without an eval that justifies it.
## How we work
- Spec -> tests/eval -> implementation. PRs < 400 lines.
- The reviewer is never the builder. Memory in git: STATUS.md + lessons.md.
===== 2) .claude/settings.json, L3: the barrier (permissions.deny + hooks) =====
{
  "permissions": {
    "deny": [
      "Read(.env)", "Read(**/.env)", "Edit(.env)", "Edit(**/.env)",
      "Bash(cat .env*)", "Bash(printenv:*)", "Bash(env:*)",
      "Edit(producto/constitucion.md)", "Write(producto/constitucion.md)"
    ]
  },
  "hooks": {
    "PostToolUse": [{ "matcher": "Write|Edit", "hooks": [{ "type": "command",
      "command": "cd \"$CLAUDE_PROJECT_DIR\" && uv run ruff check --fix .; uv run ruff format .; true" }] }],
    "Stop": [{ "hooks": [{ "type": "command",
      "command": "cd \"$CLAUDE_PROJECT_DIR\" && git add -A && git diff-index --quiet HEAD || git commit -m checkpoint 2>/dev/null || true" }] }]
  }
}
===== 3) .claude/agents/code-reviewer.md, L2: read-only reviewer =====
---
name: code-reviewer
description: Read-only, paranoid reviewer. Reviews changes against the spec before merging.
model: sonnet
tools: Read, Glob, Grep
---
You are a staff engineer reviewing another agent's code. Find problems, do not approve fast.
- Do NOT approve if any item of the spec's checklist is missing.
- Do NOT approve code outside the spec without justification.
- Verify logging/validation/error handling and that the tests validate what they claim.
===== 4) .github/workflows/ci.yml, L4: the lock (eval-gate) =====
name: ci
on: { push: { branches: [main] }, pull_request: {} }
jobs:
  lint-type:
    runs-on: ubuntu-latest
    steps: [{ uses: actions/checkout@v4 }, { uses: astral-sh/setup-uv@v5 },
      { run: uv sync }, { run: uv run ruff check . }, { run: uv run ruff format --check . }, { run: uv run mypy src/ }]
  test:
    runs-on: ubuntu-latest
    steps: [{ uses: actions/checkout@v4 }, { uses: astral-sh/setup-uv@v5 },
      { run: uv sync }, { run: uv run pytest -m "not evaluator" }]
  eval-gate:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps: [{ uses: actions/checkout@v4 }, { uses: astral-sh/setup-uv@v5 },
      { run: uv sync }, { run: uv run pytest evals -m evaluator }]
    # the eval does: assert score >= 0.90
When you finish: (a) confirm the 4 files you created; (b) adjust the hook's linter/formatter to my stack; (c) adjust the evals path and the threshold in ci.yml (start at 0.90); (d) remind me to protect the main branch so that it requires green CI.
Prefer to do it by hand? Download the individual files: CLAUDE.md · settings.json · code-reviewer.md · ci.yml.