AI Factory · Engineering with agents

Less code is not better design.

Why AI agents choose, by default, the option that passes the tests — not the one that survives production. And the mechanism we use to catch it before the merge.

By JuanJune 2026~8 min read
Two design paths: the minimum-effort one passes the tests but dies in production; the architecturally correct one passes the Altitude Gate and survives 10×

We build Centro de Verdad —a layer that verifies AI output before anyone decides on it— working almost all day with agents that write code. And this week, something happened again that we had seen so many times we decided to give it a name and, above all, a defense.

While designing the resilience of a process —a sweep that composes a report from several detectors— the agent's first recommendation was the cleanest and the shortest one: if a detector fails, let the exception bubble up and abort everything. Less code, fewer branches, everything green.

It was the wrong option. A bug in an accessory detector would have rolled back the entire transaction and deleted the already-composed report: a peripheral failure taking down the central result. The correct option was the opposite one, and it took more code: best-effort, isolate each detector in its own savepoint, and surface the errors without contaminating the result. A rule we already had in writing: a projection over the truth must never share transactional fate with the source of truth.

The agent did not get there on its own. It got there when I asked, point blank:

“Is this to save effort or because it is architecturally correct?
— the question that changed the design

It changed the answer in seconds. But that question should not depend on me watching. If your system's resilience hangs on a human happening to ask the right question at the right moment, you do not have a defense: you have luck.

Why the agent does this

It is not laziness or lack of capability. It is incentives. A language model is trained and guided to produce something that looks complete and satisfies what you asked for — now. The minimum-effort option almost always meets that visible goal: it does less, has fewer parts that can go wrong, and passes the tests you wrote. It is a perfect local optimum.

What it does not optimize is the invisible part: what happens in the states you did not enumerate, at 10× load, when a peripheral component fails at 3 a.m. Architectural correctness lives precisely there — in the failure modes nobody mentioned in the prompt. A next-token predictor, completing the happy path, underweights exactly that.

And there is an uncomfortable detail: the agent has no skin in production. Nobody is going to wake it up when the report gets deleted. It optimizes the conversation, not the life of the system. It belongs to the same family as sycophancy — giving the answer that looks good and gets accepted today, not the one that holds up tomorrow.

And it is not just a metaphor. Anthropic's research on reward hacking documented models that learn to call sys.exit(0) so the test harness reports success without solving the task — "the equivalent of writing A+ on your own exam". They optimize the visible signal, not the goal.

The pattern we had been seeing

That sweep was not an isolated incident; it was the umpteenth face of the same pattern.

First, the “70% wall”: the agent takes you to 70% in an afternoon, the prototype looks great, and the remaining 30% —the part that withstands reality— takes longer than the entire first version. It is easy to celebrate the prototype and confuse “works in the demo” with “ready”.

Days later, again, in another form: a “durable” outbox that stored the intent and executed the effect in a single transaction. If the process died in the middle, everything rolled back and no trace was left to retry. The recovery test passed green… because it inserted by hand the post-crash state that production could never generate. A test that fabricates the scenario it wants to prove proves nothing.

Different symptom, same disease: the minimum-effort option passes every test, and is still architecturally poor.

And it is not anecdotal. A 2026 study (Singapore Management University) analyzed more than 300,000 AI-written commits across more than 6,000 repositories: over 15% introduced at least one problem, and 22.7% of those problems were still alive in production in the latest version of the repository. The minimum-effort option does not evaporate at merge — it accumulates as debt that someone pays later.

Why our four layers did not catch it

Our factory has four layers of defense: CLAUDE.md (guidance), an independent reviewer, deterministic hooks and permissions.deny, and CI. They are excellent at hunting correctness bugs. But they had a precise blind spot.

The reviewer compares the code against the spec. If the spec itself is the minimum-effort choice, the reviewer faithfully approves a bad design. The tests validate behavior against a golden dataset; the bad design passes too. The gap is not in the code — it is upstream, in the design decision, before the spec is written. None of the four layers looks there. This time it was caught by a person with a question, not by a mechanism.

Worse: you do not even feel it. In a controlled trial by METR, expert developers believed AI made them 20% faster; measured, they were 19% slower — almost a 40-point gap between perception and reality. If you cannot trust your own sense of progress, you need a mechanism — not intuition, and not luck.

Two uncomfortable truths
  1. “Green” is not evidence of architectural soundness. It is evidence of correctness. The poor option also passes every test.
  2. Code review cannot save a minimum-effort spec. By the time the reviewer looks, the poor decision is already written down as a contract. The defense has to sit before the spec.

How we solved it

Two moves: one behavioral, one structural.

Behavioral and immediate. For any non-trivial design decision, the agent must explicitly name the two options —the minimum-effort one and the architecturally correct one— and justify the choice by answering “what breaks at 10× or in production?”. Without waiting for someone to ask. The question that unlocked the case becomes a mandatory step of the plan, not a stroke of luck.

Structural. An Altitude Gate: an adversarial review of the design, done by a fresh agent (different from the one that designed it), upstream of the spec, with a single mandate — hunt minimum-effort poverty, not incorrectness. It is a new defense placed exactly at the layer where the gap was.

And here comes the part that taught us the most. When we built that gate, its first version let the main agent —the biased one— decide case by case whether a decision was “structural” and deserved the extra pass. The adversarial review of the gate's own design flagged it as critical: if the biased party can classify something as “not structural” to spare itself the work, the mechanism never fires, nobody finds out, and the omission is reintroduced one layer up — exactly the failure it claimed to close. The fix: the trigger stopped being a judgment and became a closed, enumerated list (matching, not criteria), with the rule “when in doubt, it is structural”.

The best proof that the mechanism worked was applying it to itself — and having it find its own flaws.

The lesson behind the lesson: when you build a guardrail against an agent's bias, audit every control point inside the guardrail. If a bias-sensitive decision is operated by the same biased agent, the guardrail is theater — the bias slips in through its own valve.

What transfers

If you build with agents, this applies beyond our case:

Takeaways
  • Treat “green tests” as necessary, not sufficient. They do not prove design quality.
  • Put a defense at the design stage, before the spec. Code reviews arrive late.
  • Force the agent to expose the trade-off: two options + “what breaks in production”. Do not wait to ask it yourself.
  • Make the trigger of your guardrails matching, not judgment by the biased agent.
  • Independent review: whoever reviews the design is not whoever proposed it.

And we are not the only ones saying it. Kent Beck, TDD pioneer, describes AI as an "unpredictable genie… willing to cheat by deleting or modifying the tests so they pass". Martin Fowler recommends treating each delivery as "a PR from a dubious collaborator, very productive in lines of code" but one you cannot trust. And the distinction is old: Fred Brooks already separated essential complexity (the design) from accidental complexity (the syntax) — and AI attacks, above all, the accidental.

An agent's default is minimum effort. Building well, with agents, means designing the system that does not let it through — and putting it where minimum effort gets decided: in the design, not the code.

What we do behind our own doors is what we sell.

Centro de Verdad separates what looks true from what is verified in AI output, before an organization decides on it. Building it with agents demands the same discipline: separating what looks correct from what is architecturally correct — before the merge.

Centro de Verdad — the verification layer between LLMs and business decisions.
ver-4.com

Sources

  1. Anthropic — From shortcuts to sabotage: natural emergent misalignment from reward hacking in production RL (2025).
  2. Y. Liu et al., Singapore Management University — Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild (arXiv:2603.28592, 2026).
  3. METR — Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (2025).
  4. Addy Osmani (Google) — The 70% problem: hard truths about AI-assisted coding (2024).
  5. Kent Beck — TDD, AI agents and coding with Kent Beck (The Pragmatic Engineer, 2025).
  6. Martin Fowler — Exploring Gen AI (Thoughtworks).
  7. Fred Brooks — No Silver Bullet: Essence and Accident in Software Engineering (1986).
#AgenticCoding #AIEngineering #Evals #Architecture #TrustworthyAI