Engineering notes on building AI you can trust — and the factory that produces it.
Why AI agents default to the option that passes the tests — not the one that survives production. And the mechanism we use to catch it before the merge.
An agent ignores advice and obeys what is deterministic. The 4 layers, with real configs, for building with agents without breaking production. Paste it into Claude Code and it configures itself.
Retrieving the right passage is not verifying the fact. The unit of truth is the atomic claim bound to its literal source.
A confidence number hides what matters most: how much evidence supports it, how much contradicts it, and how much we simply do not know.
A 73% confidence score tells nobody what to do. A clear verdict, with the evidence one click away, does.
An ERP export, a Slack message and an LLM answer do not deserve the same initial trust. How we assign priors per source.
We build a truth engine with AI agents. That forced us to apply, internally, the same discipline we demand from AI output.
Jira says the 10th; Slack says the 12th. Most systems pick one silently. We flag the conflict before you decide based on the wrong data.
Engineering notes on trustworthy AI. No noise.