I build the guardrails first
What you’ll take away
Why I design the guardrails before the first prompt, and what it actually buys me.
On most projects there’s an unspoken order of operations. Build the clever part first. Get it working. Then, somewhere near the end, bolt on whatever controls security asks for so the thing can ship. Governance is the toll you pay at the door.
I work the other way around. The guardrails get designed before I write the first prompt, and not because I’m the cautious type. I do it because they’re the only reason anyone ever lets the system run without a person watching it. Take them out and every output needs a human to check it. At that point you haven’t automated the work. You’ve moved it onto someone’s screen.
Four of them go in before anything else, because retrofitting them later always costs more than building around them.
The first one is a contract at the boundary. The model’s output stays untrusted input until it passes a schema. Wrong shape, and it never reaches the next step. That one gate quietly kills most of the “the agent did something insane” stories before they get a chance to happen.
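Here is a minimal sketch of that gate, assuming the model is asked to return a JSON object with an action and an amount. The field names, the allowed actions, and the `SchemaError` type are invented for illustration; a schema library would do the same job in fewer lines.

```python
import json
import math
from dataclasses import dataclass

ALLOWED_ACTIONS = {"refund", "escalate", "no_action"}  # hypothetical action set


class SchemaError(ValueError):
    """Raised when model output does not match the expected shape."""


@dataclass(frozen=True)
class Decision:
    action: str
    amount: float


def parse_decision(raw: str) -> Decision:
    """Treat raw model output as untrusted: return a Decision or raise."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("expected a JSON object")
    if set(data) != {"action", "amount"}:
        raise SchemaError(f"unexpected keys: {sorted(data)}")
    action, amount = data["action"], data["amount"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise SchemaError(f"unknown action: {action!r}")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise SchemaError("amount must be a number")
    if not math.isfinite(amount) or amount < 0:
        raise SchemaError("amount must be finite and non-negative")
    return Decision(action=action, amount=float(amount))
```

The next step only ever receives a `Decision`. Anything else raises at the boundary, so a malformed or out-of-range output never reaches code that acts on it.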
Then there’s failure, because something upstream always breaks eventually. APIs rate-limit, models stall, a dependency times out. A system that’s ready for production already has an answer for that: a fallback to a safe, boring state instead of a stack trace and an engineer getting paged at dinner.
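One way to sketch that fallback, assuming the upstream call signals trouble with `TimeoutError` or `ConnectionError`. The function name, the retry count, and the backoff numbers are illustrative choices, not a prescription.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: T,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `primary` up to `attempts` times, then return the safe fallback."""
    for attempt in range(attempts):
        try:
            return primary()
        except (TimeoutError, ConnectionError):
            # Only swallow the failures expected from a flaky dependency;
            # anything else is a bug and should still surface.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    return fallback
```

The fallback value is where the "safe, boring state" lives: a queued-for-review marker, a cached answer, or a plain "try again later" response, depending on what the step does.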
Third, how would you know the output had started drifting? You can’t manage what you don’t measure, so I wire in a signal that catches degrading quality before a customer does. If the first sign of trouble is a renewal call, the monitoring has already failed.
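A rough sketch of such a signal, assuming each output already gets some pass/fail quality check (a schema pass, a spot-check score, a downstream rejection). The `DriftMonitor` class, the window size, and the threshold are hypothetical and would need tuning against real traffic.

```python
from collections import deque


class DriftMonitor:
    """Track a rolling failure rate over the last `window` outputs."""

    def __init__(self, window: int = 200, threshold: float = 0.05) -> None:
        self.results = deque(maxlen=window)  # oldest results fall off the end
        self.threshold = threshold

    def record(self, passed: bool) -> None:
        self.results.append(passed)

    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def is_drifting(self) -> bool:
        # Wait for a full window so one early failure does not raise an alert.
        full = len(self.results) == self.results.maxlen
        return full and self.failure_rate() > self.threshold
```

Feeding `is_drifting()` into whatever alerting already exists is enough to put drift on a dashboard instead of in a customer's inbox.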
The last one is keeping a human on the decisions that actually matter. Not every step, which would just rebuild the bottleneck you were trying to remove. The irreversible, expensive calls get a confirmation surface and an audit trail that still makes sense half a year later.
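As a sketch of that split, assuming "matters" can be expressed as a simple rule on the action and its amount. The threshold, the action name, and the `approve` callback (which returns the approver's name, or `None` if they decline) are all placeholders for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Callable, List, Optional

APPROVAL_THRESHOLD = 500.0  # hypothetical: amounts above this need a human


def needs_human(action: str, amount: float) -> bool:
    """Route only irreversible or expensive calls to a person."""
    return action == "delete_account" or amount > APPROVAL_THRESHOLD


def execute(
    action: str,
    amount: float,
    approve: Callable[[str, float], Optional[str]],
    audit_log: List[str],
) -> bool:
    """Decide whether the action may run, and record the decision either way."""
    gated = needs_human(action, amount)
    approver = approve(action, amount) if gated else None
    allowed = (not gated) or approver is not None
    # One self-describing JSON line per decision, so the record still
    # makes sense to someone reading it months later.
    audit_log.append(json.dumps({
        "at": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "amount": amount,
        "required_approval": gated,
        "approved_by": approver,
        "executed": allowed,
    }))
    return allowed
```

Routine calls pass straight through without touching `approve`, so the human only sees the handful of decisions the rule flags.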
Teams call all of this overhead. They have it backwards. Once the four are in place you can ship hard, because the cost of being wrong is capped: a bad output dies at the schema gate, a flaky dependency falls back instead of taking the system down with it, and drift turns up on a dashboard instead of in someone’s inbox. Releases stop feeling like a coin flip, and they get a lot cheaper to make.
So when someone calls the guardrails overhead, I point at the release cadence. The teams that wired in the boring parts are the ones shipping every week without flinching. The ones that skipped them are still hand-checking output they were supposed to have automated months ago.