How-To · 8 min read

Building a Deterministic AI Orchestrator

How a deterministic orchestrator — explicit state, hooks, subagents — makes AI coding workflows reliable instead of unpredictable. Lessons from the build.

By Hendrik Lojek

Key Takeaways
  • The model was never the bottleneck — the deterministic architecture around it was.
  • Rules that must hold go in hooks and scripts, not prose; config goes in YAML/JSON, not paragraphs.
  • Keep the agent thin, the skill thin, and the platform fat — and build the software process into explicit phases with gates.

I set out to make an AI coding agent reliable enough to trust with real work. What I actually learned, across a year and five major rewrites of a system I call the orchestration platform, is that the model was never the problem. The architecture around it was.

If you are trying to get an agent to behave deterministically — same inputs, same disciplined process, every time — here is the path I took, the wrong turns included, so you can skip a few of mine.

Lesson 1: Markdown is a suggestion, not an instruction

My first instinct was the obvious one: write all the rules in a CLAUDE.md file. Do this, never do that, follow these steps. It worked about seventy percent of the time, which is another way of saying it failed thirty percent of the time, unpredictably.

The reason is built into the tool. Claude Code injects your CLAUDE.md with a note that says, in effect, this context may or may not be relevant, so ignore it if it is not. That is by design: it keeps the model from being derailed by stale instructions. But it means your carefully written rules are soft guidance. The model is free to decide a rule does not apply right now. And there is a practical ceiling underneath it: frontier models reliably follow somewhere around 150–200 instructions, and adherence drops off sharply beyond that. Pile more prose into the file and you do not get more compliance; you get less.

That was the first real lesson. If a rule absolutely must hold, it cannot live in a document the model is allowed to ignore.

Lesson 2: Put enforcement outside the model

So I stopped trying to persuade the model and started constraining it. Claude Code has hooks — scripts that fire deterministically before and after tool calls, outside the model's reasoning entirely. A hook does not ask the agent to behave. It blocks the action.

This is the single biggest shift in the whole system. Rules that must hold moved out of CLAUDE.md and into hook scripts: block writes to credential files, block destructive commands, block the orchestrator from editing code when it is supposed to be delegating. The platform now runs twenty-one of these enforcement layers. The model cannot talk its way past them because they are not part of the conversation.
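To make the idea concrete, here is a minimal sketch of what a pre-tool-call hook of this kind could look like. It is not one of the platform's actual scripts. It assumes the hook receives the pending tool call as JSON with `tool_name` and `tool_input` fields and that exit code 2 signals a block, which is how I understand Claude Code's hook contract; check the current documentation before relying on either. The patterns and the `evaluate` and `main` names are illustrative.

```python
import json
import re
import sys

ALLOW, BLOCK = 0, 2  # assumption: exit code 2 tells the host to block the call

# Illustrative patterns only; a real rule set would be broader.
CREDENTIAL_FILES = re.compile(r"(^|/)(\.env(\..+)?|id_rsa|credentials\.json)$")
DESTRUCTIVE_COMMANDS = re.compile(
    r"\brm\s+-\w*r\w*f\w*\b|\bgit\s+push\s+.*--force\b"
)
WRITE_TOOLS = {"Write", "Edit"}  # assumed tool names


def evaluate(event: dict) -> tuple[int, str]:
    """Return (exit_code, reason) for one pending tool call."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})

    if tool in WRITE_TOOLS:
        path = tool_input.get("file_path", "")
        if CREDENTIAL_FILES.search(path):
            return BLOCK, f"blocked: write to credential file {path}"

    if tool == "Bash":
        command = tool_input.get("command", "")
        if DESTRUCTIVE_COMMANDS.search(command):
            return BLOCK, f"blocked: destructive command {command!r}"

    return ALLOW, ""


def main(raw_event: str) -> int:
    """Decide on one event given as JSON text; the reason goes to stderr."""
    code, reason = evaluate(json.loads(raw_event))
    if reason:
        print(reason, file=sys.stderr)
    return code

# A real hook script would end with: sys.exit(main(sys.stdin.read()))
```

The decision is a plain function of the event, so it can be unit-tested without a model in the loop, and the same input always produces the same verdict.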

The mental model that emerged: treat the LLM as a powerful but nondeterministic process, and wrap it in a deterministic runtime. The intelligence is probabilistic. The guardrails are not.

Lesson 3: Config belongs in YAML and JSON, not prose

The same lesson has a quieter second half. Even non-enforcement information — thresholds, phase definitions, limits — worked badly as prose. A line like "warn the user when context gets high" is the kind of soft guidance the model rounds off.

So configuration migrated out of markdown into structured files. The context-budget guide written in English became a context-budget.yaml with exact numbers: warn at 75%, soft-block at 80%, hard-block at 85%. Phase rules became JSON the hooks read directly. The difference is decisiveness. Prose invites interpretation; a YAML threshold is a number a script compares against. Markdown for humans to read; YAML and JSON for the machine to obey.
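A small sketch of the comparison a script might run against those thresholds. The three percentages come from the text above; the key names and the `classify` function are hypothetical, and the values are inlined as JSON here only to keep the example dependency-free (the real file is YAML).

```python
import json

# Hypothetical key names; the numbers are the ones quoted in the article.
BUDGET = json.loads('{"warn_pct": 75, "soft_block_pct": 80, "hard_block_pct": 85}')

# Checked strictest first so the highest threshold reached wins.
LEVELS = (
    ("hard_block", "hard_block_pct"),
    ("soft_block", "soft_block_pct"),
    ("warn", "warn_pct"),
)


def classify(used_tokens: int, window_tokens: int, budget: dict = BUDGET) -> str:
    """Map context usage to 'ok', 'warn', 'soft_block', or 'hard_block'."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    for level, key in LEVELS:
        # Integer arithmetic avoids float rounding at the exact boundary.
        if used_tokens * 100 >= budget[key] * window_tokens:
            return level
    return "ok"
```

With a 200,000-token window, `classify(150_000, 200_000)` lands on the warn threshold exactly, and there is nothing for the model to round off.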

Lesson 4: Build the software process into phases

A reliable developer follows a sequence — understand, plan, build, verify, ship — and does not skip the boring parts under pressure. An agent will absolutely skip them unless the sequence is structural.

So I built the software development lifecycle into an explicit phase machine. It started at sixteen phases, which was too granular; I consolidated to twelve, then to six: setup, discovery, design, build, verify, deliver. Each phase has entry conditions, exit gates, and a defined handoff to the next. A bug fix can skip phases it does not need; a new subsystem runs the whole sequence. The point is that "did you actually test it" is no longer a question the model answers honestly or not — it is a gate that has to be cleared.
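One way to picture the phase machine is the sketch below. The six phase names are from the text; everything else is an assumption made for illustration, including the `PhaseMachine` class, the idea that a gate is cleared by recording evidence, and which phases a bug fix skips.

```python
from dataclasses import dataclass, field

PHASES = ("setup", "discovery", "design", "build", "verify", "deliver")

# Hypothetical plans: which phases each kind of work runs.
PLANS = {
    "new_subsystem": PHASES,
    "bug_fix": ("setup", "build", "verify", "deliver"),
}


class GateNotCleared(Exception):
    """Raised when a phase is left before its exit gate is cleared."""


@dataclass
class PhaseMachine:
    plan: tuple[str, ...]
    index: int = 0
    evidence: dict[str, str] = field(default_factory=dict)

    @property
    def current(self) -> str:
        return self.plan[self.index]

    @property
    def done(self) -> bool:
        return self.index == len(self.plan) - 1 and self.current in self.evidence

    def clear_gate(self, evidence: str) -> None:
        """Record evidence (for example a test report path) for the current phase."""
        if not evidence.strip():
            raise ValueError("a gate needs evidence, not an assertion")
        self.evidence[self.current] = evidence

    def advance(self) -> str:
        """Move to the next phase, refusing if the exit gate is not cleared."""
        if self.current not in self.evidence:
            raise GateNotCleared(f"exit gate for '{self.current}' not cleared")
        if self.index < len(self.plan) - 1:
            self.index += 1
        return self.current
```

Calling `advance()` before `clear_gate()` raises rather than trusting a claim, which is the whole point: the transition is refused by the runtime, not negotiated with the model.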

Lesson 5: Thin agent, thin skill, fat platform

The biggest architectural reversal was about where intelligence should live. The instinct is to make the agent smart — a long, detailed prompt that knows everything. That is exactly backwards. Long-context agents degrade: the more you stuff in, the more the model's attention thins out and the late instructions get ignored.

The pattern that worked is the opposite. Keep the agent thin and disposable — a short prompt, a narrow scope, no memory. Put all the durable intelligence in the platform: the skills it loads just in time, the hooks that constrain it, the state files that persist across sessions. A worker agent is spawned, does one bounded job, and is thrown away. State never lives in the agent's head; it lives in files the platform owns. Agents are cattle, not pets.
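A rough sketch of that division of labor, under stated assumptions: `run_worker` is a stand-in for spawning a short-lived agent, and the state file layout (`completed`, `results`) is invented for the example. The worker holds nothing between calls; only the file remembers.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read platform-owned state, or start fresh if none exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": [], "results": {}}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then swap it in, so a crash leaves no half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)


def run_worker(task: str) -> str:
    """Hypothetical stand-in for a disposable agent doing one bounded job."""
    return f"done: {task}"


def run_task(path: Path, task: str) -> dict:
    """Run a task once; the record of it lives in the file, not the worker."""
    state = load_state(path)
    if task in state["completed"]:
        return state  # a new session sees earlier work and skips it
    state["results"][task] = run_worker(task)
    state["completed"].append(task)
    save_state(path, state)
    return state
```

Because every call reloads state from disk, a fresh session picks up exactly where the last one stopped, without any agent needing to recall it.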

This also fixed the role problem. Early on, the system suggested that the coordinator should delegate and the workers should implement — and the model ignored it whenever convenient. An outside analysis of this exact class of system, Praetorian's write-up on deterministic AI orchestration, named the gap precisely: the architecture suggested role separation but did not enforce it. The fix was to make it physical. A coordinator literally cannot write code — a hook blocks it. A worker literally cannot spawn another worker. Role separation stopped being advice and became a property of the runtime.
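Expressed as data, the role rules can be as small as a deny table that a hook consults. This is a sketch, not the platform's actual configuration: the tool names (`Write`, `Edit`, and `Task` for spawning a subagent) are assumptions, as is the `is_allowed` helper.

```python
# Assumed tool names: "Write"/"Edit" modify files, "Task" spawns a subagent.
DENIED_TOOLS = {
    "coordinator": frozenset({"Write", "Edit"}),  # must delegate, not implement
    "worker": frozenset({"Task"}),                # must not spawn other workers
}


def is_allowed(role: str, tool: str) -> bool:
    """Return True only if a known role is permitted to use the tool."""
    if role not in DENIED_TOOLS:
        return False  # unknown roles get no capabilities by default
    return tool not in DENIED_TOOLS[role]
```

Denying by default for unknown roles keeps a misconfigured agent from quietly gaining capabilities it was never meant to have.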

What it adds up to

None of this made the model smarter. It made the system around the model trustworthy. The throughline is the same one I apply on a factory floor: you do not get reliability by asking people to be careful — you get it by designing a process where the careful path is the only path. Variation is the enemy; the structure removes it.

For anyone building in this space, the compressed version: rules that must hold go in hooks and scripts, not prose. Config goes in YAML and JSON, not paragraphs. The process goes into explicit phases with gates. The agent stays thin, the skill stays thin, and the platform stays fat. And the model gets treated for what it is — a brilliant, nondeterministic engine that becomes dependable only when you build a deterministic machine around it.

That last part is the whole job. The intelligence was never the hard part. The orchestration was.

Sources
  • Praetorian, Deterministic AI Orchestration: A Platform Architecture for Autonomous Development
  • Anthropic, Claude Code best practices (CLAUDE.md, instruction adherence)
  • Anthropic, Agent Skills overview