Configured, not coded. The engineering discipline gap in agent development

Cobus Greyling
May 15, 2026

We replaced thousands of lines of orchestration code with a paragraph of English, and quietly walked away from the rigour that used to come with it.
I have been thinking about this for a while, and a thread keeps surfacing across the agent posts I have written this year…
The integration layer collapsed…CLI replaced MCP for most everyday tasks, because the bridge to the tool already existed.
The orchestration layer collapsed…prompts replaced frameworks, and multi-agent coordination that used to require hundreds of lines became a paragraph of English in a CLAUDE.md file.
Now the development environment is collapsing: the IDE is becoming optional, because an AI Agent does not need a file tree, a debugger, or syntax highlighting to work.
Each collapse is a win. Less infrastructure. Less brittle plumbing. Faster iteration.
But there is a side-effect I keep seeing in the wild, and it is the part of this story we are not honest enough about.
What I mean by configured, not coded
A universal agent is one loop, configured.
Same model. Same tool harness. Same execution scaffold.
What changes between “write a Fibonacci function” and “triage a bug across the repo” is not the code. It is a system prompt, a tool list, maybe a markdown file with instructions.
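A minimal sketch of what "one loop, configured" can look like. Everything here is hypothetical and not taken from any real framework: the `AgentConfig` fields, the `run_agent` function, and the convention that a model reply of the form `tool:argument` requests a tool call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical signatures, for illustration only.
Tool = Callable[[str], str]
Model = Callable[[str, List[str]], str]  # (context, tool names) -> reply


@dataclass
class AgentConfig:
    """Everything that differs between two agents lives here."""
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)
    instructions_md: str = ""  # e.g. the contents of a CLAUDE.md file


def run_agent(config: AgentConfig, task: str, model: Model, max_steps: int = 5) -> str:
    """The same loop for every task; only `config` changes."""
    transcript = [config.system_prompt, config.instructions_md, f"Task: {task}"]
    for _ in range(max_steps):
        reply = model("\n".join(transcript), sorted(config.tools))
        # Assumed convention: "name:argument" asks for a tool call.
        name, _, arg = reply.partition(":")
        if name in config.tools:
            transcript.append(f"{name} -> {config.tools[name](arg)}")
        else:
            return reply  # anything else is treated as the final answer
    return "step limit reached"
```

A coding agent and a triage agent would then be two `AgentConfig` values passed to the same `run_agent`, which is the point: the behaviour lives in the configuration, not in the loop.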

Markdown is the new programming surface. CLAUDE.md, AGENTS.md, system prompts, skill files, memory entries: these are the levers that decide whether the agent ships value or wastes tokens.
And here is what I find interesting. Nobody calls this code, so nobody treats it like code.
Discipline gap
When this was Python, we had habits.
Version control.
Code review.
Tests.
Diffs that humans actually read line by line.
Rollback when a change broke something downstream.
A changelog that was a real ledger, not a vibe.
When the same logic moves into a markdown file, those habits quietly get dropped.
A 200-line system prompt edited in a chat window with no diff review.
A skill file copied between projects with no test of whether it still works.
A memory entry written once and never re-read against the current code it claims to describe.
A tool description rewritten because it felt cleaner, with no measurement of whether the agent now succeeds more often.
The artefact looks like prose, so it gets edited like prose.
But it behaves like code: small wording changes flip success rates, instructions decay across turns, and a confident edit can quietly regress tasks you were not measuring.
Configuration is code in a different costume.
The discipline gap is not a configuration problem. It is an engineering problem we stopped solving the moment the file extension changed.
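One of the dropped habits above, re-reading a memory entry against the code it claims to describe, can be partly mechanised. A rough sketch, assuming each memory entry records the file it describes and a hash of that file at the time of writing. `MemoryEntry`, `remember`, and `is_stale` are hypothetical names, not part of any agent tool.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MemoryEntry:
    note: str            # what the agent was told to remember
    source_path: str     # the file the note claims to describe
    source_sha256: str   # digest of that file when the note was written


def digest(path) -> str:
    """SHA-256 of the file's current bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def remember(note: str, source_path) -> MemoryEntry:
    """Create an entry pinned to the current contents of the file."""
    return MemoryEntry(note, str(source_path), digest(source_path))


def is_stale(entry: MemoryEntry) -> bool:
    """True if the described file has been deleted or has changed."""
    path = Path(entry.source_path)
    return not path.exists() or digest(path) != entry.source_sha256
```

A stale flag does not say the note is wrong, only that nobody has checked it since the code moved. That is still more than most memory files get.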

Where the gap shows up
Three places I see it most often.
The system prompt as the dumping ground.
Every new failure mode triggers another paragraph of “be careful about X.” The prompt grows. Nobody removes anything.
Eventually the model’s attention thins across a wall of text and the original instructions fade out, the silent killer I have written about before.
No falsifiable change.
A markdown edit ships with no prediction of what it should fix and what it might break.
There is no rollback machinery, because there is no measurement to roll back from.
Most “self-improving” agent setups are really just rationale loops with no defensive prediction.
Prose treated as portable.
A `CLAUDE.md` from one repo gets pasted into another because it worked there.
But prose tuned to one codebase does not transfer reliably. It is the durable parts of an agent harness (tools, middleware, memory structure) that travel cleanly.
The discipline needed
The same things we already know how to do, applied to the configuration layer.
Diff every edit.
A prompt change is a code change. Read it line by line.
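If the prompt lives in a repository, ordinary version control already gives you this. For a prompt edited somewhere else, such as a chat window or a web console, a few lines with Python's standard `difflib` produce the same kind of reviewable diff. The `prompt_diff` helper and the default file name are illustrative choices.

```python
import difflib


def prompt_diff(old: str, new: str, name: str = "CLAUDE.md") -> str:
    """Return a unified diff between two versions of a prompt file."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
        lineterm="",  # inputs have no trailing newlines, so emit none
    )
    return "\n".join(lines)


if __name__ == "__main__":
    before = "Be concise.\nAlways run tests."
    after = "Be concise.\nRun tests before committing."
    print(prompt_diff(before, after))
```

An empty string means the two versions are identical. Anything else is something a human should read before it ships.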
Predict before shipping.
Write down what tasks the edit should fix and which it might regress. Verify against the next rollout.
Roll back at file granularity when the prediction misses.
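A sketch of what a written-down prediction could look like, and how it might drive the rollback decision. The `EditPrediction` record and the `verdict` rule are my assumptions about one reasonable policy: keep the edit only if every predicted fix landed and nothing broke that was not flagged in advance.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class EditPrediction:
    """Written before the edit ships, not after."""
    file: str                                            # e.g. "CLAUDE.md"
    should_fix: Set[str]                                 # task ids expected to start passing
    may_regress: Set[str] = field(default_factory=set)   # task ids at known risk


def verdict(pred: EditPrediction, before: Dict[str, bool], after: Dict[str, bool]) -> str:
    """Compare per-task pass/fail results from before and after the edit."""
    fixed = {t for t, ok in after.items() if ok and not before.get(t, False)}
    broken = {t for t, ok in after.items() if not ok and before.get(t, False)}
    unexpected = broken - pred.may_regress
    # The prediction misses if a promised fix did not land
    # or if something broke that nobody warned about.
    if unexpected or not pred.should_fix <= fixed:
        return "rollback"
    return "keep"
```

Because the record names a single file, the rollback is a revert of that file to its previous version, nothing wider.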
Measure what you change.
A markdown edit without a before/after eval is a vibe.
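A before/after eval does not need to be elaborate. The sketch below assumes you have some function, here called `run_task`, that runs the agent on one task with a given prompt and reports success or failure. That function, the task ids, and the number of trials are all placeholders. Each task runs several times because agent outcomes vary between runs.

```python
from typing import Callable, Dict, List


def success_rates(
    run_task: Callable[[str, str], bool],  # (prompt, task id) -> succeeded?
    prompt: str,
    tasks: List[str],
    trials: int = 5,
) -> Dict[str, float]:
    """Fraction of successful runs per task for one prompt version."""
    return {
        task: sum(run_task(prompt, task) for _ in range(trials)) / trials
        for task in tasks
    }


def compare(before: Dict[str, float], after: Dict[str, float], tolerance: float = 0.0) -> Dict[str, float]:
    """Per-task change in success rate, keeping only changes larger than `tolerance`."""
    return {
        task: round(after[task] - before[task], 3)
        for task in before
        if abs(after[task] - before[task]) > tolerance
    }
```

Positive numbers are tasks the edit helped, negative numbers are tasks it hurt, and an empty result means the edit changed nothing you were measuring.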
Treat tools, memory, and middleware as the durable IP.
That is the part of the harness that travels. The prose is the part that does not.
Prune ruthlessly.
Every paragraph in a system prompt is competing for attention with every other paragraph. More words is not more discipline.
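One way to make pruning less of a guess is a simple ablation: generate a variant of the prompt with each paragraph removed in turn, then run each variant through whatever eval you already have. The sketch assumes paragraphs are separated by blank lines, and the function names are illustrative.

```python
from typing import Iterator, List, Tuple


def paragraphs(prompt: str) -> List[str]:
    """Split a prompt into non-empty paragraphs on blank lines."""
    return [p.strip() for p in prompt.split("\n\n") if p.strip()]


def ablations(prompt: str) -> Iterator[Tuple[str, str]]:
    """Yield (removed paragraph, prompt without that paragraph) pairs."""
    parts = paragraphs(prompt)
    for i, removed in enumerate(parts):
        yield removed, "\n\n".join(parts[:i] + parts[i + 1:])
```

A paragraph whose removal leaves the measured results unchanged is a candidate for deletion. It may still matter for a case you are not measuring, so treat the result as a prompt for review, not a verdict.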
The model is rented, the harness is owned
This is the line I keep coming back to.
The frontier model you call this quarter will not be the one you call next quarter.
The harness (the loop, the tools, the markdown that configures behaviour) is the artefact you carry forward.
If we treat that artefact as casual configuration, we lose the engineering rigour that made the previous generation of software trustworthy.
If we treat it as code in a different costume, we get the speed of configuration with the discipline of engineering.
The collapse is real. The discipline gap is the bill that comes with it.












