Beyond the vibing
A lot of agentic-coding workflows still rely on the developer reading every diff to confirm an agent did the right thing. That works at small scale and breaks the moment you let an orchestrator run features end-to-end. The piece I've been working on — and the focus of this post — is closing that gap with an executable contract: a tiny test harness over my hand-rolled Redux store that lets the agent verify its own state transitions before it ever asks me to look at the code.
// the problem with prompt-driven development
When I first started giving AI agents real implementation work, the workflow was conversational. I'd talk through a task, review every file the agent touched, check every state mutation against my Redux patterns, and verify dependency injection by hand. It was effective in the small, but it didn't scale: every session re-opened with the same architectural briefing, and any drift from the patterns went unnoticed until it shipped.
The underlying issue wasn't the agent — it was that I hadn't given it a contract. I was relying on context windows to carry architectural intent, and that's a fragile substrate. What was missing was something the agent could check its work against without me in the loop.
// borrowing from TCA, without the framework
The Composable Architecture from Point-Free has the cleanest answer I've seen to this problem. Their TestStore turns testing from an afterthought into the central organising principle of a feature: send actions, await effect emissions, assert state, and have the runtime fail loudly when anything is unaccounted for.
I didn't want to adopt TCA wholesale. The app already runs on a small hand-rolled Redux implementation — three protocols (Action, Middleware, Store), pure reducers, fire-and-forget async middleware. Adopting a framework would mean rewriting the middleware chain and pulling in a substantial dependency. What I wanted was the pattern — the verification surface — adapted to the runtime I already have.
// the redux lite
A quick tour of the store, because the simplicity is what makes the rest of this possible:
// The three core protocols
public protocol Action: Sendable {}

public protocol Middleware<State>: AnyObject {
    associatedtype State
    func process(action: any Action, state: State,
                 dispatch: @escaping @MainActor (any Action) -> Void) -> any Action
}

@MainActor
public final class Store<State>: ObservableObject {
    @Published public private(set) var state: State
    private let reducer: (State, any Action) -> State
    private var middleware: [any Middleware<State>]
}
Actions are enums with associated values. Reducers are pure free functions. Middleware handles side effects — network calls, database writes, playback — by observing actions and optionally dispatching new ones through a fire-and-forget closure. A typical middleware looks like this:
// A middleware side-effect pattern (simplified)
func process(action: any Action, state: State,
             dispatch: @escaping (any Action) -> Void) -> any Action {
    guard let musicAction = action as? MusicPlayerAction else { return action }
    if case .trackComplete = musicAction {
        // `track`, `next`, and `now` are derived from state and the clock
        Task {
            await submitListen(track, listenedAt: now)
            // dispatch takes `any Action`, so the enum type is spelled out
            dispatch(MusicPlayerAction.trackAdvanced(trackId: next))
            dispatch(MusicPlayerAction.fetchListens)
        }
    }
    return action
}
This pattern has been running cleanly in production for months. The agent — whether Claude Code or the orchestrator I described in my previous post — handles it well as a generator. What was missing was a way to verify that the generated code did the right thing.
// what the orchestrator changed
With a multi-agent orchestrator — research, plan, implement — the unit of work scaled up. Instead of "add this validation to the login form," I could hand off "add offline support for the search tab, following existing cache patterns, with the same error-handling strategy as the home feed." The orchestrator could research the cache infrastructure, plan the integration points, and implement the feature without supervision.
But I still had to read every line the next morning. The agent could build a feature end-to-end; I had no way of knowing whether the state transitions were correct without manually tracing every code path. The throughput went up, but verification stayed manual — which meant verification became the bottleneck.
What I needed was a test suite the plan agent could produce alongside its design, the implement agent could run as part of its loop, and I could review as a small surface area in place of reading the whole implementation.
// the dispatch observer
The key insight from TCA's TestStore is that you need visibility into what the store is doing. In TCA, effects are structured — the runtime knows about them and awaits their completion. In my Redux lite, effects are fire-and-forget Task closures. The store has no idea what's in flight.
The fix turned out to be a single observer hook in Store.dispatch(_:):
public func dispatch(_ action: any Action) {
    dispatchObserver?(action)  // ← the only change
    let isNested = dispatchDepth > 0
    dispatchDepth += 1
    // ... existing middleware chain and reducer logic unchanged ...
}
The dispatchObserver is an optional callback — nil in production, set to a collecting function in tests. Every dispatch, top-level and nested, records itself into a receivedActions queue. With that one line, the TestStore has a complete window into every action flowing through the system.
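The whole mechanism fits in a few lines. Here is a self-contained sketch — the store is heavily simplified (no middleware, no nesting), but the `dispatchObserver` / `receivedActions` names follow the post:

```swift
// Minimal sketch of the observer hook on a simplified store.
protocol Action {}

final class MiniStore<State> {
    private(set) var state: State
    private let reducer: (State, any Action) -> State
    // nil in production; set to a collecting closure in tests
    var dispatchObserver: ((any Action) -> Void)?

    init(state: State, reducer: @escaping (State, any Action) -> State) {
        self.state = state
        self.reducer = reducer
    }

    func dispatch(_ action: any Action) {
        dispatchObserver?(action)       // the single test hook
        state = reducer(state, action)  // existing reducer logic unchanged
    }
}

enum CounterAction: Action { case increment }

// In a test, the observer feeds the queue the TestStore later drains:
var receivedActions: [any Action] = []
let store = MiniStore(state: 0) { count, action in
    action is CounterAction ? count + 1 : count
}
store.dispatchObserver = { receivedActions.append($0) }
store.dispatch(CounterAction.increment)
// receivedActions now holds the dispatched action; state advanced to 1
```

Because the hook fires before the reducer runs, the queue preserves dispatch order even when middleware dispatches follow-up actions mid-flight.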
// send, receive, finish
The TestStore API has three core methods, and one important detail: failures route through Swift Testing's Issue.record with the source location of the call site, so a missed assertion fails the test the same way an #expect would.
public final class TestStore<State: Equatable> {
    var exhaustivity: Exhaustivity = .on

    func send(_ action: any Action,
              assert: ((inout State) -> Void)? = nil) async

    func receive<A: Action & Equatable>(_ expected: A,
                                        assert: ((inout State) -> Void)? = nil) async

    func finish() async
}
send dispatches an action and asserts the immediate state change. The trailing closure receives the state before the action and you mutate it to match the expected state after. The detail that matters: middleware fires Task closures that dispatch follow-up actions, so I snapshot state immediately after the synchronous reducer/middleware chain runs but before yielding to the async runtime. That gives a clean state transition for the dispatched action itself, uncontaminated by subsequent effect emissions.
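That snapshot-before-yield detail can be modelled synchronously. A sketch under assumptions: `SearchState` and `searchReducer` are illustrative types, the middleware chain is elided, and the real `send` routes mismatches through `Issue.record` rather than returning a Bool:

```swift
// Sketch of send's assertion logic (simplified, synchronous).
protocol Action {}
enum SearchAction: Action { case search(query: String) }

struct SearchState: Equatable {
    var query = ""
    var isLoading = false
}

func searchReducer(_ state: SearchState, _ action: any Action) -> SearchState {
    guard let action = action as? SearchAction else { return state }
    var state = state
    if case .search(let query) = action {
        state.query = query
        state.isLoading = true
    }
    return state
}

func send(_ action: any Action,
          state: inout SearchState,
          assert mutate: ((inout SearchState) -> Void)?) -> Bool {
    var expected = state                  // state before the action
    state = searchReducer(state, action)  // synchronous reducer run
    let snapshot = state                  // snapshot BEFORE yielding to async effects
    guard let mutate else { return true }
    mutate(&expected)                     // caller describes the expected transition
    return expected == snapshot           // real TestStore: Issue.record on mismatch
}

var state = SearchState()
let ok = send(SearchAction.search(query: "cool"), state: &state) {
    $0.query = "cool"
    $0.isLoading = true
}
// ok: the transition matched; effect emissions land later, in receive
```

The point of the early snapshot is isolation: the assertion sees exactly one action's worth of state change, never a race with the fire-and-forget Tasks that follow.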
receive is where the async story gets handled. It blocks (with a configurable timeout) until a matching action appears in the queue, pops it, and asserts the resulting state change. There are two overloads: a typed one for the common case (receive(SearchAction.searchResponse(results: [...]))) and a predicate form for matching on a condition rather than equality.
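The two overloads' matching logic, sketched against the collected queue. Assumptions: queue draining is shown synchronously (the real `receive` polls with a timeout and records a failure on expiry), and `SearchAction` is illustrative:

```swift
// Sketch of receive's typed and predicate matching over the action queue.
protocol Action {}
enum SearchAction: Action, Equatable {
    case searchResponse(results: [String])
    case logEvent(name: String)
}

var receivedActions: [any Action] = [
    SearchAction.searchResponse(results: ["A", "B"]),
    SearchAction.logEvent(name: "search"),
]

// Typed overload: match by equality on a concrete action type
func receive<A: Action & Equatable>(_ expected: A) -> Bool {
    guard let index = receivedActions.firstIndex(where: { ($0 as? A) == expected })
    else { return false }  // real version: await with timeout, then Issue.record
    receivedActions.remove(at: index)
    return true
}

// Predicate overload: match on a condition rather than equality
func receive(where predicate: (any Action) -> Bool) -> Bool {
    guard let index = receivedActions.firstIndex(where: predicate)
    else { return false }
    receivedActions.remove(at: index)
    return true
}

let matchedTyped = receive(SearchAction.searchResponse(results: ["A", "B"]))
let matchedPredicate = receive { action in
    guard let a = action as? SearchAction, case .logEvent = a else { return false }
    return true
}
```

Matched actions are popped from the queue, which is what lets `finish` treat anything left over as unaccounted for.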
finish waits for the receivedActions queue to drain. In exhaustive mode (the default), any unhandled actions at test end produce a failure with a recursive field-level diff showing exactly which actions went unasserted. The exhaustivity flag is lifted directly from TCA: .on means you must assert on every received action; .off is for integration tests where you only care about end-state.
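The recursive field-level diff can be approximated with `Mirror`-based reflection — an assumption on my part, since the post doesn't show the diff implementation, and the output format here is invented for illustration:

```swift
// Sketch of a recursive field-level diff using Mirror reflection.
func fieldDiff(_ expected: Any, _ actual: Any, path: String = "") -> [String] {
    let expectedMirror = Mirror(reflecting: expected)
    let actualMirror = Mirror(reflecting: actual)
    // Leaf values: compare printed representations
    guard !expectedMirror.children.isEmpty else {
        return "\(expected)" == "\(actual)" ? []
            : ["\(path): \(expected) → \(actual)"]
    }
    // Recurse into stored properties, accumulating a dotted key path
    return zip(expectedMirror.children, actualMirror.children).flatMap { lhs, rhs in
        fieldDiff(lhs.value, rhs.value,
                  path: path.isEmpty ? (lhs.label ?? "?")
                                     : "\(path).\(lhs.label ?? "?")")
    }
}

struct SearchState {
    var query = ""
    var isLoading = false
    var results: [String] = []
}

let diff = fieldDiff(
    SearchState(query: "cool", isLoading: false, results: ["A", "B"]),
    SearchState(query: "cool", isLoading: true, results: ["A", "B"])
)
// diff == ["isLoading: false → true"]
```

Pointing the agent at the exact field that diverged, rather than "state mismatch", is what makes the failure output actionable in an iteration loop.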
// a real example
A search feature, end to end. The middleware takes a query, debounces, and dispatches a response action:
@Test func searchFetchesResults() async {
    let store = TestStore(
        initialState: SearchState(),
        reducer: searchReducer,
        middlewares: [SearchMiddleware(mockResults: ["A", "B"])]
    )

    // send takes `any Action`, so the enum type is spelled out
    await store.send(SearchAction.search(query: "cool")) {
        // immediate sync state change from the reducer
        $0.query = "cool"
        $0.isLoading = true
    }

    await store.receive(SearchAction.searchResponse(results: ["A", "B"])) {
        // assert the final state after receiving results
        $0.results = ["A", "B"]
        $0.isLoading = false
    }
}
A complete async flow asserted in three phases: the user action, the middleware effect, and the resulting state. If the middleware dispatches an action you didn't expect — say, a logging side effect — exhaustive mode catches it and tells you exactly what was missed.
// how the agent uses it
This is the part that changes the workflow. The TestStore isn't primarily for me to write tests by hand — it's for the agent to verify its own work.
When the orchestrator's plan agent designs a feature, part of its output is a set of test cases: the expected state transitions for every action the feature handles. Those tests get reviewed (a few dozen lines of structured assertions, easy to read) before any implementation begins. The implement agent receives them as part of its context, builds the feature, runs the tests, and uses the failure output — exact field-level diffs — to fix its own implementation iteratively. Only when all tests pass does the feature come back to me for review.
┌───────────────────────────────────────────────────┐
│ plan agent │
│ → architectural design │
│ → test contract (expected state transitions) │
│ │
│ human review gate │
│ → approve the plan + tests │
│ │
│ implement agent │
│ → builds the feature │
│ → runs test contract → receives diffs │
│ → fixes failures iteratively │
│ → all tests pass → feature verified │
└───────────────────────────────────────────────────┘
The human touchpoint moves earlier and shrinks. Instead of reading a full implementation in the morning, I read the plan and the tests, approve or modify them, and let the agent iterate until the contract passes. The tests become the specification, and the specification is executable.
// what the test contract actually proves
These aren't unit tests in the traditional sense. A unit test verifies a single function in isolation. The TestStore verifies behavioural correctness of the state machine over time — which is the level at which a feature actually works or doesn't. It answers:
- When the user performs this action, does the immediate state mutation match the design?
- When the middleware's side effect completes, does the correct follow-up action fire?
- When that follow-up is processed, does the resulting state match expectations?
- Are there any unaccounted-for side effects — actions the middleware fired that we didn't plan for?
- When the feature is complete, have all pending effects resolved to a stable state?
These are the questions I was answering manually every morning by reading diffs. The test contract answers them automatically, and gives the implement agent structured feedback to act on rather than guesswork.
// what I took away
Tests as specification, not verification. When a developer writes tests, they're verification — proving code that already exists. When the plan agent writes tests as part of its design, they're specification — the contract that defines what "works" means before any code exists. The implementation follows the tests, not the other way around. That ordering is the shift.
The observer pattern is low-footprint infrastructure. The dispatchObserver hook is one line, adds zero overhead in production, introduces no new protocols, and creates a complete testing surface for the entire Redux pipeline. The best infrastructure is the kind you don't notice until you need it.
You don't always need the framework to get the pattern. TCA's TestStore is thousands of lines built on a structured effect runtime. I don't have that runtime. But the underlying pattern — send actions, receive effects, assert state, enforce exhaustivity — is implementable in around 200 lines on top of an observer hook. The idea scales down well.
Agents work better with contracts than with instructions. Telling an agent "do it this way" is fragile. Context drifts; architectural intent decays over long conversations. Giving the agent a test contract — "the feature is correct when these 14 assertions pass" — is durable. The contract doesn't decay. The agent reads failures the same way a human would. The implementation can change completely; if the tests pass, the feature is correct.
// where this goes next
The next step is closing the loop fully. Right now the plan agent produces test contracts and the implement agent runs them. The missing piece is making the orchestrator itself aware of test results — feeding failures back into the plan when they suggest the design was wrong, not the implementation. When that's in place, the orchestrator stops being a feature generator and starts being a system that designs, implements, tests, and self-corrects against its own contract. The TestStore is the first piece of that.
— AM, Amsterdam, April 2026