Writing
Building an app feature with your own AI agent team
Everyone's talking about AI coding assistants. But what happens when you stop treating the AI as a single chat window and start treating it as a team? I built a multi-agent orchestrator inside pi agent that spawns specialised sub-agents — each with its own role, model, and context — and wires them into a single pipeline. Here's how it works.
// what is pi agent?
Pi agent is a minimal, terminal-based coding agent built by Mario Zechner. Unlike monolithic AI IDEs, pi is aggressively extensible — features aren't baked in, they're built as TypeScript extensions. It supports 15+ providers and hundreds of models, so you can pick the right brain for the right job. Think of it less as a product and more as a runtime for AI-powered dev workflows.
The key insight: pi doesn't try to be everything. It gives you session management, context engineering, and an extension API, then gets out of the way. That philosophy is what made it possible to build the orchestrator I'm about to describe.
// the problem with one-agent-does-everything
When you throw a feature request at a single AI agent, you're asking one model to simultaneously be a researcher, architect, and implementer. It works for small tasks. For anything non-trivial — say, adding a new feature that touches multiple files, needs API research, and requires architectural decisions — the context window becomes a battlefield. The agent forgets earlier research, makes contradictory design choices, or just burns tokens re-reading files it already saw.
The fix isn't a bigger context window. It's division of labour.
// enter the orchestrator
The orchestrator is a set of pi agent extensions that implement a multi-agent workflow. You describe what you want built, and instead of one agent fumbling through the whole process, the orchestrator breaks it down and spawns purpose-built sub-agents for each phase.
The pipeline looks like this:
$ task "add offline caching for the feed endpoint"
┌──────────────────────────────────────────────┐
│                 orchestrator                 │
│  ├─ research agent  → gather context         │
│  ├─ plan agent      → design the approach    │
│  └─ implement agent → write the code         │
└──────────────────────────────────────────────┘
Each sub-agent gets its own fresh context, its own system prompt, and — crucially — its own model selection. The output of each phase feeds into the next as structured context, not as a growing chat log.
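The handoff contract can be modelled as plain data types. Here's a minimal sketch in TypeScript; the interface names and `runPipeline` are illustrative, not part of pi's actual API:

```typescript
// Hypothetical shapes for the structured handoffs between phases.
interface ResearchBrief {
  existingPatterns: string;
  relevantFiles: string[];
  constraints: string;
}

interface Plan {
  approach: string;
  newFiles: string[];
  modifiedFiles: string[];
  steps: string[];
}

// Each agent is a function from structured input to structured output;
// there is no shared chat log, only data flowing forward.
type Agent<In, Out> = (input: In) => Promise<Out>;

async function runPipeline(
  task: string,
  research: Agent<string, ResearchBrief>,
  plan: Agent<ResearchBrief, Plan>,
  implement: Agent<{ brief: ResearchBrief; plan: Plan }, string>,
): Promise<string> {
  const brief = await research(task);          // phase 1: gather intel
  const approach = await plan(brief);          // phase 2: design
  return implement({ brief, plan: approach }); // phase 3: build
}
```

Modelling each agent as a pure function over structured data is what makes the "fresh context per phase" property fall out naturally: a phase can only see what the previous phase explicitly returned.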
// the research agent
First in the pipeline. Its job is to explore — read the codebase, understand the existing architecture, find the relevant files, check for patterns, and surface anything the later agents will need. It doesn't write code. It doesn't make decisions. It just gathers intel and produces a structured brief.
This is where model selection matters. Research doesn't need the most expensive model — it needs one that's fast, good at reading comprehension, and cheap enough that you don't flinch when it reads 40 files. A smaller, faster model works perfectly here. It's essentially doing grep with understanding.
// research agent output (simplified)
{
  "existing_patterns": "repository pattern with protocol-based DI",
  "relevant_files": ["FeedRepository.swift", "NetworkService.swift", ...],
  "dependencies": "already using CoreData for user prefs",
  "constraints": "feed response is paginated, ~2kb per item"
}
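Because the later phases depend on this brief, it's worth sanity-checking before handing it onward. A minimal validator, using the field names from the simplified output above:

```typescript
// The research brief as a plain record, matching the simplified output above.
type Brief = {
  existing_patterns: string;
  relevant_files: string[];
  dependencies: string;
  constraints: string;
};

// Reject briefs missing the fields the plan agent depends on.
function validateBrief(raw: unknown): Brief {
  const b = raw as Partial<Brief>;
  if (typeof b.existing_patterns !== "string" || b.existing_patterns.length === 0) {
    throw new Error("brief missing existing_patterns");
  }
  if (!Array.isArray(b.relevant_files) || b.relevant_files.length === 0) {
    throw new Error("brief missing relevant_files");
  }
  return b as Brief;
}
```

A cheap check like this catches a lazy or confused research run before you spend tokens planning against an empty brief.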
// the plan agent
Takes the research brief and designs the implementation approach. This is the architect — it decides which files to create, which to modify, what the data flow looks like, and where the tricky bits are. It outputs a structured plan that's specific enough to implement but doesn't contain actual code.
For planning, you want the smartest model you can get. This is where architectural reasoning happens — trade-offs, edge cases, integration points. The plan agent sees the full picture from the research phase and makes the calls that the implementation agent will follow. Skimping on model quality here means your implement agent builds the wrong thing really efficiently.
// plan agent output (simplified)
{
  "approach": "CoreData cache layer behind existing repository protocol",
  "new_files": ["FeedCache.swift", "FeedCachePolicy.swift"],
  "modified_files": ["FeedRepository.swift"],
  "steps": [
    "1. define cache entity matching FeedItem model",
    "2. implement cache-first read with TTL expiry",
    "3. background refresh on cache miss or stale",
    "4. wire into existing repository via protocol extension"
  ],
  "risks": ["pagination state across cache boundary", "thread safety"]
}
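The plan is also worth checking for internal consistency before it reaches the implement agent. One sketch of such a check, invariants of my choosing rather than anything pi enforces:

```typescript
interface PlanOutput {
  approach: string;
  new_files: string[];
  modified_files: string[];
  steps: string[];
  risks: string[];
}

// A file should be either created or modified, never both; and a plan
// with no steps gives the implement agent nothing to follow.
function checkPlan(plan: PlanOutput): string[] {
  const problems: string[] = [];
  const overlap = plan.new_files.filter((f) => plan.modified_files.includes(f));
  if (overlap.length > 0) {
    problems.push(`files listed as both new and modified: ${overlap.join(", ")}`);
  }
  if (plan.steps.length === 0) problems.push("plan has no steps");
  return problems;
}
```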
// the implement agent
The builder. It receives both the research context and the plan, then writes the actual code. Because it isn't wasting tokens on exploration or architecture, its entire context window is dedicated to implementation. It knows exactly what to build, where to put it, and what patterns to follow.
Model selection here depends on the task complexity. For straightforward implementations following a clear plan, a mid-tier model handles it fine. For complex tasks with nuanced logic — the kind where you'd want a senior dev — you bring in the heavy hitter.
// model selection: the right brain for the job
This is one of the most underrated aspects of multi-agent workflows. When you use a single agent, you're forced to pick one model for everything — either you overpay for research with an expensive model, or you get mediocre planning from a cheap one. With the orchestrator, every phase gets the model that fits:
// model assignment per agent role
research  → fast + cheap        // volume reading, pattern matching
plan      → smartest available  // architectural reasoning
implement → balanced            // code generation, follows plan
Because pi agent supports swapping models per-session (and per-extension), the orchestrator can spin up each sub-agent with a different model configuration. The research agent might run on a fast, affordable model while the plan agent gets the most capable model available. You're not paying top-tier prices for find . -name "*.swift" equivalents.
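In code, the assignment is just a lookup table the orchestrator consults when spawning each sub-agent. The model identifiers below are placeholders; substitute whatever your providers actually offer:

```typescript
type Role = "research" | "plan" | "implement";

// Hypothetical model names; swap in real provider/model identifiers.
const modelByRole: Record<Role, { model: string; rationale: string }> = {
  research:  { model: "fast-cheap-model", rationale: "volume reading, pattern matching" },
  plan:      { model: "frontier-model",   rationale: "architectural reasoning" },
  implement: { model: "mid-tier-model",   rationale: "code generation, follows plan" },
};

function modelFor(role: Role): string {
  return modelByRole[role].model;
}
```

Keeping the mapping in one place also makes cost experiments trivial: change one table entry and rerun, instead of reconfiguring three agents.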
// how it fits together
The orchestrator itself is the glue. It's a pi agent extension that:
- Parses the task description
- Spawns the research agent as a sub-agent with its own context and model
- Collects the research output and validates it
- Spawns the plan agent, injecting the research as structured context
- Collects the plan and presents it for approval (optional gate)
- Spawns the implement agent with both research and plan as context
- Collects the implementation output
Each transition is a clean handoff. The sub-agents don't share a conversation — they share structured data. This means the implement agent doesn't inherit 50 turns of research exploration cluttering its context. It gets a clean brief and a clear plan.
The optional approval gate after planning is important. It's where you, the human, get to review the architectural approach before any code is written. Cheap to change a plan. Expensive to rewrite an implementation.
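Stripped of pi-specific details, the glue is a short sequential function. `spawn` and `approve` here are stand-ins: in a real extension, `spawn` would start a sub-agent session with its own system prompt, model, and fresh context, and `approve` would prompt the human:

```typescript
type SpawnFn = (role: string, model: string, context: string) => Promise<string>;
type ApproveFn = (plan: string) => Promise<boolean>;

async function orchestrate(
  task: string,
  spawn: SpawnFn,
  approve: ApproveFn,
): Promise<string> {
  // Phase 1: research runs in a fresh context with only the task.
  const brief = await spawn("research", "fast-cheap-model", task);

  // Phase 2: plan sees the task plus the structured brief, nothing else.
  const plan = await spawn("plan", "frontier-model", `task: ${task}\nbrief: ${brief}`);

  // Optional human gate: cheap to change a plan, expensive to rewrite code.
  if (!(await approve(plan))) {
    throw new Error("plan rejected; nothing implemented");
  }

  // Phase 3: implement gets both artifacts, and none of the chat history.
  return spawn("implement", "mid-tier-model", `brief: ${brief}\nplan: ${plan}`);
}
```

Note that every piece of context is passed explicitly; if the implement agent needs something, the plan or brief has to carry it, which is exactly the discipline that keeps handoffs clean.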
// what I learned
Context isolation is everything. The single biggest win isn't model selection — it's giving each agent a clean, focused context. Research agents that don't carry implementation baggage. Planners that aren't distracted by code syntax. Implementers that aren't confused by earlier exploration dead-ends.
Structured handoffs beat chat logs. Passing JSON briefs between agents is dramatically more reliable than hoping a model remembers what it said 40 messages ago. It's the difference between a team that writes specs and a team that just shouts across the room.
Model selection is cost engineering. Running every phase on the most expensive model is like hiring a principal engineer to do code review, architecture, and update the README. Match the model to the cognitive load.
The orchestrator pattern is fractal. Once you have research → plan → implement working, you can nest it. A complex feature becomes multiple sub-tasks, each getting its own research-plan-implement cycle. The orchestrator can orchestrate orchestrators. Turtles all the way down.
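The nesting can be sketched as a recursion over a task tree. How a task gets split is its own problem (here it's assumed to already be decomposed); `runCycle` stands in for one full research-plan-implement pass:

```typescript
// A task either gets one plain research→plan→implement cycle (a leaf),
// or is split into sub-tasks that each get their own cycle.
interface Task {
  description: string;
  subTasks?: Task[];
}

async function orchestrateFractal(
  task: Task,
  runCycle: (description: string) => Promise<string>,
): Promise<string[]> {
  if (!task.subTasks || task.subTasks.length === 0) {
    return [await runCycle(task.description)]; // leaf: one full cycle
  }
  const results: string[] = [];
  for (const sub of task.subTasks) {
    results.push(...(await orchestrateFractal(sub, runCycle))); // recurse
  }
  return results;
}
```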
// try it yourself
If you're already using AI agents in your dev workflow and you've hit the ceiling of what a single agent can do, this pattern is worth exploring. Pi agent makes it particularly natural because of its extension-first architecture — you're building on a runtime designed for exactly this kind of customisation.
The tools are there. The models are there. The missing piece is workflow engineering — treating AI not as a chat window but as a team you can architect, assign roles to, and orchestrate. That's where the real leverage is.
— AM, Amsterdam, April 2026