My agentic coding stack: what I pay for and why
Everyone posts their dotfiles. Nobody posts their invoice. Here's a breakdown of the subscriptions and services I actually pay for to do agentic coding every day — what each one costs, what it's good for, and where it falls short. Total monthly outlay: roughly €200. The ROI? At least 2–3x against a normal freelance day rate, and that's a conservative estimate.
// the short version
I run four services in my agentic workflow. Two are daily drivers with paid subscriptions, one is a pay-as-you-go credit system for experimentation, and one I've mostly replaced with something I built myself. Here's the stack at a glance:
┌─────────────────────────────────────────────────────┐
│  agentic coding stack                               │
│                                                     │
│  Claude Max  €110/mo  →  daily tweaks & amends      │
│  Ollama Max   €90/mo  →  heavy lifting & overnight  │
│  OpenRouter  credits  →  model experimentation      │
│  Perplexity  credits  →  replaced by pi web search  │
└─────────────────────────────────────────────────────┘
// claude (anthropic) — the defaults guy
I use Claude Code on a daily basis. It's my "make small amends and tweaks" tool — fast enough, reliable enough, and it doesn't require any configuration. You install it, you auth, you go. That frictionless onboarding is genuinely valuable when you just need to ship something without thinking about model selection or provider routing.
I'm on the Max plan — the 5x tier at $100/month (roughly €110 with tax). That gets you 5x the usage of the standard Pro plan per 5-hour session window, priority access during peak hours, and early feature rollouts. The 20x tier at $200/month exists, but I haven't needed it: the 5x allocation gets me through a normal working day most of the time, and when it doesn't, I'm probably better off reaching for Ollama anyway.
The good: It just works. No config, no provider selection, no API key management. Claude Code understands my codebase, reads the right files, and makes competent edits. For focused, bounded tasks — rename this, fix that test, add this validation — it's genuinely fast. The integrated experience of Claude desktop, mobile, and Claude Code in one subscription is clean. And Anthropic's Opus 4.6 model remains one of the most capable frontier models for nuanced code reasoning.
The bad: Three things regularly frustrate me. First, status issues — there are days when Claude's capacity is stretched and you get throttled or degraded performance. Second, non-transparent quotas — Anthropic doesn't publish hard token counts or message limits for Max plans. They use a dynamic system based on "usage per session," which means you never quite know where the ceiling is until you hit it. Third, and this is the big one: you can't use the models with your subscription outside of Claude Code. That €110/month doesn't buy me API access to Opus or Sonnet. If I want to route Claude models through my pi agent harness, I need a separate API key billed per token. The subscription and the API are entirely separate.
That last point is more than a pricing gripe — it's an architectural constraint. When your subscription only works inside Anthropic's surfaces, you're locked into their workflow, not yours.
// claude max — the verdict
+ zero config, works immediately
+ strong model quality (opus 4.6, sonnet 4.6)
+ desktop + mobile + code in one sub
- opaque usage quotas
- occasional capacity issues
- subscription ≠ api access (walled garden)
€110/month | role: daily driver for small tasks
// ollama — the workhorse
If Claude is my defaults option, Ollama is my strategic one. I use it daily and I run it overnight. It's the driver behind my pi agent harness — the thing that's actually building features, running multi-step workflows, and doing the heavy lifting while I sleep.
Ollama's value proposition has shifted dramatically over the last year. It started as a tool for running local models — pull a model, serve it on localhost, get an OpenAI-compatible API. That's still there, and it's still valuable. But the real game-changer is Ollama Cloud: hosted inference for frontier open-weight models at subscription rates with transparent usage quotas. No per-token billing surprises. No API key juggling. You subscribe, you run models, you see your usage.
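That OpenAI-compatible API is what makes the composability work: any tool that can POST JSON can drive a model. A minimal sketch using only the standard library, assuming a default local install on port 11434 (the model tag here is a hypothetical example, not a real library name):

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint on a default local install
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a chat completion request for a locally served model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# sending this requires a running `ollama serve`; building it does not
req = build_chat_request("gemma4:26b", "Summarize this diff.")
print(req.full_url)  # http://localhost:11434/v1/chat/completions
```

Point the same request builder at a cloud endpoint and nothing else changes — which is the whole argument for an open API over a walled garden.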
I'm on the Max plan at $100/month (roughly €90) — 5x the Pro usage, up to 10 concurrent cloud models, and enough headroom to run sustained agent sessions over extended periods. My honest estimate: this subscription delivers at least double the productivity of my Claude subscription. Part of that is the model quality, part is the architectural flexibility, and part is that I can use the models however I want — through pi agent, through curl, through any tool that speaks the Ollama API.
My go-to models: GLM 5 and Gemma 4, depending on task complexity. This might surprise people who assume the only viable models come from closed-source labs, so let me explain.
GLM 5.1 is a mixture-of-experts model from Z.ai with 744B total parameters and 40B active. It scores 92.7% on AIME 2026 I, 86.0% on GPQA-Diamond, and 77.8% on SWE-bench Verified. For my daily agentic coding work — multi-file refactors, feature implementation, complex code reasoning — I find it as usable as Anthropic's Opus 4.6. That's not a benchmark claim; it's a workflow claim. The model handles multi-step tool-calling reliably, reasons well over long contexts (128K+ with DeepSeek Sparse Attention), and doesn't fall apart on the kind of structured output that agentic workflows demand.
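That tool-calling reliability matters because an agent harness is, at its core, a loop that dispatches the model's structured tool calls to local functions. A toy sketch of the dispatch step, using the OpenAI-style tool-call shape (the `read_file` stub is purely illustrative, not pi agent's actual implementation):

```python
import json

# toy registry of tools the harness exposes to the model
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",  # stub, not a real reader
}

def dispatch(tool_call: dict) -> str:
    """Execute one OpenAI-style tool call and return its result."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return TOOLS[name](**args)

# the shape a compatible API returns for a tool call
call = {"function": {"name": "read_file", "arguments": '{"path": "app/main.py"}'}}
print(dispatch(call))  # <contents of app/main.py>
```

"Doesn't fall apart on structured output" means the `arguments` string above is valid JSON every time, across dozens of consecutive calls — that's the property that separates a usable agent model from a frustrating one.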
Gemma 4 is Google DeepMind's open model family — I typically run the 26B MoE variant (3.8B active, 256K context) for lighter tasks and the 31B dense model when I need more reasoning depth. Gemma 4 scores 85.2% on MMLU Pro at the 31B tier, 89.2% on AIME 2026, and has native function-calling support baked in — crucial for agentic use. The MoE architecture is particularly efficient: the 26B model only activates 3.8B parameters per token, meaning it's fast and cheap to run while still punching well above its weight class. For research and exploration tasks through my agent, it's my starting point.
The broader trend here is open models closing the gap. Six months ago, the difference between the best open models and the best closed models was meaningful. Today, for the coding and reasoning tasks that define my daily workflow, it's marginal. GLM 5 and Gemma 4 aren't "good for open source" — they're good, full stop. And they're available through a transparent subscription with usage metrics I can actually see, unlike Claude's hand-wavy "session-based" allocation.
The good: Transparent subscription usage. Far greater flexibility for routing which model does what task. The ability to use models through any tool — my pi agent harness, curl, custom scripts — not just Anthropic's walled garden. Cloud models served at native weights on NVIDIA hardware. A growing library that's genuinely competitive with closed-source offerings. And perhaps most importantly: prompt and response data is never logged or trained on.
The bad: I'd love to run more local models, but my 64GB Mac Studio has its limits. The 31B dense Gemma 4 model fits — barely — but anything larger and I'm firmly in cloud territory. The local model experience on a consumer Mac is still constrained by unified memory bandwidth. If you want to run frontier-class models locally, you need serious GPU hardware or accept the cloud dependency.
// ollama max — the verdict
+ transparent quotas (i know exactly what i'm spending)
+ model flexibility (glm-5, gemma 4, many more)
+ works with any tool, not just ollama's ui
+ open models rival closed-source quality
+ no training on prompt/response data
- local model limits on consumer hardware
- cloud-only for the really big models
€90/month | role: heavy lifting & overnight runs
productivity: ~2x vs claude subscription
// openrouter — the laboratory
OpenRouter is a unified API gateway that aggregates 300+ models from 60+ providers through a single endpoint. It's OpenAI-compatible, so you point your existing SDKs at it, swap the base URL, and you're routing requests to any model on the platform.
I keep credits on OpenRouter and have OpenCode set up to switch between models. But I don't use it as a daily driver — I use it as a laboratory. When a new model drops — a new DeepSeek variant, a Kimi release, a Llama iteration — I spin it up through OpenRouter first, test it against real tasks in my codebase, and decide whether it earns a spot in my Ollama workflow.
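The evaluation itself doesn't need ceremony. Stripped down, it's a pass-rate comparison over the same fixed tasks. A sketch with hypothetical model slugs and made-up outcomes:

```python
def pick_winner(results: dict[str, list[bool]]) -> str:
    """Return the model with the highest pass rate across eval tasks."""
    return max(results, key=lambda m: sum(results[m]) / len(results[m]))

# hypothetical pass/fail outcomes from running identical tasks per model
results = {
    "deepseek/deepseek-v4": [True, True, False],
    "moonshotai/kimi-k3":   [True, True, True],
}
print(pick_winner(results))  # moonshotai/kimi-k3
```

The tasks matter more than the scorer: I test against real work in my own codebase, because a model that aces public benchmarks can still fumble my project's conventions.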
The pay-as-you-go model makes this sustainable. You buy credits, use them per-token at provider rates, and OpenRouter takes a 5.5% platform fee on credit purchases (5% for crypto). There's no subscription commitment, no minimum spend, and they pass through provider pricing without markup on inference for most models. Credits don't expire for a year. The economics are straightforward: you're paying a small premium for the convenience of unified billing and instant access to every model under the sun.
One notable wrinkle: Anthropic models carry a significant markup on OpenRouter — some historical benchmarks put Claude 3.5 Sonnet at 100% over direct API pricing. For newer models like Opus 4.6 ($5/$25 per million tokens), pricing appears to pass through at or near cost. But if Claude models are a significant part of your workload, this is worth verifying against direct Anthropic API pricing. For my use case — experimentation, not production — the markup doesn't matter much. I'm testing, not running 50K requests.
I use OpenCode as my client for OpenRouter. OpenCode is an open-source terminal coding agent (MIT license, 140K+ GitHub stars) that supports 75+ LLM providers. It has a TUI with Plan mode and Build mode, LSP integration for real code intelligence, and a client/server architecture that lets you run sessions in Docker containers. The key advantage: I can swap between OpenRouter models without reconfiguring my environment, and the open-source stack means I'm not locked into any vendor's workflow.
// openrouter — the verdict
+ instant access to 300+ models
+ unified billing across 60+ providers
+ excellent for model evaluation
+ openai-compatible api
~ 5.5% platform fee on credit purchases
- some models carry markup vs direct api
pay-as-you-go | role: experimentation & evaluation
// perplexity — replaced, mostly
I also keep credits with Perplexity — the AI-powered research engine that delivers citation-backed responses by searching the web in real-time. The Pro plan sits at $20/month (or $200/year), giving you access to models like GPT-5.2, Claude Sonnet 4.5, and Gemini 3 Pro alongside Perplexity's own Sonar models. The value proposition is clear: instead of searching and synthesizing yourself, Perplexity does the web research legwork and gives you sourced, verifiable answers.
But here's the thing: for my workflow, Perplexity has been largely replaced by the Ollama web search pi agent extension.
I built a web search extension for my pi agent harness that routes search queries through Ollama's cloud models and returns structured, citation-backed results — directly in my coding environment, without context-switching to a separate app. The research agent in my multi-agent pipeline (which I wrote about in my previous post) uses it to gather context before the plan and implement agents do their work. The result: I rarely need to leave my terminal to do web research anymore.
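I won't reproduce the extension here, but its core is unglamorous: take structured hits from the search backend and render them as a citation block the agent can drop straight into context. A hypothetical sketch of that rendering step (field names are assumptions, not the extension's actual schema):

```python
def format_results(query: str, hits: list[dict]) -> str:
    """Render search hits as a numbered, citation-backed context block."""
    lines = [f"## web search: {query}"]
    for i, hit in enumerate(hits, 1):
        lines.append(f"{i}. {hit['title']} - {hit['snippet']} [{hit['url']}]")
    return "\n".join(lines)

block = format_results("swift 6 concurrency", [
    {"title": "Migration guide", "snippet": "step-by-step guide",
     "url": "https://example.com/guide"},
])
print(block)
```

Keeping the URLs inline is the point: the downstream plan and implement agents can cite their sources the same way Perplexity does, without leaving the terminal.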
Perplexity still wins on one thing: its Sonar Deep Research capability for comprehensive, multi-source research reports. If I need a deep dive on a new framework, a competitive analysis, or a literature review — the kind of thing where you want 15+ sources synthesized into a structured document — Perplexity's Deep Research produces better output than what I can get from a quick agent query. The new Search API ($5 per 1,000 requests) and Sonar API (per-token pricing plus request fees) also make it viable for programmatic integrations where you want Perplexity's citation infrastructure baked into your own tools.
But for day-to-day "what's the API signature for this?" or "what's the current recommended way to handle X?" — my pi agent web search handles it. Perplexity's credits aren't wasted; they're just reserved for the deep research cases where a purpose-built search engine still outperforms a general AI with a search tool.
// perplexity — the verdict
+ best-in-class citation-backed web research
+ deep research for comprehensive reports
+ api available for programmatic use
- mostly replaced by pi agent web search
- deep research overkill for quick lookups
pay-as-you-go | role: deep research only
// the economics
Let's talk numbers. My total monthly AI tooling spend is roughly €200, split between Claude Max (~€110) and Ollama Max (~€90), with occasional bursts on OpenRouter credits and Perplexity for deep research tasks.
// monthly spend breakdown
Claude Max (5x)      $100  ≈ €110
Ollama Max           $100  ≈ €90
OpenRouter credits         ≈ €10–30 (variable)
Perplexity credits         ≈ €5–15 (variable)
─────────────────────────────────────────
Total                      ≈ €215–245/month
As a freelance iOS developer in Amsterdam billing at €800/day, I need this stack to save me only a couple of hours per month (a few minutes per working day) to break even. In practice, it saves me far more than that. The Ollama-driven agent harness alone — running overnight, implementing features autonomously, ready for review in the morning — effectively turns one developer into a small team. The Claude subscription covers the rapid-iteration work during the day. Together, they're a 2–3x productivity multiplier, and I say that as someone who's been building software for 25 years and has seen plenty of "productivity tools" that never delivered.
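For reference, the raw break-even arithmetic, using the midpoint of the monthly spend range and 21 working days:

```python
DAY_RATE_EUR = 800        # 8 billable hours -> €100/hour
MONTHLY_SPEND_EUR = 230   # midpoint of the €215–245 range
WORKING_DAYS = 21

hourly = DAY_RATE_EUR / 8
break_even_hours_month = MONTHLY_SPEND_EUR / hourly
break_even_min_day = break_even_hours_month * 60 / WORKING_DAYS
print(f"{break_even_hours_month:.1f} h/month ≈ {break_even_min_day:.0f} min/day")
# 2.3 h/month ≈ 7 min/day
```

Seven minutes a day is a bar the overnight runs alone clear before I've had coffee.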
The interesting asymmetry: Ollama generates the majority of the productivity at less than half the total spend. Claude Max is the more expensive subscription, but Ollama is where the heavy lifting happens. If I had to cut one, I'd cut Claude and route everything through Ollama. The friction would increase — I'd lose the zero-config experience and need to think about model selection more often — but the core productivity would survive. If I cut Ollama and kept only Claude, I'd lose the ability to run extended autonomous sessions, the flexibility to choose models per-task, and the transparent usage metrics that let me actually understand what I'm paying for.
// what i've learned
Subscription lock-in is the real cost. Claude's subscription doesn't include API access. Ollama's does. That distinction shapes your entire workflow. When your subscription only works inside a vendor's UI, you're building on someone else's platform. When it works through an open API, you're building on yours.
Open source models are ready. If you're still assuming that closed-source frontier models are the only viable option for serious agentic coding, you're working with outdated assumptions. GLM 5 and Gemma 4 — available through Ollama's transparent subscription — perform at a level that, in my daily workflow, matches Claude's Opus tier. The gap isn't just closing; for certain tasks (especially multi-step tool-calling workflows), open models with transparent pricing are the better choice.
Transparency in quotas matters more than you think. When you can see your usage — what you've spent, what you have left, how much each session consumed — you make better decisions. You can route expensive tasks to the right model. You can decide whether to keep a long session running or start fresh. Claude's opaque session-based limits force you to fly blind by comparison.
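Visible quotas turn that routing into a policy you can actually write down. A toy sketch; the buffer factor and model names are arbitrary choices, not anyone's recommendation:

```python
def route(task_tokens: int, remaining_quota: int) -> str:
    """Pick a model based on how much quota is left for the task."""
    # arbitrary policy: only reach for the big model with a 10x buffer
    if remaining_quota > task_tokens * 10:
        return "glm-5"    # heavy reasoning
    return "gemma-4"      # cheaper fallback

print(route(task_tokens=8_000, remaining_quota=500_000))  # glm-5
print(route(task_tokens=8_000, remaining_quota=50_000))   # gemma-4
```

You can't write even this four-line policy against a quota you can't read — which is exactly the position Claude's session-based limits put you in.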
Build your own tools where the gap exists. My Perplexity replacement — a web search pi agent extension — took an afternoon to build and eliminated a $20/month subscription from my daily workflow. When your coding agent is extensible (as pi agent is), the calculus changes. Instead of paying for a separate research tool, you extend the one you already have.
The stack compounds. None of these services exist in isolation. Claude handles the quick fixes. Ollama drives the agent harness. OpenRouter lets me evaluate new models. The pi agent framework ties them all together. The ROI isn't from any single tool — it's from the way they compose into a workflow that's greater than the sum of its parts.
— AM, Amsterdam, April 2026