


Over the last six months, Claude has contributed more code to our iOS codebase than any individual engineer. That didn't happen overnight. At Tolan, AI has always been part of our workflow—last summer one of our engineers was in the top 0.01% of Cursor users in San Francisco—but the real shift came when we moved from autocomplete-driven development to agentic development. Over the second half of 2025, subagents, Skills, and the leap in intelligence from Opus 4.5 each compounded on the last.
Our Swift developers initially got less out of these tools than our TypeScript backend engineers. TypeScript is the #1 language on GitHub; Swift isn't even in the top ten. Less training data plus Swift's rapid evolution meant we'd need to do real work to close the gap. So I set to work.
Every new Claude instance is effectively a new hire—one that has no idea which patterns in the codebase to follow. My first instinct was to spell everything out in our CLAUDE.md, but that's the wrong lever: an agent's context comes primarily from the codebase itself, not its instruction file. So rather than documenting the right patterns, I decided to make the codebase itself the documentation by using agents to standardize it.
We already had strong patterns—every new feature starts from a template—but those templates had evolved over time, leaving older code in outdated forms. I found the commits where we'd updated the templates and pointed Claude at the drift. Over a single day and dozens of PRs, Opus 4.1 standardized 100k lines of feature code. Now every file in the codebase demonstrated the patterns we wanted our agentic new hire to replicate.
In August, Claude subagents gave us the next big lever. For those unfamiliar: subagents are executors with custom instructions that accomplish a task without polluting the main agent's context window. We used them to automate context-heavy grunt work—diagnosing build failures, running snapshot tests, pulling new comments from PRs, downloading logs from CI.
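In Claude Code, a subagent is defined as a Markdown file with YAML frontmatter placed in `.claude/agents/`. A hypothetical sketch of a build-diagnosis subagent (the name, tools, and prompt here are illustrative, not our exact files):

```markdown
---
name: build-doctor
description: Diagnoses iOS build failures. Use proactively whenever a build fails.
tools: Bash, Read, Grep
---

You diagnose Xcode build failures. Re-run the failing build, read the log,
and report back only the root cause and the files involved, never the raw
log. Keeping the log out of the main agent's context is the whole point.
```

Because the subagent gets its own context window, the main agent receives a short diagnosis instead of thousands of lines of build output.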
Better yet, subagents let us separate solving the problem from writing code in our style. An implementation agent would focus purely on getting working code, then custom review agents would give the main agent specific instructions for transforming that code to match our standards. This build-then-review approach costs more tokens, but it dramatically reduced hand-holding—the implementation agent could think freely, and the review agents could be opinionated without fighting for context.
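A review agent in this setup can be an ordinary subagent whose only job is critique. A simplified, hypothetical sketch (ours carry far more house-specific detail):

```markdown
---
name: swift-style-reviewer
description: Reviews newly written Swift against our house patterns. Run after any implementation pass.
tools: Read, Grep
---

Compare the new code against the surrounding codebase. Flag anything that
deviates from the established feature-module patterns, and emit concrete
edit instructions for the main agent; do not rewrite the code yourself.
```

Separating the roles this way lets the reviewer's instructions be long and opinionated without crowding the implementation agent's context.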
So we kept pushing. Claude reviewing Claude was great, but we wanted Claude to incorporate reviews from all our agents—iterating with Codex, Cursor’s Bugbot, and its own review agents before ever asking a human for help. I built a “PR Shepherd” agent whose job is to watch an open PR, react to CI failures and code review comments, and keep iterating until the PR is clean. You can see it in action in one of our open source forks, where Claude went back and forth with Bugbot for hours before flagging me to review the final output.
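The shepherd itself is just another agent definition. A simplified, hypothetical sketch that drives the loop through the GitHub CLI (the real one handles more tools and more failure modes):

```markdown
---
name: pr-shepherd
description: Watches an open PR and iterates until CI and reviewers are satisfied.
tools: Bash, Read, Edit
---

Given a PR number, loop:
1. Run `gh pr checks <number>`. If CI failed, fetch the failing log,
   fix the cause, and push.
2. Run `gh pr view <number> --comments`. Address any new review comments
   (Codex, Bugbot, our review agents) and push the fixes.
3. Stop and ping a human only when checks are green and no unresolved
   comments remain.
```
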
In October, Claude Skills let us extract the context we’d baked into review agents into standalone documentation that any agent—exploration, implementation, or review—could load just-in-time. The key to making Skills work for us was Plan Mode: we ask Claude to add explicit skill-loading instructions to each step of a plan before executing it. That one habit dramatically improved first-pass output quality.
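A Skill is a directory containing a `SKILL.md` whose frontmatter tells any agent when to load it. A hypothetical sketch of what one of ours looks like (the conventions listed are illustrative):

```markdown
---
name: feature-module-conventions
description: House conventions for building a new feature module. Load before writing or reviewing feature code.
---

# Feature module conventions

- Start every feature from the current template, never by copying an
  older feature.
- ViewModels own all state; Views stay declarative.
- Snapshot-test every new screen before opening the PR.
```

The Plan Mode habit then amounts to writing plan steps like “Step 2 (load feature-module-conventions): scaffold the module,” so the relevant Skill is in context before any code is written.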
In late November, Opus 4.5 landed with a step-function increase in intelligence—and all the infrastructure we’d built suddenly compounded. A standardized codebase, a comprehensive skill suite, and agents that could operate independently meant we were ready to let Claude do real work. In December, 30% of our iOS commits had Claude as a co-author. January: 45%. February: 55%.
And the quality hasn’t suffered—it’s improved. Over this period, our crash-free user rate went from 99.6% to 99.9%, runtime errors dropped by 54%, and the share of highly-engaged users doubled. Claude is writing more and more of our code, and we’re shipping a better product because of it.
So where are we going next? We just set up a Claude Code GitHub Action that creates a triage subagent for every inbound ticket. With MCP access to Linear, Sentry, and Datadog, Claude reconstructs what the user did before a crash, identifies the likely cause, and routes it to the right team. But we’re not done yet: the goal is an always-on teammate that doesn't wait to be asked. One that spots a pattern, identifies the problem, and opens the PR.
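A minimal sketch of such a workflow, assuming the `anthropics/claude-code-action` GitHub Action; the trigger, secrets, and MCP server wiring are illustrative and will differ depending on your tracker:

```yaml
name: triage-inbound-ticket

on:
  issues:
    types: [opened]

jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Triage issue #${{ github.event.issue.number }}.
            Using the Linear, Sentry, and Datadog MCP servers, reconstruct
            what the user did before the crash, identify the likely cause,
            and route the ticket to the right team.
```

Each run spins up a fresh triage agent per ticket, so no single context window has to carry the whole backlog.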
If this sounds like a future you want to help create, we’re hiring across the stack. Get in touch :)


