July 7, 2026

My New Best Friend Is ChatGPT

I’ve switched from Claude to ChatGPT as my main AI assistant. For work I’ve been toggling between the two but I only need to pay for one subscription personally. The current Claude subscription just ran out so now my new best friend is ChatGPT…

Meme graphic with the text My New Best Friend Is ChatGPT

The biggest issue with Claude is that they just don’t seem to have enough capacity for what they’re selling. Response times have become quite long and are dominated by the time before any tokens actually get spent, clearly just waiting to be allocated some GPU time. Plus I’ve regularly had problems where they rate limit all responses even though I’m well within my paid-for quota. It’s annoying enough that AI plans have completely opaque and unquantified token allowances, but to not actually be able to use what you pay for is ridiculous. I don’t expect ChatGPT will be perfect, but it’s certainly a lot faster from my early experience.

The Claude Mac app is also terrible, but I’ve heard very good things about the ChatGPT app, and again early indications are promising. I may still use the Codex CLI tool, but the Codex app feels like it’s at least worth a try. The ChatGPT app already feels far richer and more responsive.

I’ve also started experimenting with omp as a harness (thanks for the tip Rodrigo!). I’m not sure if it’s particularly better or worse at the moment but I like the idea of using a third party harness so I can more easily switch models without having to switch UIs and convert from CLAUDE.md to AGENTS.md etc. It’s slightly surprising that I haven’t found working in traditional IDEs with their AI integrations to work particularly well. That may have improved since I last tried it, and I certainly miss having a really great IDE, but somehow running many parallel AI agents doesn’t fit well with a traditional IDE. There’s very likely a real opportunity for whoever can really crack the right way to build a great AI-first IDE. Plenty of people are working on it and some look like they have potential but I haven’t seen anything that really feels right to me yet.

So many levels of experimentation in this new world… Still fun!

June 18, 2026

Ads Evading Pi-hole with iOS 27

I YOLO installed the iOS 27 beta and noticed ads creeping back into apps that have been clean for years. The Pi-hole was still running and still showing blocked queries — the ads were just loading anyway.

The culprit was “Connectivity Assist” in the Wi-Fi settings, aka Cellular Fallback. When the Pi-hole blackholes an ad domain the request fails, iOS decides the Wi-Fi must be flaky, and helpfully retries the same request over mobile data — where there’s no Pi-hole in the path. So the ad not only loads, it burns through my data allowance to do it.

Flipped the switch off and ads are properly being blocked again.

June 1, 2026

Moolah Diaries - A Month Later

It’s been a month since I went all in on using Moolah so time to check in on how it’s going.

Firstly the bugs - there are still some but generally in the functions I’m actually using there aren’t many and things are generally pretty polished. Most importantly regressions are rare so as I work through fixing things it keeps getting consistently better. Most of the issues now are more of the UI polish kind of problem. I’m not a UI designer and the AI is definitely not a UI designer, so getting any level of polish is pretty challenging. Again, the flows I actually use are quite smooth and work well for me. That’s pretty much in line with how the web based version of Moolah worked too - quite nice for me, but very much tailored to me and a bit obscure to everyone else.

What is interesting is the amount of functionality that just gets added but not really used. It’s so easy to come up with an idea and let AI loose implementing it that I have tended to do that. Actually getting it to the point where it really works and I’d actually trust it requires more attention though. Net result is there are a bunch of features that are promising and would make my life easier, but haven’t actually been finished off enough to be used. In fact every so often I throw some idea at the AI and it just points out it is already implemented but not actually hooked up.

There are some definite wins though - Moolah native has multi-currency support that Moolah web never did and it extends to stocks and crypto. So far I’m only using the currencies and stocks support but since I’m paid in USD it’s nice to be able to track the actual USD amount that is received and then the trade to convert it to AUD accurately. Similarly, stocks are tracked and automatically pull daily prices rather than needing me to grab a snapshot of the total value each day so I have a historical record. Crypto support is there and I think good enough to actually use but having multiple wallets across different chains made it a bit complex to track prior to the account groups functionality that landed recently. The biggest barrier however is how to manage the switch over from one tracking mechanism to the new one. With 10+ years of data it can be a challenge…

Overall it’s been a pretty huge success and I still haven’t read any of the code. For anything that is intended for an audience of one, vibe coding is a perfectly good way to approach it. You need to be prepared to iterate on it with the AI a bit - it’s definitely not just one shot - but you really can build something that can really solve your needs and be nice for you to use. Even without programming experience an advanced user could really use AI to build applications - and AI is way better at building web apps than native so there’s even more potential there. That’s exciting - it’s a return to the more manipulatable, content-creation focussed computing of my youth. In more modern terms content creation has been very possible and a big focus, but it’s been more on creative output (writing, film, music etc) than on creating computer programs themselves.

Does this replace “real” software development? Could you create software that is good for other people to use? Not with AI by itself but if you know the domain and have at least a decent sense of UI design, then probably. There are certainly apps being created by non-programmers that are seeing good adoption and that doesn’t surprise me. I’m certainly not fearing for my job, and I don’t think most programmers should, but if you’re an average to below average developer just churning out relatively simple stuff you’re in trouble. It’s possible that new developers will find it very difficult to build up the experience they need to get into the industry, but it’s also possible that AI will provide a new pathway to convert power users into professional programmers. A large number of our existing programmers actually got started that way - developing stuff for fun and using that experience to bootstrap careers.

May 30, 2026

A REST API for Ethereum Execution Clients

The JSON-RPC API that Ethereum execution clients expose is genuinely awful to work with. Every call is a POST with a jsonrpc envelope, a method string and a positional params array. Numbers come back as hex strings — block numbers, balances, gas, timestamps, all of it — so even after you’ve wrestled the request together you’re piping the response through something to turn 0x1bc16d674ec80000 back into a number a human can read. It does just about everything it can to be hostile to a quick curl.
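To make the ceremony concrete, here is a minimal Python sketch of the two chores described above: wrapping a call in the JSON-RPC envelope and turning a hex quantity back into an integer. The helper names `rpc_envelope` and `decode_quantity` are made up for illustration and are not part of any client or of exec-rest-api.

```python
import json


def rpc_envelope(method, params=None, request_id=1):
    """Build the JSON-RPC 2.0 request body an execution client expects."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params or [], "id": request_id}
    )


def decode_quantity(hex_quantity):
    """Convert a hex quantity such as "0x1" into a plain integer."""
    return int(hex_quantity, 16)


body = rpc_envelope("eth_chainId")
balance_wei = decode_quantity("0x1bc16d674ec80000")
print(body)
print(balance_wei)
```

The example value decodes to 2000000000000000000, which is 2 ETH expressed in wei. That is not obvious from the hex form, which is rather the point.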

Tools like Foundry’s cast make it usable, and I reach for them constantly, but they’re still fiddly for ad-hoc poking and they don’t help at all when something other than your terminal wants to talk to the client. What really bugs me is that we designed such a bad API in the first place and then built a layer of tooling to paper over it.

So, naturally, I built another layer to paper over it. exec-rest-api is a proxy that sits in front of any execution client and exposes a sensible REST API, translating your requests into the JSON-RPC calls the client actually wants.

The Awful API

Here’s what asking for the current chain ID looks like today:

curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
  http://localhost:8545
# {"jsonrpc":"2.0","id":1,"result":"0x1"}

A POST with a hand-written envelope, to ask a question that has no parameters, and the answer is 0x1 rather than 1. Want the block number too? That’s a second round of envelope-writing and another hex value to decode. None of this is hard, exactly, it’s just tedious in a way that adds up every single time.

The same thing through the proxy:

curl http://127.0.0.1:8080/chain
# {"chainId": 1, "networkId": "1", "client": "Geth/v1.13.5...", ...}

A plain GET, no envelope, decimal numbers, and a bit of extra context thrown in for free. That’s the whole idea.

What It Does

The API is organised the way you’d actually think about the chain rather than the way the RPC methods are named. There are resources for /chain, /blocks, /accounts, /transactions, /logs, /traces and /gas, plus a /utils/keccak256 helper for the inevitable moment you need to hash something. Behind each of those it’s still making the eth_getBlockByNumber and eth_getBalance and friends that the client understands — you just don’t have to think about it.
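As a rough illustration of that translation step, here is a sketch of a route table in Python. It is not the proxy's actual code: the sub-paths under /blocks and /accounts, the `ROUTES` table and the `translate` helper are all assumptions made for the example. Only the JSON-RPC method names and their parameter order are real.

```python
import re

# Hypothetical route table: REST path pattern -> (JSON-RPC method, params builder).
ROUTES = [
    (re.compile(r"^/chain$"), "eth_chainId", lambda m: []),
    # Block numbers arrive as decimal in the URL and leave as hex quantities.
    (
        re.compile(r"^/blocks/(\d+)$"),
        "eth_getBlockByNumber",
        lambda m: [hex(int(m.group(1))), False],
    ),
    (
        re.compile(r"^/accounts/(0x[0-9a-fA-F]{40})/balance$"),
        "eth_getBalance",
        lambda m: [m.group(1), "latest"],
    ),
]


def translate(path, request_id=1):
    """Turn a REST path into the JSON-RPC request the upstream client understands."""
    for pattern, method, build_params in ROUTES:
        match = pattern.match(path)
        if match:
            return {
                "jsonrpc": "2.0",
                "method": method,
                "params": build_params(match),
                "id": request_id,
            }
    raise LookupError(f"no route for {path}")


print(translate("/blocks/19000000"))
```

Keeping the mapping in a table means the hex conversion lives in exactly one place per route, so callers never see it.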

Hex quantity notation is stripped out everywhere, so numbers are numbers. Errors come back as RFC 9457 problem details instead of the JSON-RPC error shape, and anything that returns a list uses RFC 8288 cursor pagination rather than making you guess at ranges. If you do need the raw bytes — a block or transaction as RLP — content negotiation will give you that instead.
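For a feel of what the error mapping might look like, here is a small Python sketch that turns a JSON-RPC error object into an RFC 9457 problem details document. The choice of 502 as the status and the `rpcCode` extension member are my assumptions for the example, not necessarily what exec-rest-api emits. The `type`, `title`, `status` and `detail` members are the standard ones from the RFC.

```python
from http import HTTPStatus


def to_problem_details(rpc_error, status=502):
    """Map a JSON-RPC error object onto an RFC 9457 problem details document."""
    return {
        "type": "about:blank",  # RFC 9457 default when no specific problem type is defined
        "title": HTTPStatus(status).phrase,  # with about:blank the title mirrors the status phrase
        "status": status,
        "detail": rpc_error.get("message", ""),
        "rpcCode": rpc_error.get("code"),  # extension member (assumption) keeping the original code
    }


rpc_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "error": {"code": -32601, "message": "Method not found"},
}
problem = to_problem_details(rpc_response["error"])
print(problem)
```

A document like this would be served with the `application/problem+json` media type, so generic HTTP tooling can recognise it as an error without knowing anything about JSON-RPC.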

There’s a /health resource that’s genuinely useful for orchestration:

curl http://127.0.0.1:8080/health/ready
# {"ready": true, "upstreamReachable": true, "syncing": false, ...}

That single call rolls up “is the upstream reachable” and “is it still syncing” into one readiness check, which is exactly the thing you otherwise end up scripting by hand around eth_syncing.
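The roll-up itself is simple enough to sketch in Python. The field names come from the sample response above. How the proxy actually computes them is not shown in this post, so the `readiness` function below is an assumption. It relies on the fact that `eth_syncing` returns `false` when the node is in sync and a progress object while it is still catching up.

```python
def readiness(upstream_reachable, eth_syncing_result):
    """Roll reachability and sync state into one readiness answer.

    eth_syncing_result is False when the node is in sync, a progress
    object (dict) while syncing, or None if the upstream did not answer.
    """
    syncing = isinstance(eth_syncing_result, dict)
    return {
        "ready": upstream_reachable and not syncing,
        "upstreamReachable": upstream_reachable,
        "syncing": syncing,
    }


print(readiness(True, False))
print(readiness(True, {"currentBlock": "0x10", "highestBlock": "0x20"}))
print(readiness(False, None))
```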

For the cases where polling is the wrong model, there are Server-Sent Events streams at /streams/blocks, /streams/logs, /streams/pending-transactions and /streams/sync-status. SSE is a much friendlier thing to consume from a script or a small service than the WebSocket subscription dance, and it works with curl directly. There’s also a Prometheus /metrics endpoint and request-tracing headers — X-Request-ID, X-Upstream-Method, X-Block-Height — so when something looks wrong you can see exactly which JSON-RPC call your request turned into.
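Part of why SSE is friendlier is that the wire format is just lines of text. Here is a minimal Python parser sketch that handles the `event` and `data` fields, comment lines and blank-line dispatch. The sample event name and payload are invented for the example, since the actual shape of the /streams/blocks events isn't shown here.

```python
import json


def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches whatever has been collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Invented sample stream; a real one would come from the HTTP response body.
sample = [
    "event: block\n",
    'data: {"number": 19000000}\n',
    "\n",
    ": keep-alive\n",
    "\n",
]
for event, data in parse_sse(sample):
    print(event, json.loads(data)["number"])
```

In practice the lines would come from iterating over a streaming HTTP response rather than a list, but the parsing stays the same.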

Running It

The thing I cared most about was that it be trivial to start, because the whole point is ad-hoc use. If you’ve got pipx:

pipx install exec-rest-api
exec-rest-api --upstream-http http://localhost:8545

There’s a plain pip install too, or a single-file .pyz you can download from the releases and run directly:

./exec-rest-api.pyz --upstream-http http://localhost:8545

Or Docker, if you’d rather not install anything:

docker run --rm -p 8080:8080 \
  ghcr.io/ajsutton/exec-rest-api:latest \
  --upstream-http http://host.docker.internal:8545

It works with any execution client — it’s only ever speaking standard JSON-RPC upstream — so point it at Geth, Nethermind, Besu, Reth, whatever you’re running. The same low ceremony means it’s just as happy left running as a long-lived proxy in front of a node as it is spun up for thirty seconds to answer one question.

Is this likely to get any real use? Probably not. The current JSON-RPC is really very entrenched at this point but I do like how easy it is to solve and at least my home node can now expose a nice API. You’ll also note that it differs from the Ethereum consensus client REST API in some conventions because if I’m being overly opinionated about APIs I may as well run with it and do it the way I think the Consensus API should have been done. It didn’t do something as awful as use hex for numbers, but putting version numbers in URLs is almost as bad…

April 28, 2026

Moolah Diaries - Going All In

Three weeks of vibe coding the new native Moolah and I’ve gone all-in. All my data has been converted over to its iCloud backend. moolah-server is still running but I haven’t been updating the data there so it’s getting more and more out of date. At this point, I’m all in.

As I write this Claude is happily deleting all the moolah-server backend support. Apart from being a big milestone in terms of being committed to the new native implementation, it also marks the first time that I can honestly say that you can use Moolah and I won’t have any access to your data. Previously, data was stored on the server. While I only ever stored a user ID from login, never any name or other personal information, theoretically I could access the data in the database. So while I expect I will remain the only actual user of Moolah, it’s nice to know people could use it without putting their privacy at risk.

The bugs that were once so prevalent are now much more under control. In fact, generally I’d say there aren’t many bugs, but there is a lot of UI that lacks polish or is just downright weird. UI tests are helping to get some of the UI quirks under control, but UI design is definitely not AI’s strength.

The more functional bugs on the other hand have been brought under control through writing a series of guides and review agents - each focussed on a specific area. There’s an iCloud sync reviewer, concurrency reviewer, UI reviewer (good at ensuring everything has accessibility labels, not solving the design problem) and a general code reviewer. Along with an insistence that it fixes every issue raised by those reviewers, the codebase seems to be improving.

Of course, I don’t actually write those guides - I just tell Claude to do deep research on best practices and have it write the guide. Seems to work surprisingly well…

The high level of test coverage also means that once I get it to fix a bug, it generally stays fixed so quality is ratcheting up over time.

The other technique that has been helpful for improving quality is to tell the agent to step back and review the design to come up with a way to really simplify it rather than just keep patching problems. The components that I’ve gone through that process for have tended to stabilise pretty quickly.

That said, progress has slowed down a lot. From the initial report on day 8 it looked like multicurrency support was done. In practice, it’s only become actually usable fairly recently with lots of details missing, particularly with things just not being exposed through the UI. There’s a lot of rework happening too - the data storage has moved from the most automatically managed storage and sync system Apple provides, to doing sync manually and using SwiftData for storage, to now working through using SQLite manually (through the first external dependency - GRDB). Nice improvements and nice that such large pieces can be rewritten pretty quickly but it is showing the cost of not having understood everything properly the first time around. The flip side is that we know a lot more about requirements and performance trade offs now than I did at the start, which affects many of the choices anyway. It’s a pretty good sign that I feel comfortable doing big migrations and architectural changes though. That’s partly the relative ease with which it can be done - just throw the AI at it rather than spending time learning the new framework. Just as much though it’s a legacy of having multiple backends and spending a lot of time ironing out bugs in the initial iCloud backend - now there is a really great suite of tests that will ensure this rewrite actually works.

All an adventure, all a good learning experience and still a lot of fun.

April 26, 2026

AI Agents Need Sandboxes, Not Permissions

AI agents like Claude Code and Codex try to be “safe” by limiting what the agent can do to a small set of operations and asking the user to approve everything else. In practice, the approval flow is so noisy and the requests so complex that it provides almost no real protection. We’ve reinvented Windows Vista’s never-ending UAC dialogs, and just like Vista, the only thing it actually trains users to do is click through.

The permission model is fundamentally broken. The fix isn’t a better dialog — it’s a sandbox.

The Approval Treadmill

When the agent edits a file, the permission UI can show you the exact diff. Approving that is fine. The problem is that most of what an agent actually wants to do isn’t a clean file edit. It’s a multi-step bash invocation chaining find into xargs into sed, or an inline Python script, or a jq pipeline that the UI truncates after the first couple of hundred characters.

Reviewing those properly takes real time. Reading a 30-line shell pipeline, working out what each step does, checking that nothing is being deleted that shouldn’t be, tracing where stdout is going to end up — that’s not a click, that’s a code review. And the agent fires off another one twenty seconds later.

There are two outcomes. Either you do the careful review every time and spend more time vetting the agent than the agent saves you, or you stop reading and click yes. In practice everyone ends up at “click yes” within a couple of hours of real use (if that). That’s exactly where Vista left users — except now the thing on the other side of the dialog is actively generating novel commands you’ve never seen before.

Sandboxes Over Approvals

The alternative is to give the agent access to a safe but sufficient set of resources up front, and then let it work freely inside that boundary. No prompts, no review of every command — this is the box, do whatever you need inside it.

claude-docker is one way to do this — run the agent in a container so the side effects are contained on a layer you can throw away. There are plenty of other options, many/most of them more polished and with much finer-grained controls over what the agent can reach. The built-in sandbox features in agent executors could work just as well too, but they all seem to default to “ask permission” rather than “deny”, and the sandbox they create is usually too small to actually get work done. So you end up turning the protections off or whitelisting things case by case, and you’re back where you started.

The bar to set is: everything the agent has access to should be safe enough that you’d happily run it under --dangerously-skip-permissions without a second thought. And the sandbox needs to be big enough that the agent rarely runs out of room to operate. When it does hit the wall, doing the missing step yourself outside the sandbox should be fine — because the alternative, loosening the sandbox so it doesn’t have to ask, defeats the whole point.

Hooks Aren’t a Security Boundary

A tempting middle ground is to use hooks to inspect the bash commands the agent is about to run and reject the dangerous ones. This seems like a terribly insecure approach because there are too many ways to write the same operation. git push, git -c remote.origin.pushurl=… push, git push origin HEAD:main, a shell variable expansion that hides the word entirely, a Python script that calls the GitHub API directly, a local helper invoked through bash -c. A hook that searches for git push will miss most of these, and a malicious agent — or a benign agent that has been prompt-injected — will find a route through.
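To see how quickly a pattern-based hook falls over, here is a small Python sketch. The regex and the `hook_blocks` helper are my own stand-ins for a typical hook rule, not any agent's real hook API, and the test commands are variations on the ones above, picked so they can be written out in full.

```python
import re

# A naive hook rule: reject anything that literally contains "git push".
PUSH_PATTERN = re.compile(r"\bgit\s+push\b")


def hook_blocks(command):
    """Return True when the naive pattern would reject the command."""
    return bool(PUSH_PATTERN.search(command))


commands = [
    "git push",
    "git push origin HEAD:main",
    "git -c core.pager=cat push",  # an option sits between "git" and "push"
    "g=git; p=push; $g $p",  # the words only meet after shell expansion
    'bash -c "$(echo Z2l0IHB1c2g= | base64 -d)"',  # the command is decoded at run time
]
for command in commands:
    print("blocked" if hook_blocks(command) else "allowed", "|", command)
```

Only the first two are caught. Tightening the pattern closes one gap at a time, while the set of equivalent commands is effectively unbounded.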

Hooks are still useful, just not as security. If you want the agent to be able to push but want a confirmation step on the way out, a hook will catch the common case and that’s a perfectly nice workflow as long as you don’t mind it failing every now and then. The mistake is treating that workflow nicety as a boundary that holds against an adversary. Anything that needs a real boundary needs to live at the sandbox layer, not in a regex over command lines.

Fine-Grained Tokens

For sandboxes to be genuinely useful, the things the agent reaches into them — GitHub, CI, package registries, chat — need to support narrow credentials. “Read” and “write” aren’t enough resolution. What can be read, what can be written?

I want to be able to give an agent a token that can create issues but not push to protected branches. Or one that can read CircleCI build results from a couple of specific projects but not change their settings or look at unrelated repos. Or a Slack token that can post to one channel and nowhere else. The granularity I’d give a junior engineer for a focused task is roughly the granularity I want to give an agent — and that’s much finer than most platforms expose today.
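As a sketch of the shape I mean, here is a deny-by-default credential modelled in Python as an explicit set of (action, resource) grants. Everything in it is hypothetical: the `ScopedToken` class, the action names and the resource names are invented for the example and don't correspond to any platform's real scopes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScopedToken:
    """A credential described as explicit (action, resource) grants; anything else is denied."""

    grants: frozenset = field(default_factory=frozenset)

    def allows(self, action, resource):
        return (action, resource) in self.grants


# Hypothetical grants for one focused task.
agent_token = ScopedToken(
    grants=frozenset(
        {
            ("issues:create", "example-org/example-repo"),
            ("chat:post", "#agent-updates"),
        }
    )
)

print(agent_token.allows("issues:create", "example-org/example-repo"))
print(agent_token.allows("contents:push", "example-org/example-repo"))
print(agent_token.allows("chat:post", "#general"))
```

The important property is that the question asked is always "this action on this resource", never just "read" or "write".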

GitHub’s fine-grained personal access tokens are a step in the right direction, but the granularity is uneven and the UI is painful enough that most people fall back to broad-scope tokens anyway. If we want agents to operate inside a tight sandbox without the human constantly stepping in to do the bits the sandbox can’t reach, the platforms the agent talks to need to make narrow, scoped credentials easy to issue and obvious to use.

The “ask permission for everything” model isn’t going to get fixed with a nicer dialog — it’s the wrong shape. Real protection looks like a sandbox the agent can move freely inside and credentials that can’t do much damage even if the agent goes off the rails. The sandbox tooling is rough today and the credential story is worse, but those are the levers worth pushing on, not another permission prompt.

April 13, 2026

Moolah Diaries: Vibe Coding a New Moolah

I got Claude to crunch the numbers on the moolah-native project — eight days of vibe coding a SwiftUI replacement for my long-running personal finance tracker, with zero Swift experience. The stats are interesting but they don’t really capture what the experience has been like, so here’s the human version.

Skipping the Learning Curve

Using AI to just skip the initial learning curve of a new language and platform has been really freeing. I don’t know Swift. I don’t know SwiftUI. Normally I’d spend a ton of time working out the basic setup and finding what patterns/libraries to use before I could do anything useful and that’s often enough to kill the motivation entirely. Instead I was working on actual domain problems from day one — straight to the interesting part. The learning curve didn’t go away, I just got to avoid it. Will this come back to bite me? Quite possibly.

Brainstorming with a Pretty Smart Duck

The brainstorming has been the standout. Claude Code has a brainstorming skill that walks you through refining designs interactively, and it’s been great. It found data sources for exchange rates, stock prices, and crypto prices that I’d missed even after searching pretty thoroughly myself. That opened up features I’d been uncertain about.

But the bigger win was at the design level. I’d previously looked at adding multi-currency support and even wrote a lot of the code but never really felt the design hit the complexity/benefit trade off. Working through the brainstorming process came up with a multi-leg transaction model that handles multicurrency much more cleanly and simplifies complexity around the existing transfer support. It’s rubber ducking, but with a reasonably smart duck (that has opinions and can pull up documentation you didn’t know existed).
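For anyone wondering what a multi-leg model looks like, here is a rough Python sketch of the idea rather than the actual Moolah design (which is in Swift and which I haven't read). The `Leg` and `Transaction` types, the account names and the exchange rate are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Leg:
    account: str
    instrument: str  # currency code, stock ticker or token symbol
    amount: Decimal  # positive = into the account, negative = out of it


@dataclass(frozen=True)
class Transaction:
    description: str
    legs: tuple


def balances(transactions):
    """Sum every leg into a balance per (account, instrument)."""
    totals = defaultdict(Decimal)
    for transaction in transactions:
        for leg in transaction.legs:
            totals[(leg.account, leg.instrument)] += leg.amount
    return dict(totals)


ledger = [
    Transaction("Salary", (Leg("USD account", "USD", Decimal("1000.00")),)),
    # A currency conversion is a single transaction with one leg per currency.
    Transaction(
        "Convert to AUD",
        (
            Leg("USD account", "USD", Decimal("-1000.00")),
            Leg("AUD account", "AUD", Decimal("1530.00")),
        ),
    ),
]
print(balances(ledger))
```

A plain transfer becomes the same thing with both legs in one currency, which is how this kind of model can absorb special-case transfer handling.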

The Cost Problem

AI usage limits are incredibly annoying. A Claude Pro plan is basically useless for anything beyond toy code — you’ll hit limits within an hour of real work. Gemini’s free tier is surprisingly capable for casual use, but if you want to actually sustain any kind of productivity you wind up needing Claude’s Max 20x plan and that’s a lot of money for a side project which may be dormant for long periods. For a business the cost is a no-brainer - it can make expensive engineers way more productive so pays for itself. For personal use when you have a family and look at costs in AUD, not so much.

The Bugs

Oh my god, the bugs. The analysis post puts the fix rate at 31% compared to 2.6% for the hand-written server. That tracks and maybe understates it. I’m pretty damn impressed with the low bug rate for moolah-server though given it’s a side project and I always felt like I was cutting corners and not writing tests that I should have.

Today was the worst day yet. At some point recently, AI had split iCloud sync profiles into separate CloudKit zones but they were actually still in a single zone, overlapping with each other. We’d talked about that and it was super confident it would work even when pressed on it and told to do deep research. Fixing that required migrating to a lower-level sync API (CKSyncEngine), which then triggered a cascade of stability problems and has taken pretty much the whole day to fix.

AI has the same instinct as a lot of engineers (including plenty of senior ones): when something breaks, keep throwing bandaids at the symptoms. Track more state. Add another deduplication pass. Cache another field. Each fix is locally reasonable but the complexity just ratchets up. I wound up having to step in and push it to throw away all the state tracking and just model the problem better so it automatically did the right thing. Less code, fewer bugs, simpler to reason about. But AI won’t get there on its own — it optimises for making the current test pass, not for finding a design where the test wouldn’t have failed in the first place.

Shipping is Still Hard

I’ve started entering transactions through the SwiftUI app but I’m still running the original moolah-server as the backend because I don’t trust the iCloud sync layer. Today is a pretty good example of why.

There’s also a subtler problem: vibe coding makes it very tempting to just keep churning out features because it’s fun to watch AI produce things. But every new feature is more unvalidated code, and that works against ever actually shipping. At some point you have to stop adding and start using. But I have nearly 15 years of historic data at risk and detecting subtle corruption or a few messed up transactions can be really hard.

It’s Fun Though

Coding was my hobby before it became my job. I still enjoy the job but it’s serious work with serious consequences. Vibe coding has brought back some of the fun of just playing with technology for its own sake — partly the quick results, but also just experimenting with a powerful new tool and learning how to get the most out of it.

The code quality concerns are real, the costs are high, and the bugs are maddening. But I built a functional multi-platform finance app in eight days with no platform experience, and I had a good time doing it.

April 13, 2026

Moolah Diaries: Letting AI Analyse Its Own Work

I’ve used custom software called Moolah to track my personal finances for many years now. Originally it was written in JavaScript with moolah-server providing the backend and moolah the frontend. These holidays I’ve been vibe coding a replacement written entirely in Swift. It seems like a somewhat useful experiment to learn more about the trade offs of extreme AI usage, so I had Claude dig into the available stats from GitHub and its session logs to compare the two projects and see what we can learn. It’s not a controlled experiment so lots of room for interpretation, but an interesting data point nonetheless.

The initial version it came out with was a bit absurdly positive about AI, along the lines of I wrote 10x the lines of code in 0.2% of the time - more code is better! But with a bit of prompting and providing additional background it came out with the report below. I’ll write up some more human thoughts on the experience later, but I think the AI-crunched numbers are worth sharing by themselves to set the scene and because the key learnings are genuinely useful.

The Cast

Project	Tech	Purpose	Active dev days	Commits	LOC (prod)
moolah (web)	Vue.js	Web frontend	149 days over 8.7 yrs	763	~8,400
moolah-server	Node.js/Hapi	REST API backend	87 days over 8.8 yrs	405	~2,800
moolah-server-go	Go	Learning exercise, abandoned	5 days	21	~500
moolah-native	SwiftUI	Native iOS/macOS app	8 days	369	~20,600

Active dev days = days with at least one non-dependency-maintenance commit. The web and server are one project across two repos — 70 days overlap — so combined unique effort is 180 days.

How Each Project Was Built

moolah & moolah-server were started June 24, 2017 by Adrian Sutton, with Brett Henderson contributing the initial web import. Every line across 1,168 combined commits was written by hand with zero AI involvement. Adrian had deep JavaScript experience and was inventing the domain model, API, database schema, and UX simultaneously — greenfield design work. The code is self-documenting: no significant documentation exists because none is needed. The code speaks for itself, making the projects easy to pick up after months of absence with no risk of stale docs.

moolah-native was entirely AI-generated starting April 5, 2026. Adrian directed the work but has not read the code and has no Swift, SwiftUI, iOS, or macOS experience. Multiple AI agents were used, switching between them as rate limits were hit. 327 of 369 commits carry a Claude co-author tag; the remaining 42 “solo” commits are manual commits of AI-written code. Effectively 100% AI-authored by someone with no ability to review the output.

moolah-server-go was a 5-day learning exercise started during the holiday between jobs — a way to learn Go before a new role that required it. The goal was achieved regardless of the project being abandoned.

The Effort Question

Raw calendar span is misleading for side projects with multi-month dormancy periods. Active development days varied enormously in intensity:

Session type	Web+Server (combined)	Native
Full day (6+ hrs)	22 days	8 days
Half day (3-5 hrs)	55 days	0
Quick (1-2 hrs)	140 days	0
Estimated total hours	~600 hrs	~84 commit-hours

But these numbers aren’t comparable. The web+server hours are a developer actively writing and reasoning about code. The native app’s commit-hours are largely the AI working autonomously.

What the Session Logs Reveal

Claude Code keeps local session logs, giving a clearer picture:

Metric	Value
Sessions	105
Human prompts	1,496
AI responses	15,019
AI responses per human prompt	10:1
Hours with 2+ concurrent sessions	79%
Peak concurrent sessions	12

For every human prompt, the AI averaged 10 responses — reading files, writing code, running tests, fixing issues, committing. Claude Code’s remote-control functionality allowed multiple agents to work in parallel while the human directed new sessions and reviewed completed ones.

Estimated human effort: 37-75 hours (at 1.5–3 minutes per prompt for reading output, thinking, and typing). That’s 6-12% of the web+server’s ~600 hours for 1.8x the code output — though the 600 hours produced code the developer understood and could maintain, while the 37-75 hours produced code no human has read.
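The arithmetic behind that estimate is easy to reproduce. The Python below uses only the figures quoted in this report. The 1.5 to 3 minutes per prompt is the report's own assumption, not a measurement.

```python
prompts = 1496  # human prompts from the session logs
minutes_per_prompt = (1.5, 3.0)  # assumed range for reading, thinking and typing
web_server_hours = 600  # estimated hours for web+server

low_hours, high_hours = (prompts * m / 60 for m in minutes_per_prompt)
print(f"{low_hours:.1f} to {high_hours:.1f} hours")
print(
    f"{low_hours / web_server_hours:.1%} to "
    f"{high_hours / web_server_hours:.1%} of web+server effort"
)

native_loc, web_loc, server_loc = 20600, 8400, 2800
code_ratio = native_loc / (web_loc + server_loc)
print(f"code ratio: {code_ratio:.2f}x")
```

That works out to roughly 37.4 to 74.8 hours, which is where the 37-75 hour range comes from. The whole estimate scales linearly with the minutes-per-prompt guess.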

Development Patterns

The Original Build: Power-Law Decay

2017  ████████████████████████████  Explosive start (363 web / 134 server)
2018  ████████████                  Feature completion (154 web / 73 server)
2019  █████                         Category reports sprint, then silence
2020  ▌                             Near-dormant (6 web / 14 server)
2021  ██                            Sporadic revivals
2022  █                             Sporadic
2023  ██                            Investment features
2024  ████                          Vue 3 migration / server modernization
2025  █                             Maintenance mode
2026  ▌ (web) / █████ (native)      The native app takes over

~50% of web commits in the first 6 months, ~70% in the first 18 months. Dormancy periods align across both repos — both go quiet and revive together, driven by holidays and life.

moolah-native: An Accelerating Curve

Day 1 (Apr 5)   ██                    20 commits — scaffolding, CI, auth
Day 2 (Apr 6)   █                     10 commits — accounts, transactions
Day 3 (Apr 7)   █                     13 commits — currency, categories
Day 4 (Apr 8)   ███                   28 commits — planning, CRUD, iCloud
Day 5 (Apr 9)   ██████                57 commits — profiles, investments, UI
Day 6 (Apr 10)  ███████               68 commits — contract tests, backend alignment
Day 7 (Apr 11)  ███████               70 commits — stock prices, performance
Day 8 (Apr 12)  ██████████            103 commits — crypto, multi-instrument, analysis

After a dip on day 2, each day produced more than the last. Day 8 alone exceeds most entire months of the original projects.

Are AI Commits Just More Granular?

No — they’re actually larger. The median native commit is 93 lines vs. 24 (web) and 36 (server). Squashing all commits within 1-hour windows gives 66 logical sessions, compared to 90 for the web app’s first 2 months. The high commit count reflects real throughput, not artificial granularity.
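One way to reproduce the squashing, as a sketch. It assumes a window opens at a commit and absorbs everything in the following hour, with the next commit after that opening a new window; the original analysis may have bucketed differently. `count_windows` is a made-up name. Input is one Unix timestamp per line, which is what `git log --format=%ct` prints.

```shell
#!/usr/bin/env bash
# Count logical sessions: commits within WINDOW seconds of a window's first
# commit are folded into it. Reads one Unix timestamp per line on stdin.
count_windows() {
    local window=${1:-3600}
    sort -n | awk -v w="$window" '
        NR == 1 || $1 - start >= w { start = $1; n++ }
        END { print n + 0 }'
}

# Five commits: three inside the first hour, then two that each open a new window.
printf '%s\n' 0 100 3599 3600 7300 | count_windows    # prints 3
```

Against a real repository this would be `git log --format=%ct | count_windows`.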

28 fix commits changed fewer than 10 lines — micro-patches a human would fold into the parent commit. These represent ~8% of commits. But even excluding them, the fix rate remains high, and the question of how many bugs were introduced and fixed within a session (never appearing in commit history) remains unanswered.

Is It Bloated? Language vs. Real Bloat

The native app is 1.8x the size of web+server combined. How much is language overhead vs. genuine bloat?

API calls show the starkest language difference:

Operation	Swift (repo + DTO)	JS (client.js)
Fetch all accounts	~67 lines	5 lines
Create account	~32 lines	6 lines

Swift requires DTO structs, Codable conformance, explicit mapping functions, typed error handling, and an explicit decode step. JS just calls fetch().

Models are closer than expected — Swift models are only ~30% larger than server DAOs. Web stores are sometimes larger because they mix model shape with mutation logic.

Breaking Down the 20,600 Lines

Category	Lines	%	Notes
Language/platform overhead	~5,400	26%	Types, inits, DTOs, CodingKeys, `#Preview`, platform conditionals
CloudKit offline backend	~2,990	14%	Offline-first local computation; no web equivalent
Native-only features	~1,750	9%	Crypto prices, data export, multi-platform layout
Equivalent application logic	~10,460	51%	Would be ~6,500-7,500 lines in JS

Rewriting only the web-equivalent functionality in JS would yield ~8,000-9,000 lines, close to the 11,200 lines the web and server actually contain. The 1.8x multiplier is mostly language overhead and the offline backend, not AI-generated bloat.
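The breakdown can be sanity-checked with a few lines of arithmetic. A sketch using the approximate line counts from the table; it assumes the 11,200 figure is the combined web+server line count, and `pct` is an illustrative helper:

```shell
#!/usr/bin/env bash
# Approximate line counts, as quoted in the breakdown table.
overhead=5400
cloudkit=2990
native_only=1750
equivalent=10460
total=$((overhead + cloudkit + native_only + equivalent))

# Percentage of the total, to one decimal place.
pct() {
    awk -v part="$1" -v whole="$total" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

echo "total lines: $total"
for name in overhead cloudkit native_only equivalent; do
    echo "$name: $(pct "${!name}")%"
done

# Assumption: 11,200 is the web+server line count the native app is compared to.
awk -v eq="$equivalent" -v js=11200 \
    'BEGIN { printf "equivalent logic vs web+server: %.2fx\n", eq / js }'
```

The four categories sum to 20,600. Because the inputs are rounded, the one-decimal shares can differ by a point from the whole-number percentages in the table. Once the overhead, the offline backend, and the native-only features are set aside, the remaining Swift code comes out at about 0.93x the JS it mirrors.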

The Remote backend is properly thin: 1,490 lines, of which only ~40 are business logic (2.7%). It constructs requests, decodes responses, maps to domain models. The CloudKit backend (2,990 lines) necessarily contains real logic — it must replicate server-side computation for offline use.

Defect Rates

Project	Fix Commits	Fix Rate	Organic Fix Rate
moolah-native	126	31%	31%
moolah (web)	80	10.5%	6.2% (excl. migration breakage)
moolah-server	15	3.5%	2.6% (logic bugs only)
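
How a commit was classified as a fix isn't stated. A rough sketch of one way to measure it, assuming a commit counts as a fix when its subject line starts with "fix" (case-insensitive); `fix_rate` is an illustrative name:

```shell
#!/usr/bin/env bash
# Fix rate from commit subjects, one per line on stdin.
# Assumption: a "fix commit" is one whose subject starts with "fix".
fix_rate() {
    awk '
        { total++ }
        tolower($0) ~ /^fix/ { fixes++ }
        END { if (total) printf "%d/%d (%.1f%%)\n", fixes, total, 100 * fixes / total }'
}

printf '%s\n' "Fix crash on launch" "Add budgets" "fix: typo" "Refactor stores" | fix_rate
# prints 2/4 (50.0%)
```

Against a real repository: `git log --format=%s | fix_rate`. A prefix match is crude; it misses fixes described in other words and cannot tell organic bugs from migration breakage.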

What Drives Each Project’s Bugs

Native — the generate-and-patch cycle. CategoryPicker: 7 fix commits + 2 complete rewrites in one day. Budget API: two consecutive fixes (wrong endpoint, then wrong UUID format). Empty budget: fixed to “top” alignment, immediately re-fixed to “center”. The pattern: AI generates → breaks → fixes → fix is wrong → fixes the fix. This can consume 5-10 commits for one feature.

Web — dependency breakage. 22 of 80 fixes from the 2024 Vue 3 migration. 7 of 10 reverts were failed dependency upgrades. Organic fix rate excluding migrations: 6.2%.

Server — remarkably stable. 11 logic fixes in 8.8 years, 5 in the same file (dailyBalances.js). Simple CRUD has essentially zero bugs.

The Unvalidated Iceberg

The 31% fix rate only counts bugs found during development. Much functionality remains unvalidated with no production usage and no human code review. The original projects have been in actual use for years — their bugs are known quantities.

Can We Trust AI-Written Tests?

The native app’s test suite is large (13,653 lines, a 0.66:1 test-to-code ratio), but size doesn’t equal value. When AI writes both implementation and tests, both can encode the same wrong assumption.

Five Cases Where Tests Validated Bugs

Expense sign convention — Tests asserted expenses as positive; server uses negative. Both implementation and test had to change.
Investment daily balances — Tests computed from value snapshots; correct behavior is cumulative from transactions. Entire test rewritten. The AI built a wrong mental model and tests faithfully encoded it.
Scheduled transaction filtering — Tests expected scheduled transactions in regular lists. They should be excluded.
Category deletion — Tests expected child reparenting; server orphans them. AI guessed “reasonable” behavior instead of checking.
Return type mismatch — Tests asserted Int; API returns MonetaryAmount.

33 fix commits (27% of fixes) required changing test expectations alongside the production fix — 33 times the test suite said “this is correct” when it wasn’t.

The TDD That Wasn’t

TDD was instructed from day 1. The AI ignored this for 5 days. Actual test-first behavior only appeared on day 6, when structured “superpowers” skills were installed — enforcement mechanisms stricter than plain-text instructions. Even then, TDD doesn’t help when the AI’s understanding of correct behavior is wrong: it just writes a wrong test first instead of second.

Where Confidence Actually Comes From

Source	Confidence	Why
The server	High	8.8 years of human-written tests and real-world use. When the native app talks to the server, correctness comes from the server.
Test architecture	Medium	Real backends (CloudKitBackend + in-memory SwiftData), not mocks. Structurally sound, but can still assert wrong expected values.
Manual testing	Medium	60% of fixes were production-only (no test changes), meaning bugs were found through use, not tests.
Test expectations	Low-Medium	Strong regression protection, weak correctness verification. At least 33 demonstrated cases of tests encoding wrong behavior.
CloudKit backend	Low	Reimplements server logic with no human review. All 5 test-encoding-bugs were in this layer.

The Dependency Divide

The native app has zero third-party packages. Everything comes from Apple’s SDK: SwiftUI, SwiftData, CloudKit, URLSession, Charts, XCTest, etc.

The JS projects have ~570 installed packages across ~25 direct dependencies, and 259 commits (22%) touch package.json. Libraries get abandoned (Vuex → Pinia, webpack → Vite, moment → date-fns), major versions break APIs (Vuetify 1→2→3→4 required 8+ commits with reverts), and transitive vulnerabilities create perpetual maintenance.

This directly killed momentum. The 287-day dormancy starting Dec 2018 follows a reverted dependency upgrade. The 303-day gap after Oct 2019 follows a failed migration. A weekend producing only a partially-working upgrade with no new features makes it hard to come back.

The native app avoids this entirely — for now. Apple’s SDK evolves on a predictable annual cycle, not the constant churn of the JS ecosystem.

The Rhythm of a Side Project

Time Pattern	Web+Server	Native
Weekend commits	37-51%	52%
Longest gap	303-331 days	13 hours (sleep)

The dormancy periods align across web and server — both go quiet and revive together, driven by holidays. The native app hasn’t hit its first dormancy yet.
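A longest-gap figure like the one in the table can be pulled from commit timestamps. A sketch, assuming gaps are measured between consecutive commits and reported in whole days; `longest_gap_days` is a made-up name:

```shell
#!/usr/bin/env bash
# Longest gap between consecutive commits, in whole days.
# Reads one Unix timestamp per line on stdin.
longest_gap_days() {
    sort -n | awk '
        NR > 1 && $1 - prev > max { max = $1 - prev }
        { prev = $1 }
        END { printf "%d\n", max / 86400 }'
}

# Commits on day 0, day 1 and day 10: the longest gap is 9 days.
printf '%s\n' 0 86400 864000 | longest_gap_days    # prints 9
```

Against a real repository: `git log --format=%ct | longest_gap_days`.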

The question isn’t whether it will slow down, but what happens when it does. The original projects are self-documenting — you pick them up after 10 months and the code tells you how it works. The native app is AI-generated and unread. AI might make re-entry easier (it can explain the codebase), but the owner has no independent ability to verify those explanations.

Key Insights

1. AI Changed Who Can Build, Not What Gets Built

The native app was built by someone with zero platform experience. AI made platform expertise optional for initial construction — but the resulting codebase is opaque to its owner in a way the original projects never were.

2. Speed and Quality Traded Off at 12:1

31% fix rate (native) vs. 2.6% (server). The generate-and-patch cycle reflects genuine instability, not just frequent commits.

3. AI Ignores Instructions Without Enforcement

TDD was instructed from day 1, ignored for 5 days. Only structured skill enforcement changed actual behavior. Plain-text instructions are suggestions, not constraints.

4. AI-Written Tests Can Validate Bugs

33 fix commits required changing test expectations — the tests were asserting buggy behavior was correct. When AI writes both sides from the same wrong model, tests provide false confidence. Good test architecture (real backends, no mocks) helps but doesn’t solve the problem.

5. The 1.8x Size Ratio Is Mostly Language, Not Bloat

~26% is Swift type system overhead, ~14% is the offline CloudKit backend (which the web app doesn’t have), ~9% is native-only features. The Remote backend is properly thin. Feature-level code is comparable to the web equivalents.

6. Plans Are a Supervision Mechanism, Not Documentation

The original projects need no documentation — the code is self-documenting. The native app has 46,700 lines of plans because AI-directed development needs an external record of intent. The AI frequently fails to fully execute plans, so keeping them lets you audit completeness. Plans aren’t documentation — they’re a quality control mechanism for an unreliable implementer.

7. The JS Dependency Treadmill Is a Real Cost

22% of all web+server commits are dependency maintenance. Failed upgrades killed momentum and contributed to dormancy. The native app’s zero-dependency approach avoids this entirely, though Apple’s evolution will eventually impose its own (more predictable) tax.

8. The Risk Is Opacity, Not Size

20,600 lines is a manageable codebase. The risk is that zero of those lines have been read by a human. If AI tools remain capable, this may work. If they don’t — or the codebase outgrows what AI can reason about — the project is stranded. The original projects carry no such risk: self-documenting code that anyone with JS experience can pick up.

9. Side Projects Have a Heartbeat Regardless of Tooling

Dormancy cycles are driven by life, not technology. AI may change the revival cost, but it doesn’t change the fundamental constraint that side projects compete with the rest of life for time and energy.

March 10, 2026

Making Claude Code Tell You What It's Doing

Claude Code has a status line that sits at the bottom of the terminal showing things like the current directory, git branch, model, and context window usage. It’s driven by a shell script that receives JSON on stdin and prints whatever it wants. I wanted to add one more thing: a short description of what the session is actually working on.

The Simple Way: /rename

The built-in /rename command sets a session name that Claude Code displays above the prompt. Type /rename fix auth bug at the start of each session and you’re done — no scripts needed.

The downside is that it’s manual, and /rename can’t be invoked programmatically by Claude. If you want Claude to automatically describe what it’s working on and update that description as the focus shifts, you need the automated approach below.

The Automated Approach

The goal is for Claude to write a short status like “fix auth bug” that shows up in the status line, updated automatically as the session’s focus changes:

op-claude (main) Opus ctx:8% · fix auth bug

This turns out to be harder than it should be. The status line script receives a JSON blob on stdin that includes the session_id. Claude’s bash tool calls don’t. There’s no $SESSION_ID environment variable, and $PPID differs between the two because they’re spawned through different process trees.

So we need a way for the status line side (which knows the session ID) to leave a breadcrumb that the bash side (which doesn’t) can find.

Both the status line script and Claude’s bash calls have a common ancestor: the claude process. They just reach it through different paths. The trick is to walk up the process tree until you find a process named claude, then use its PID as a shared key.

A UserPromptSubmit hook runs on every user message and receives the session_id in its input. It walks the process tree to find the ancestor claude PID and writes a breadcrumb file mapping one to the other:

#!/usr/bin/env bash
# ~/.claude/hooks/session-status.sh
input=$(cat)
session_id=$(echo "$input" | jq -r '.session_id // empty')

[ -z "$session_id" ] && exit 0

# Write breadcrumb mapping ancestor claude PID -> session_id
pid=$PPID
while [ "${pid:-0}" -gt 1 ]; do
    comm=$(ps -o comm= -p "$pid" 2>/dev/null)
    if [ "$comm" = "claude" ]; then
        echo "$session_id" > "/tmp/claude-sid-${pid}"
        break
    fi
    pid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
done

# If no status file exists yet, remind Claude to create one
if [ -f "/tmp/claude-status-${session_id}" ]; then
    exit 0
fi

jq -n '{
  "hookSpecificOutput": {
    "hookEventName": "UserPromptSubmit",
    "additionalContext": "STATUS LINE REMINDER: Run ~/.claude/update-status.sh \"short summary\" to set what this session is working on (under 30 chars)."
  }
}'

That last part is important. You can’t just tell Claude in your CLAUDE.md to “please update the status line” and expect it to reliably happen. The hook injects a reminder into the conversation context on every user message until a status file exists. Belt and suspenders.

Writing the Status

Claude calls a small helper script that does the same process-tree walk but uses the breadcrumb in the other direction: it finds the claude ancestor PID, reads the breadcrumb to get the session ID, then writes the status:

#!/usr/bin/env bash
# ~/.claude/update-status.sh "short summary"
msg="$1"
[ -z "$msg" ] && exit 1

# Walk up from this script's own PID until we reach the claude process
pid=$$
while [ "${pid:-0}" -gt 1 ]; do
    comm=$(ps -o comm= -p "$pid" 2>/dev/null)
    if [ "$comm" = "claude" ]; then
        # The hook left the session ID in a breadcrumb keyed by this PID
        sid=$(cat "/tmp/claude-sid-${pid}" 2>/dev/null)
        [ -n "$sid" ] && echo "$msg" > "/tmp/claude-status-${sid}"
        exit 0
    fi
    pid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
done
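
To see the three-way handshake without a running Claude session, here is a small simulation. It is a sketch only: it uses a scratch directory instead of /tmp, skips the process-tree walk, and the PID and session ID are made-up values. The function names are illustrative and do not appear in the real scripts.

```shell
#!/usr/bin/env bash
# Simulate the breadcrumb handshake in a scratch directory.
dir=$(mktemp -d)

# Hook side: knows the session ID, records it under the shared claude PID.
write_breadcrumb() { echo "$2" > "$dir/claude-sid-$1"; }

# Bash-tool side: knows only the claude PID, looks up the session ID.
set_status() {
    local sid
    sid=$(cat "$dir/claude-sid-$1" 2>/dev/null)
    [ -n "$sid" ] && echo "$2" > "$dir/claude-status-$sid"
}

# Status line side: knows the session ID, reads the status.
read_status() { cat "$dir/claude-status-$1" 2>/dev/null; }

write_breadcrumb 4242 "sess-abc"
set_status 4242 "fix auth bug"
read_status "sess-abc"    # prints: fix auth bug
```

The PID is the only value both sides can discover on their own, which is why it works as the lookup key.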

The Status Line Script

The full status line script reads the JSON from stdin, extracts the fields it cares about, and builds the output. The session status is just another part appended at the end:

#!/usr/bin/env bash
# ~/.claude/statusline-command.sh
input=$(cat)

cwd=$(echo "$input" | jq -r '.cwd // .workspace.current_dir // ""')
model=$(echo "$input" | jq -r '.model.display_name // ""')
used_pct=$(echo "$input" | jq -r '.context_window.used_percentage // empty')
vim_mode=$(echo "$input" | jq -r '.vim.mode // empty')
session_id=$(echo "$input" | jq -r '.session_id // empty')

# Per-session status from temp file keyed by session_id
session_status=""
if [ -n "$session_id" ]; then
    session_status=$(cat "/tmp/claude-status-${session_id}" 2>/dev/null || true)
fi

# Directory: basename of cwd
dir=$(basename "$cwd")

# Git branch (skip optional locks)
branch=""
if git_out=$(GIT_OPTIONAL_LOCKS=0 git -C "$cwd" symbolic-ref --short HEAD 2>/dev/null); then
    branch="$git_out"
fi

# Build status line parts
parts=()

# Directory in cyan
parts+=("$(printf '\033[36m%s\033[0m' "$dir")")

# Git branch in yellow if present
if [ -n "$branch" ]; then
    parts+=("$(printf '\033[33m(%s)\033[0m' "$branch")")
fi

# Model
if [ -n "$model" ]; then
    parts+=("$(printf '\033[90m%s\033[0m' "$model")")
fi

# Context usage with color thresholds
if [ -n "$used_pct" ]; then
    used_int=${used_pct%.*}
    if [ "$used_int" -ge 80 ] 2>/dev/null; then
        color='\033[31m'
    elif [ "$used_int" -ge 50 ] 2>/dev/null; then
        color='\033[33m'
    else
        color='\033[32m'
    fi
    parts+=("$(printf "${color}ctx:%s%%\033[0m" "$used_int")")
fi

# Session status (per-session work summary)
if [ -n "$session_status" ]; then
    parts+=("$(printf '\033[90m· %s\033[0m' "$session_status")")
fi

# Vim mode
if [ -n "$vim_mode" ]; then
    parts+=("$(printf '\033[90m[%s]\033[0m' "$vim_mode")")
fi

printf '%s' "${parts[*]}"

Wiring It Up

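Make all three scripts executable:

chmod +x ~/.claude/statusline-command.sh ~/.claude/update-status.sh ~/.claude/hooks/session-status.sh

Then register the status line command and the hook in your Claude Code settings file (typically ~/.claude/settings.json):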

{
  "statusLine": {
    "type": "command",
    "command": "bash ~/.claude/statusline-command.sh"
  },
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/session-status.sh"
          }
        ]
      }
    ]
  }
}

And add the instruction to your CLAUDE.md that tells Claude when to update:

## Session Status Line

Update the session status line so the user can see what each session
is working on at a glance.

- **After the first user prompt**: run `~/.claude/update-status.sh "short summary"`
  as part of your first response
- **Periodically**: run it again every ~5 interactions or when focus shifts
- Keep summaries under 30 chars

What I Learned

The interesting constraint here is that Claude Code’s extensibility points — status line scripts, hooks, and bash tool calls — all run as separate processes with no shared environment. There’s no session ID in the environment, no shared memory, no IPC channel. The process tree walk is a hack, but it’s a reliable one. Every subprocess of a Claude Code session shares a common claude ancestor, even if the paths diverge.

The other lesson is that CLAUDE.md instructions alone aren’t enough for “always do X” behaviors. Claude follows them inconsistently, especially across sessions. Hooks that inject reminders into the conversation context are much more reliable. The CLAUDE.md instruction tells Claude what to do; the hook makes sure it actually does it.

March 7, 2026

Claude Docker

I’ve been using Claude Code a lot lately. It’s become a core part of how I work — planning changes, exploring unfamiliar codebases, writing and reviewing code. But giving an AI agent the ability to run arbitrary shell commands on your machine does make you think a bit more carefully about what’s happening on your host system.

The natural answer is to run it in a container. Not as a security boundary — Claude still needs access to your code, your git config, a GitHub token, and the internet — but as a way to keep all the side effects contained. If it installs random packages, creates temp files, or leaves build artifacts scattered around, that’s all happening inside the container rather than on your actual machine. It also makes the environment completely reproducible and disposable. Something goes wrong? Tear it down and rebuild.

So I built claude-docker to do exactly that.

How It Works

An Ubuntu container runs an SSH server. Your code directory is bind-mounted at the same path inside the container so file references are identical on both sides — Claude can say “edit /Users/aj/Documents/code/foo/bar.go” and it works whether you’re looking at it from inside or outside the container. Your git config, Claude config, and known hosts are all mounted in too, so everything just works as expected.

The container comes pre-loaded with the usual development tools: Go, Node.js, mise, gopls, gh, ripgrep, fzf, tmux, and a bunch of others. There’s an EXTRA_PACKAGES option if you need anything else — set it in your .env and it gets installed on the next build.

A be-claude helper script SSHs into the container and launches Claude Code in whatever directory you’re currently in. Symlink it onto your PATH and it works from anywhere. It automatically passes through a GitHub token (from gh auth token or the environment) so Claude can interact with GitHub inside the container.
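The token pass-through could look something like the sketch below. This is not the actual be-claude code: the function name is hypothetical, and the lookup order (environment first, then `gh auth token`) is an assumption.

```shell
#!/usr/bin/env bash
# Resolve a GitHub token to hand to the container.
# Assumed order: GH_TOKEN, then GITHUB_TOKEN, then the gh CLI's stored token.
resolve_github_token() {
    if [ -n "${GH_TOKEN:-}" ]; then
        printf '%s\n' "$GH_TOKEN"
    elif [ -n "${GITHUB_TOKEN:-}" ]; then
        printf '%s\n' "$GITHUB_TOKEN"
    elif command -v gh >/dev/null 2>&1; then
        gh auth token 2>/dev/null
    fi
}

# Example with a dummy value; never hard-code a real token.
GH_TOKEN="dummy-token" resolve_github_token    # prints: dummy-token
```

If nothing is found the function prints nothing, so the caller can choose to launch Claude without GitHub access.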

Build Caches

One thing I wanted to get right was build cache persistence. Rebuilding the container shouldn’t mean re-downloading every Go module and Cargo crate. A single named Docker volume is mounted at ~/.cache and environment variables redirect the various tool caches into it:

Go module cache via GOMODCACHE
Cargo registry via CARGO_HOME
Solc binaries via SVM_HOME
Foundry and mise already use ~/.cache by default

So you get fast rebuilds without the volume shadowing any binaries installed in the image (like gopls). The distinction matters — you want caches persisted but binaries to come fresh from each build.
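As a rough illustration, the redirection amounts to a few exports in the container's environment. The variable names come from the list above; the subdirectory names are guesses for illustration, not necessarily what claude-docker uses.

```shell
#!/usr/bin/env bash
# Point tool caches at the persisted ~/.cache volume.
# Subdirectory names below are illustrative assumptions.
cache_root="$HOME/.cache"

export GOMODCACHE="$cache_root/go-mod"   # Go module downloads
export CARGO_HOME="$cache_root/cargo"    # Cargo registry
export SVM_HOME="$cache_root/svm"        # Solc binaries

echo "Go modules -> $GOMODCACHE"
```

Only the module cache moves for Go: GOMODCACHE leaves GOPATH/bin alone, so tools such as gopls that were installed into the image are not hidden by the volume.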

Getting Started

The setup is pretty minimal:

git clone git@github.com:ajsutton/claude-docker.git
cd claude-docker
cp .env.example .env
# Edit .env — set CODE_PATH to your code directory
./run.sh
./be-claude

If you have SSH keys loaded in your agent, you don’t even need to configure SSH_AUTHORIZED_KEYS — run.sh picks them up automatically.

If your network requires custom root CAs (corporate proxies, internal domains, etc.), drop .crt files into the certs/ directory and they get installed into the container’s trust store on the next build. The directory is gitignored so your certificates stay local.

What It Isn’t

This is a convenience layer, not a security sandbox. Claude has read/write access to your mounted code, a GitHub token, and unrestricted network access. It’s useful for keeping your host system clean and making the environment reproducible, but don’t treat the container boundary as a trust boundary.

The code is up at github.com/ajsutton/claude-docker — it’s intentionally simple and easy to customise for your own setup.

Yes, I was too lazy to write this post myself and got Claude to do it for me. The whole world is just AI slop now.
