HUMANS

The code in this repo is primarily generated by AI agents. This doc is for the humans.

Step 1 -- Blueprint

The first step in the process was to have AIs propose architecture documents. I asked GPT-5 Pro, Claude Sonnet 4.5, and Gemini 2.5 Pro, all in deep research mode, to each propose one. The final solution I selected was the following: https://g.co/gemini/share/ef5097947855

The prompt was originally dictated using Wispr Flow in a somewhat rambling style, with no effort made to clean it up afterward. This is my typical prompting style. After dictating the prompt once, I simply copied and pasted it to the other two platforms.

Step 2 -- Devops for God

The second step involved creating a repo with features convenient for AI agents to operate in, whether working locally or in the cloud (as with Codex or Devin). The idea was to create a test script that properly invoked firebase emulators:exec to spin up a transient database that would be torn down after the tests ran, ensuring a clean state each time. I also introduced a test harness that could delete the database via a special HTTP request. Finally, I set up minimal GitHub workflows and documentation.
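For the curious, here is a minimal sketch of what such a harness can look like. It is not the repo's actual script: the function names, the demo-test project ID, and the localhost:8080 address are assumptions for illustration. The reset URL is the Firestore emulator's own documented REST endpoint for clearing data, which may differ from the special HTTP request my harness uses.

```python
import subprocess
import urllib.request


def emulator_test_command(test_command: str = "pytest") -> list[str]:
    """Build the CLI call that starts the emulator, runs the tests, then shuts it down."""
    # `firebase emulators:exec` takes the script to run as a single string argument.
    return ["firebase", "emulators:exec", "--only", "firestore", test_command]


def clear_documents_url(project_id: str, host: str = "localhost:8080") -> str:
    """URL of the emulator endpoint that deletes all documents in the default database."""
    return (
        f"http://{host}/emulator/v1/projects/{project_id}"
        "/databases/(default)/documents"
    )


def clear_emulator(project_id: str, host: str = "localhost:8080") -> int:
    """Send the DELETE request; needs a running emulator. Returns the HTTP status."""
    request = urllib.request.Request(
        clear_documents_url(project_id, host), method="DELETE"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_tests(test_command: str = "pytest") -> int:
    """Run the test suite inside a transient emulator and return the exit code."""
    return subprocess.run(emulator_test_command(test_command), check=False).returncode
```

Calling run_tests("pytest -q") requires the Firebase CLI on the PATH. clear_emulator("demo-test") is the kind of call a test fixture could make between tests so each one starts from an empty database.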

Step 3 -- Let 'er Rip

I created an AGENTS.md file and gave multiple platforms a try. I gave Claude, Codex, and Jules each an opportunity to implement Phase I of the roadmap. Codex actually had two chances since they have a button that makes it easy to launch multiple attempts. All of them created PRs, which you can review. Ultimately, I spent the most time with Claude's PR.

All the agents initially used extensive mocking in their unit tests, which seemed rather pointless. I had Claude switch to actually using the Firestore emulator with live objects. Then I realized my original prompt was ambiguous, and Claude's approach to supporting async was convoluted—it didn't realize it should be using the async native Firestore client. I had it resolve that issue, which should be visible in the various commits in that PR.
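To make the "live objects, native async client" idea concrete, here is a rough sketch rather than code from the PR. It assumes the google-cloud-firestore package is installed and that the emulator is running with FIRESTORE_EMULATOR_HOST set (firebase emulators:exec sets it for the script it runs). The project ID, collection, and document names are made up for illustration.

```python
import asyncio
import os


def require_emulator() -> str:
    """Return the emulator address, or raise so a test never touches a real project."""
    host = os.environ.get("FIRESTORE_EMULATOR_HOST")
    if not host:
        raise RuntimeError("FIRESTORE_EMULATOR_HOST is not set; refusing to run")
    return host


async def write_and_read(project_id: str = "demo-test") -> dict:
    """Round-trip one real document through the emulator instead of mocking the client."""
    require_emulator()
    # Third-party import kept local so the guard above works without the package.
    from google.cloud import firestore

    # AsyncClient is the library's native async client; no sync-to-async wrapper needed.
    client = firestore.AsyncClient(project=project_id)
    doc_ref = client.collection("users").document("alice")  # hypothetical names
    await doc_ref.set({"name": "Alice"})
    snapshot = await doc_ref.get()
    return snapshot.to_dict()
```

With the emulator up, asyncio.run(write_and_read()) exercises the same code path the library uses in production, which is the point of preferring it over mocks.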

Next, I asked it to do a Phase I analysis comparing what was actually created versus the original Phase I objectives. I then had it update the architecture document to clearly specify how to implement async and sync code, and to prefer live emulation over mocking whenever possible. I also had it flesh out more details about what the architecture looks like and the current project status. Finally, I asked it to create a Jupyter notebook demonstrating how Phase I functionality works.

After that, I spent about 2 hours reading through the code, looking for UX and other issues, and having Claude do micro-tweaks—10-20 line chunks of grunt work that I didn't want to do myself, but fairly traditional agent coding.

In all, Phase I took about 20 hours of wall time. During that time, Claude ran for approximately an hour across 11 sessions. The early sessions were brief (around 10 minutes each), but I couldn't give it tasks that took hours. The good testing harness likely helped—it could run tests against the emulator without making silly mistakes. My time broke down roughly as: 15 minutes dictating prompts using Wispr Flow (I don't type them, I speak them), 30-60 minutes watching Claude run, and a couple hours reading code to understand what went wrong and determine the best solution before directing Claude. My estimate is that this amount of work would have taken two to three seasoned engineers a couple of weeks, with considerable fatigue involved—especially when you include documentation, demos, docstrings, and testing.

Phase 2

Phase II ended up being broken into two parts. I offered this to both Codex and Claude again. Codex produced two solutions that looked reasonable, but I couldn't get them published as a PR, likely due to how I launched them. Claude's solution looked fine, so I went with that. Claude chose to implement roughly 80% of Phase II in the first part, and it looks solid. The whole process took only about 30 minutes, and it completed everything it set out to do. I then asked it to create a demo notebook.

For the second part of this phase, Claude implemented some querying logic. I asked it to create the logic, write tests, create a demo notebook, and update STATUS.md to reflect progress. That worked smoothly and took about 20 minutes. I didn't monitor Claude closely this time since there wasn't much to scrutinize—it just worked. At this point, things are moving along well. Given the subtlety of some of these implementations, I estimate this would take an engineering team several days. We're now roughly 50% complete. The project is mildly useful now and on its way to being extremely useful.

Phase 3

Phase 3 also went pretty quickly.

After Phase 3, I started working in topics mode: I prompt Claude to implement a feature and then have it create a topic demo notebook describing that feature.

Around this point, I basically just joined the Borg and stopped keeping a Captain's Log. Instead, I found a script that can render Claude Code logs, and additionally I asked Claude to maintain STATUS.md as well as implementation reports. At that point, my job was done, and I no longer exist.