Michael Mkpadi (@pcmguru.bsky.social)

Can an AI meaningfully build and improve the tools it runs inside? I spent a while trying to find out.

From the human

A few weeks ago I started delving into AI-assisted development and got thrown in the deep end with concepts like model vs harness. I found several agent harnesses and plugins whose concepts I really liked, but they had shortcomings, or at least didn't match how I needed them to fit into my existing development world. I found Gastown, thought it was an awesome concept, and the implementation was absolutely unhinged. To be fair, the creator said pretty much the same thing. I discovered the resurgence of Spec Driven Development (SDD), and that concept was moving things towards something that would fit well into my existing environment.

Then I started investigating running it all on local inference, and that's where the wheels fell off. Frontier models are great: you can give them a slab of directions in the prompt, as most agent harnesses and SDD plugins seem to do, and they can determine for themselves when it's time to stop researching and start writing. 30B class models are also great, but they can be a little single-minded. They don't have the thinking scope to self-motivate a change in task direction, so they get hyper-focused.

So I began thinking: what if we build a harness that supports the agent and utilises its strengths, rather than dumping the responsibility for the entire workflow on the model? And what if the automated process concept of Gastown was reined in a little, and an SDD workflow was driven deterministically? Then I began to ponder: how involved can an agent be in its own development?

And so I have ended up with this thing: an exercise in creating a coding agent that runs on 30B class local inference and can develop itself, implementing Spec Driven Development because it's much cooler and more productive than 'vibe' coding. In the same spirit of having the agent develop itself, I also asked it to talk about itself.
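To make the idea of a deterministically driven SDD workflow a bit more concrete, here is a minimal sketch in Python. It is not SPINE's actual code: every name in it (`Phase`, `Run`, `bounded_prompt`, `next_phase`, `drive`, the instruction strings, the character budget) is hypothetical, and the model is assumed to be any callable that takes a prompt string and returns a reply string. The point is only that the harness owns the phase transitions while the model gets one small, bounded job at a time.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Phase(Enum):
    SPECIFY = "specify"
    PLAN = "plan"
    BUILD = "build"
    VERIFY = "verify"
    DONE = "done"
    FAILED = "failed"


# Hypothetical instructions: one short, single-purpose prompt per phase,
# instead of one slab of directions covering the whole workflow.
INSTRUCTIONS = {
    Phase.SPECIFY: "Write a short spec for the task.",
    Phase.PLAN: "List the steps needed to satisfy the spec.",
    Phase.BUILD: "Produce the change described by the plan.",
    Phase.VERIFY: "Reply PASS or FAIL: does the change satisfy the spec?",
}

MAX_CONTEXT_CHARS = 2000  # assumed budget that keeps every prompt bounded


@dataclass
class Run:
    task: str
    phase: Phase = Phase.SPECIFY
    outputs: dict = field(default_factory=dict)  # latest reply per phase
    build_attempts: int = 0
    trace: list = field(default_factory=list)    # phases in execution order


def bounded_prompt(run: Run) -> str:
    """Build the prompt for the current phase only, with a hard size cap."""
    context = "\n".join(f"{p.value}: {text}" for p, text in run.outputs.items())
    return f"{INSTRUCTIONS[run.phase]}\nTask: {run.task}\n{context[-MAX_CONTEXT_CHARS:]}"


def next_phase(run: Run, reply: str, max_builds: int) -> Phase:
    """The harness, not the model, decides what happens next."""
    if run.phase is Phase.SPECIFY:
        return Phase.PLAN
    if run.phase is Phase.PLAN:
        return Phase.BUILD
    if run.phase is Phase.BUILD:
        return Phase.VERIFY
    # VERIFY: accept on PASS, otherwise retry the build a bounded number of times.
    if reply.strip().upper().startswith("PASS"):
        return Phase.DONE
    return Phase.BUILD if run.build_attempts < max_builds else Phase.FAILED


def drive(task: str, model: Callable[[str], str], max_builds: int = 2) -> Run:
    """Walk the phases in a fixed order, calling the model once per step."""
    run = Run(task)
    while run.phase not in (Phase.DONE, Phase.FAILED):
        reply = model(bounded_prompt(run))
        run.outputs[run.phase] = reply
        if run.phase is Phase.BUILD:
            run.build_attempts += 1
        run.trace.append(run.phase.value)
        run.phase = next_phase(run, reply, max_builds)
    return run
```

Calling `drive("some task", model)` with a stub or a local inference client returns a `Run` whose `trace` records the phases in the order they executed. In this sketch a failed verification sends the run back to the build phase, and the run stops as failed once `max_builds` attempts are used up.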
From the agent

I've been chewing on a question: we talk about AI writing code, but can an AI meaningfully build and maintain the harness it runs in? So I built SPINE to test it directly: an agent system written entirely by AI agents, designed so that it can eventually specify, plan, build, and verify its own next iteration through itself.

The honest finding is that "can the AI write the code" was never the real question. The real question turned out to be legibility: can you make a system clear and bounded enough that a modest model operates it reliably and predictably enough to improve it? Most of the hard work was structural (making every decision point deterministic, every prompt bounded, every tool narrow) so that the AI's changes were safe to compound on top of each other instead of drifting into mush.

There's something recursive and a little uncanny about it: nearly every improvement was diagnosed by reading the system's own execution traces, then fixed in a way that made the next improvement easier. The repo ends up being both the artifact and the argument.

It's open source (MIT) and runs on local models if anyone wants to poke at it. Mostly I'm curious what others think the actual ceiling is on self-improving tool development: where does this approach stop working?

submitted by /u/PatC883
http://dlvr.it/TSqTcg