Everything in this post stems from the assumption that you already know what you're doing, which is probably true for things you've built before. But I hope we can agree that you can't spec out something you have no clue how to build, let alone write the tests before you've even explored the boundaries of the problem space. That's completely unreasonable.
My second point is that this approach is fundamentally wrong for AI-first development. If the cost of writing code is approaching zero, there's no point investing resources to perfect a system in one shot. What matters more is how fast you can explore the edges. You can now spin up five agents to implement five different versions of the thing you're building and simply pick the best one.
In our shop, we have hundreds of agents working on various problems at any given time. Most of the code gets discarded. What we accept to merge are the good parts.
Nothing of what you write here matches my experience with AI.
Specification is worth writing (and spending a lot more time on than implementation) because it's the part that you can still control, fully read, understand etc. Once it gets into the code, reviewing it will be a lot harder, and if you insist on reviewing everything it'll slow things down to your speed.
> If the cost of writing code is approaching zero, there's no point investing resources to perfect a system in one shot.
THe AI won't get the perfect system in one shot, far from it! And especially not from sloppy initial requirements that leave a lot of edge (or not-so-edge) cases unadressed. But if you have a good requirement to start with, you have a chance to correct the AI, keep it on track; you have something to go back to and ask other AI, "is this implementation conforming to the spec or did it miss things?"
> five different versions of the thing you're building and simply pick the best one.
Problem is, what if the best one is still not good enough? Then what? You do 50? They might all be bad. You need a way to iterate to convergence
This. Waterfall never worked for a reason. Humans and agents both need to develop a first draft, then re-evaluate with the lessons learned and the structure that has evolved. It’s very very time consuming to plan a complex, working system up front. NASA has done it, for the moon landing. But we don’t have those resources, so we plan, build, evaluate, and repeat.
That "first draft" still has to start with a spec. Your only real choice is whether the spec is an actual part of project documentation with a human in the loop, or it's improvised on the spot within the AI's hidden thinking tokens. One of these choices is preferable to the other.
There’s a real tension here.
If you are vibe-coding, this approach is definitely going to kill you buzz and lose all the rapid iteration benefits.
But if you are working in an existing large system, vibe coding is hard to bring into the core. So I think something more formal like OP is needed to reap major benefits from AI.
This is just AI-written slop, but even if you're vibe coding and want to go for rapid iteration, you still benefit by having the AI write out a broad plan of what it's going to do and looking it over before telling it to implement it. One-shot vibe coding is totally worthless, but the more you're aware of what the AI is thinking about and ready to revise its plans, the better it can potentially do.
If the price of code is zero then changing the spec also costs zero in terms of code and. This is what always was the problem with specs before. You'd write one, run it through the prover, write the code, then have to throw out the whole thing because there was a business case you didn't account for.
Now the bottom 98% can be given to a robot with a clear success signal other than 'it looks about right'.
code is orthogonal to spec. you can iterate on the code and iterate on the spec. the spec is not meant to be constant, it's a form of ECC for the artifacts of the coding pipeline.
Thats why I have AI do a write up about the system I want to build, I then review it all. If it looks good I use it as my prompt.
A lot of interesting replies below this comment that I won't be able to respond to individually.
I'll just leave this here:
If you don't mind the question with regard to your second point, couldn't what you've done in your shop be also used here? There's no reason why 'try to develop it five different ways and pick the best parts out of each' is incompatible with the 'VSDD' concept; seems like it could be included?
> But I hope we can agree that you can't spec out something you have no clue how to build
Eh, of course you can. You can specify anything as long as you know what you want it to do. This is like systems engineering 101 and people do it successfully all the time.
> you can't spec out something you have no clue how to build
Ideally—and at least somewhat in practice—a specification language is as much a tool for design as it is for correctness. Writing the specification lets you explore the design space of your problem quickly with feedback from the specification language itself, even before you get to implementing anything. A high-level spec lets you pin down which properties of the system actually matter, automatically finds an inconsistencies and forces you to resolve them explicitly. (This is especially important for using AI because an AI model will silently resolve inconsistencies in ways that don't always make sense but are also easy to miss!)
Then, when you do start implementing the system and inevitably find issues you missed, the specification language gives you a clear place to update your design to match your understanding. You get a concrete artifact that captures your understanding of the problem and the solution, and you can use that to keep the overall complexity of the system from getting beyond practical human comprehension.
A key insight is that formal specification absolutely does not have to be a totally up-front tool. If anything, it's a tool that makes iterating on the design of the system easier.
Traditionally, formal specification have been hard to use as design tools partly because of incidental complexity in the spec systems themselves, but mostly because of the overhead needed to not only implement the spec but also maintain a connection between the spec and the implementation. The tools that have been practical outside of specific niches are the ones that solve this connection problem. Type systems are a lightweight sort of formal verification, and the reason they took off more than other approaches is that typechecking automatically maintains the connection between the types and the rest of the code.
LLMs help smooth out the learning curve for using specification languages, and make it much easier to generate and check that implementations match the spec. There are still a lot of rough edges to work out but, to me, this absolutely seems to be the most promising direction for AI-supported system design and development in the future.
"Most of the code gets discarded." If you don't mind sharing, what's your signal-to-token ratio?
How do you propose we measure signal? Lines of code is renowned for being a very bad measure of anything, and I really can't come up with anything better.
The OP said that they kept what they liked and discarded the rest. I think that's a reasonable definition for signal; so, the signal-to-token ratio would be a simple ratio of (tokens committed)/(tokens purchased). You could argue that any tokens spent exploring options or refining things could be signal and I would agree, but that's harder to measure after the fact. We could give them a flat 10x multiplier to capture this part if you want.
I'm going to call it out as bullshit, you can't dig out "what you like" from "hundreds agents running all the time".