I don't trust anyone who sees the output of the current generation of LLMs and thinks "I want that to be agentic!". It can be immensely useful, but it's only useful when it manages to make your own brain work more, not less.
I’m finding agentic coding to be a fascinating tool. The output is a mess but it takes so little input to make something quite functional. I had an app that I wrote with a python GUI framework I didn’t quite like. ChatGPT rewrote it to use GTK and it is so much faster now. Later Claude added a browser mode where the app can be run via GTK or a browser tab. I have never written a GTK app in my life past some hello world text box.
The output is very problematic. It breaks itself all the time, makes the same mistakes repeatedly, and I have to retrace my steps. I'm going to have it write tests so it can better tell what it's breaking.
But being able to say "take this GTK app and add a web server and browser-based mode" and have it just kinda do it, with minimal manual debugging, is something remarkable. I don't fully understand it; it is a new capability. I do robotics and I wish we had this for PCB design and mechanical CAD, but those will take much longer to solve. Still, I am eager to point Claude at the hand-written python robotics stack from my last major project [1] and have it clean up and document what was a years-long chaotic prototyping process with results I was reasonably happy with.
The current systems have flaws, but if you look at where LLMs were five years ago and you see the potential value in fixing the flaws with agentic coding, it is easy to imagine that those flaws will be addressed. There will be higher-level flaws, and those will eventually be addressed, etc. Maybe not, but I'm quite curious to see where this goes, and what it means to be a human engineer in times like these.
[1] https://github.com/sequoia-hope/acorn-precision-farming-rove...
It is fascinating, and it absolutely excels at writing barely-working, problematic code that yet somehow appears to run. This helps me a lot, as having shitty code to fix makes my mind much more engaged than writing stuff from scratch, but making the model do more stuff autonomously rather than consciously reviewing it at each step only makes it less useful, not more.
I've noticed that the quality of the output can be improved dramatically, but it takes a lot of... work isn't the right word; prior knowledge, persistence, and systems, maybe.
Implementation plans; intermediate bot/human review to control for complexity, convention adherence, and actual task completion; then providing guidance, managing the context, and a ton of other things to cage and harness the agent.
Then, what it produces almost passes the sniff test. Add further bot and human code review, and we've got something that passes muster.
The siren song of "just do it/fix it" is hard to resist sometimes, especially as deadlines loom, but that way lies pain. Not a problem for a quick prototype or something throwaway (and OP is right, that it works at all is nothing short of marvelous), but to create output to be used in long-term maintainable software a lot has to happen, and even then it's sometimes a crap shoot.
But why not do it by hand, then? To me it still accelerates things and opens up the possibility space tremendously.
Overall I'm bullish on agents improving past the current necessary operator-driven handholding sooner rather than later. Right now we have the largest collection of developer-agent RL data ever, with all the labs sucking up that juicy dev data. I think that will improve today's tooling tremendously.
Yes, it requires prior understanding of what you're attempting to do. That's more or less what I meant by "making your own brain work more" - if you treat it as an input for your brain to operate on and exercise your knowledge, it can boost your productivity. If you treat it as a tool that lets you think less, you end up with nothing but slop. Sometimes even slop will be useful, but the contexts where this is true are limited.
I have no doubt that agents will become meaningfully useful for some things at some point. This just hasn't really happened yet, aside from the really simple stuff, perhaps.
As I see it, all of civilization is built on top of this "laziness" principle of "I'm tired of having to deal with X, how can I sort out X so that I don't need to think about it anymore, and can instead focus on Y". In general, people want their brain to work on other stuff, not what's currently in front of them.
...which is precisely why they end up with slop rather than increased productivity, as it's not a tool that's up for this task.
On a related note, there are hundreds of millions of knowledge workers around the world who don't want to write code but do want to automate repetitive calculations across various domains. Over the last decades, they have created many billions of spreadsheets that are now the blood vessels of the global economy. Most of these spreadsheets are bug-ridden, particularly around edge conditions. Nevertheless, the economic "blood" keeps flowing, routing around errors and inefficiencies, and I've never heard anyone claim that we'd have better productivity if random Joe couldn't create a spreadsheet, and had to instead wait for the budget for a programmer to write code for them to do that (even if we had programmers writing bug-free code).
Which is exactly what agentic tools do--I focus on making decisions, not carrying out the gruntwork actions.
It's exactly what agentic tools make harder to do. LLM-generated code usually looks great at first glance - your opinion on how bad it is is a function of the effort spent reviewing, analyzing, and questioning it.
The shitty code it comes up with helps me a lot, because fixing broken stuff and unraveling the first principles of why it's broken is how I learn best. It helps me think and stay focused. When learning new areas, the goal is to grasp the subject matter well enough to find out what's wrong with the generated code - and it's still a pretty safe bet that it will be wrong, one way or another. Whenever I attempt to outsource the actual thinking (out of laziness, or even just to probe the abilities of the model), the results are pretty bad and nowhere near the quality level of anything I'd want to sign my name under.
Of course, some people don't mind, and end up wasting other people's time with their generated patches. It's not that hard to find them around. I have higher standards than replying "dunno, that's what AI wrote" when a reviewer asks why something is done this particular way. Agentic tools tear down even further the walls that might let you stop for a moment and notice the sloppiness of the output. They just let the model do more of the things it's not very good at, and encourage it to flood the code with yet another workaround for an issue that would disappear completely had you spent two minutes pondering the root cause (which you won't do, because you don't have a mental model of what the code does, because you let the agent do it for you).
I'm afraid this is a case of "you're doing it wrong."
I use Claude Code with a dozen different hand-tuned subagent specs and a comprehensive CLAUDE.md specifying how to use them. I review every line of code before committing (turning off the auto-commit was the very first instruction). It can now produce a full PR that needs no major changes - often just one or two follow-up tweak requests.
With subagents it can sometimes be running for an hour or more before it is done, but I don’t have to babysit it anymore.
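For concreteness, here's a minimal sketch of what such a CLAUDE.md might contain - the subagent names and rules below are hypothetical illustrations of the approach, not the commenter's actual setup:

```markdown
# CLAUDE.md (hypothetical sketch)

## Workflow rules
- Never auto-commit; leave all commits to the human reviewer.
- Before writing code, produce a short implementation plan and wait for approval.
- Follow existing project conventions; don't add dependencies without asking.

## Subagents (names are illustrative)
- planner: breaks the task into steps; outputs a plan, never code.
- implementer: writes code for one approved step at a time.
- reviewer: checks the diff against the plan and conventions before handoff.
```

The point is less the specific rules than that the agent's degrees of freedom are pinned down in writing, so every run starts from the same constraints.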
My experiences with various models make me very suspicious of the kind of code you end up with, but if it works for your particular needs, then good for you. I couldn't make it work this way for mine.