I think this article is pretty spot on — it articulates something I’ve come to appreciate about LLM-assisted coding over the past few months.
I started out very sceptical. When Claude Code landed, I got completely seduced — borderline addicted, slot machine-style — by what initially felt like a superpower. Then I actually read the code. It was shockingly bad. I swung back hard to my earlier scepticism, probably even more entrenched than before.
Then something shifted. I started experimenting. I stopped giving it orders and began using it more like a virtual rubber duck. That made a huge difference.
It’s still absolute rubbish if you just let it run wild, which is why I think “vibe coding” is basically just “vibe debt” — because it just doesn’t do what most (possibly uninformed) people think it does.
But if you treat it as a collaborator — more like an idiot savant with a massive brain but no instinct or nous — or better yet, as a mech suit [0] that needs firm control — then something interesting happens.
I’m now at a point where working with Claude Code is not just productive, it actually produces pretty good code, with the right guidance. I’ve got tests, lots of them. I’ve also developed a way of getting Claude to document intent as we go, which helps me, any future human reader, and, crucially, the model itself when revisiting old code.
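To make "document intent" concrete, here's roughly the shape of note I get Claude to leave as we go. This is an invented example rather than code from my project; the point is the comment, not the function:

    type WebhookEvent = { id: string; payload: unknown };

    // Intent: deliver each event until the receiver acknowledges it, then stop.
    // The backoff only smooths transient failures; it is not the contract.
    // Context: written with Claude. If you (human or model) change the retry
    // shape, update webhookQueue.test.ts first so the intent stays encoded.
    export async function deliverWithRetry(
      event: WebhookEvent,
      send: (e: WebhookEvent) => Promise<boolean>,
      maxAttempts = 5,
    ): Promise<boolean> {
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        if (await send(event)) return true; // acknowledged, we're done
        await new Promise((r) => setTimeout(r, 2 ** attempt * 100)); // backoff
      }
      return false; // exhausted retries; the caller decides what happens next
    }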
What fascinates me is how negative these comments are — how many people seem closed off to the possibility that this could be a net positive for software engineers rather than some kind of doomsday.
Did Photoshop kill graphic artists? Did film kill theatre? Not really. Things changed, sure. Was it “better”? There’s no counterfactual, so who knows? But change was inevitable.
What’s clear is this tech is here now, and complaining about it feels a bit like mourning the loss of punch cards when terminals showed up.
[0]: https://matthewsinclair.com/blog/0178-why-llm-powered-progra...
One of the things I think is going on here is a sort of stone soup effect. [1]
Core to Ptacek's point is that everything has changed in the last 6 months. As you agree, and I presume he does too, the use of off-the-shelf LLMs in code was kinda garbage. And I expect the skepticism he's knocking here ("stochastic parrots") was in fact accurate then.
But it did get a lot of people (and money) to rush in and start trying to make something useful. Like the stone soup story, a lot of other technology has been added to the pot, and now we're moving in the direction of something solid, a proper meal. But given the excitement and investment, it'll be at least a few years before things stabilize. Only at that point can we be sure about how much the stone really added to the soup.
Another counterfactual that we'll never know is what kinds of tooling we would have gotten if people had dumped a few billion dollars into code tool improvement without LLMs, but with, say, a lot of more conventional ML tooling. Would the tools we get be much better? Much worse? About the same but different in strengths and weaknesses? Impossible to say.
So I'm still skeptical of the hype. After all, the hype is basically the same as 6 months ago, even though now the boosters can admit the products of 6 months ago sucked. But I can believe we're in the middle of a revolution of developer tooling. Even so, I'm content to wait. We don't know the long term effects on a code base. We don't know what these tools will look like in 6 months. I'm happy to check in again then, where I fully expect to be again told: "If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing." At least until then, I'm renewing my membership in the Boring Technology Club: https://boringtechnology.club/
> Core to Ptacek's point is that everything has changed in the last 6 months.
This was actually the only point in the essay with which I disagree, and it weakens the overall argument. Even 2 years ago, before agents or reasoning models, these LLMs were extremely powerful. The catch was, you needed to figure out what worked for you.
I wrote this comment elsewhere: https://news.ycombinator.com/item?id=44164846 -- Upshot: It took me months to figure out what worked for me, but AI enabled me to produce innovative (probably cutting edge) work in domains I had little prior background in. Yes, the hype should trigger your suspicions, but if respectable people with no stake in selling AI like @tptacek or @kentonv in the other AI thread are saying similar things, you should probably take a closer look.
>if respectable people with no stake in selling AI like @tptacek or @kentonv in the other AI thread are saying similar things, you should probably take a closer look.
Maybe? Social proof doesn't mean much to me during a hype cycle. You could say the same thing about tulip bulbs or any other famous bubble. Lots of smart people with no stake get sucked in. People are extremely good at fooling themselves. There are a lot of extremely smart people following all of the world's major religions, for example, and they can't all be right. And whatever else is going on here, there are a lot of very talented people whose fortunes and futures depend on convincing everybody that something extraordinary is happening here.
I'm glad you have found something that works for you. But I talk with a lot of people who are totally convinced they've found something that makes a huge difference, from essential oils to functional programming. Maybe it does for them. But personally, what works for me is waiting out the hype cycle until we get to the plateau of productivity. Those months that you spent figuring out what worked are months I'd rather spend on using what I've already found to work.
The problem with this argument is that if I'm right, the hype cycle will continue for a long time before it settles (because this is a particularly big problem to have made a dent in), and for that entire span of time skepticism will have been the wrong position.
I think it depends a lot on what you think "wrong position" means. I think skepticism only really goes wrong when it refuses to see the truth in what it's questioning long past the point where it's reasonable. I don't think we're there yet. For example, questions like "What is the long term effect on a code base?" can only be answered by seeing the long term. And there are legitimate questions about the ROI of learning and re-learning rapidly changing tools. What's worth it to you may not be worth it in other situations.
I also think hype cycles and actual progress can have a variety of relationships. After Bubble 1.0 burst, there were years of exciting progress without a lot of hype. Maybe we'll get something similar here, as reasonable observers are already seeing the hype cycle falter. E.g.: https://www.economist.com/business/2025/05/21/welcome-to-the...
And of course, it all hinges on you being right. Which I get you are convinced of, but if you want to be thorough, you have to look at the other side of it.
Well, two things. First, I spent a long time being wrong about this; I definitely looked at the other side. Second, the thing I'm convinced of is kind of objective? Like: these things build working code that clears quality thresholds.
But none of that really matters; I'm not so much engaging on the question of whether you are sold on LLM coding (come over next weekend though for the grilling thing we're doing and make your case then!). The only thing I'm engaging on here is the distinction between the hype cycle, which is bad and will get worse over the coming years, and the utility of the tools.
Thanks! If I can make it I will. (The pinball museum project is sucking up a lot of my time as we get toward launch. You should come by!)
I think that is one interesting question that I'll want to answer before adoption on my projects, but it definitely isn't the only one.
And maybe the hype cycle will get worse and maybe it won't. Like The Economist, I'm starting to see a turn. The amount of money going into LLMs generally is unsustainable, and I think OpenAI's recent raise is a good example: round 11, with a $40 billion goal, which they're taking in tranches. Already the largest funding round in history, and it's not the last one they'll need before they're in the black. I could easily see a trough of disillusionment coming in the next 18 months. I agree programming tools could well have a lot of innovation over the next few years, but if that happens against a backdrop of "AI" disillusionment, it'll be a lot easier to see what they're actually delivering.
So? The better these tools get, the easier they will be to get value out of. It seems not unwise to let them stabilize before investing the effort and getting the value out, especially if you’re working in one of the areas/languages where they’re still not as useful.
Learning how to use a tool once is easy, relearning how to use a tool every six months because of the rapid pace of change is a pain.
This isn't responsive to what I wrote. Letting the tools stabilize is one thing, makes perfect sense. "Waiting until the hype cycle dies" is another.
I suspect the hype cycle and the stabilization curves are relatively in-sync. While the tools are constantly changing, there's always a fresh source of hype, and a fresh variant of "oh you're just not using the right/newest/best model/agent/etc." from those on the hype train.
This is the thing. I do not agree with that, at all. We can just disagree, and that's fine, but let's be clear about what we're disagreeing about, because the whole goddam point of this piece is that nobody in this "debate" is saying the same thing. I think the hype is going to scale out practically indefinitely, because this stuff actually works spookily well. The hype will remain irrational longer than you can remain solvent.
Well, generally, that’s just not how hype works.
A thing being great doesn’t mean it’s going to generate outsized levels of hype forever. Nobody gets hyped about “The Internet” anymore, because novel use cases aren’t being discovered at a rapid clip, and it has well and thoroughly integrated into the general milieu of society. Same with GPS, vaccines, docker containers, Rust, etc., but I mentioned the Internet first since it’s probably on a similar level of societal shift as is AI in the maximalist version of AI hype.
Once a thing becomes widespread and standardized, it becomes just another part of the world we live in, regardless of how incredible it is. It’s only exciting to be a hype man when you’ve got the weight of broad non-adoption to rail against.
Which brings me to the point I was originally trying to make, with a more well-defined set of terms: who cares if someone waits until the tooling is more widely adopted, easy to use, and somewhat standardized prior to jumping on the bandwagon? Not everyone needs to undergo the pain of being an early adopter, and if the tools become as good as everyone says they will, they will succeed on their merits, and not due to strident hype pieces.
I think some of the frustration the AI camp is dealing with right now is because y’all are the new Rust Evangelism Strike Force, just instead of “you’re a bad software engineer if you use a memory-unsafe language,” it’s “you’re a bad software engineer if you don’t use AI.”
The tools are at the point now that ignoring them is akin to ignoring Stack Overflow posts. Basically any time you'd google for the answer to something, you might as well ask an AI assistant. It has a good chance of giving you a good answer. And given how programming works, it's usually easy to verify the information. Just like, say, you would do with a Stack Overflow post.
Who you calling y'all? I'm a developer who was skeptical about AI until about 6 months ago, and then used it, and am now here to say "this shit works". That's all. I write Go, not Rust.
People have all these feelings about AI hype, and they just have nothing at all to do with what I'm saying. How well the tools work has not much at all to do with the hype level. Usually when someone says that, they mean "the tools don't really work". Not this time.
> You could say the same thing about tulip bulbs or any other famous bubble. Lots of smart people with no stake get sucked in.
While I agree with the skepticism, what specifically is the stake here? Most code assists have usable plans in the $10-$20 range. The investors are apparently taking a much bigger risk than the consumer would be in a case like this.
Aside from the horror stories about people spending $100 in one day of API tokens for at best meh results, of course.
The stake they and I were referring to is a financial interest in the success of AI. Related is the reputational impact, of course. A lot of people who may not make money do like being seen as smart and cutting edge.
But even if we look at your notion of stake, you're missing huge chunks of it. Code bases are extremely expensive assets, and programmers are extremely expensive resources. $10 a month is nothing compared to the costs of a major cleanup or rewrite.
Dude. Claude Code has zero learning curve. You just open the terminal app in your code directory and you tell it what you want, in English. In the time you have spent writing these comments about how you don't care to try it now because it's probably just hype, you could have actually tried it and found out if it's just hype.
I've tried Claude Code repeatedly and haven't figured out how to make it work for me on my work code base. It regularly gets lost, spins out of control, and spends a bunch of tokens without solving anything. I totally sympathize with people who find Claude Code to have a learning curve, and I'm writing this while waiting for Cursor to finish a task I gave it, so it's not like I'm unfamiliar with the tooling in general.
One big problem with Claude Code vs Cursor is that you have to pay for the cost of getting over the learning curve. With Cursor I could eat the subscription fee and then goof off for a long time trying to figure out how to prompt it well. With Claude Code a bad prompt can easily cost me $5 a pop, which (irrationally, but measurably) hurts more than the one-time monthly fee for Cursor.
Claude Code actually has a flat-rate subscription option now, if you prefer that. Personally I've found the API cost to be pretty negligible, but maybe I'm out of touch. (I mean, it's one AI-generated commit, Michael. What could it cost, $5?)
Anyway, if you've tried it and it doesn't work for you, fair enough. I'm not going to tell you you're wrong. I'm just bothered by all the people who are out here posting about AI being bad while refusing to actually try it. (To be fair, I was one of them, six months ago...)
I could not have, because my standards involve more than a five minute impression from a tool designed to wow people in the first five minutes. Dude.
I think you're rationalizing your resistance to change. I've been there!
I have no reason to care whether you use AI or not. I'm giving you this advice just for your sake: Consider whether you are taking a big career risk by avoiding learning about the latest tools of your profession.
> "Even 2 years ago, before agents or reasoning models, these LLMs were extremely powerful. The catch was, you needed to figure out what worked for you."
Sure, but I would argue that the UX is the product, and that has radically improved in the past 6-12 months.
Yes, you could have produced similar results before, manually prompting the model each time, copy and pasting code, re-prompting the model as needed. I would strenuously argue that the structuring and automation of these tasks is what has made these models broadly usable and powerful.
In the same way that Apple didn't invent mobile phones nor touchscreens nor OSes, but the specific combination of these things resulted in a product that was different in kind than what came before, and took over the world.
Likewise, the "putting the LLM into a structured box of validation and automated re-prompting" is huge! It changed the product radically, even if its constituent pieces existed already.
[edit] More generally I would argue that 95% of the useful applications of LLMs aren't about advancing the SOTA model capabilities and more about what kind of structured interaction environment we shove them into.
For sure! I mainly meant to say that people should not dismiss the "6 more months until it's really good" point as just another symptom of unfounded hype. It may have taken effort to effectively use AI earlier, which somewhat justified the caution, but now it's significantly easier and caution is counter-productive.
But I think my other point still stands: people will need to figure out for themselves how to fully exploit this technology. What worked for me, for instance, was structuring my code to be essentially functional in nature. This allows for tightly focused contexts which drastically reduces error rates. This is probably orthogonal to the better UX of current AI tooling. Unfortunately, the vast majority of existing code is not functional, and people will have to figure out how to make AI work with that.
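A tiny invented example of what I mean by "functional in nature", and why it keeps the context tight: the pure version carries everything it needs in its signature, so the function plus its types is the entire prompt context.

    // Stateful shape: to reason about total(), the model also needs the class,
    // whatever mutates `rates`, and every call site that touches it.
    class PriceCalculator {
      private rates = new Map<string, number>();
      setRate(sku: string, rate: number) { this.rates.set(sku, rate); }
      total(items: { sku: string; qty: number }[]): number {
        return items.reduce((sum, i) => sum + (this.rates.get(i.sku) ?? 0) * i.qty, 0);
      }
    }

    // Functional shape: the signature is the whole context. I can paste this one
    // function (plus the Item type) into a prompt and nothing else.
    type Item = { sku: string; qty: number };
    function total(items: Item[], rates: ReadonlyMap<string, number>): number {
      return items.reduce((sum, i) => sum + (rates.get(i.sku) ?? 0) * i.qty, 0);
    }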
A lot of that likely plays into your point about the work required to make useful LLM-based applications. To expand a bit more:
* AI is technology that behaves like people. This makes it confusing to reason about and work with. Products will need to solve for this cognitive dissonance to be successful, which will entail a combination of UX and guardrails.
* Context still seems to be king. My (possibly outdated) experience has been that the "right" context trumps larger context windows. With code, for instance, this probably entails standard techniques like static analysis to find relevant bits of code, which some tools have been attempting (a rough sketch of what I mean follows this list). For data, this might require eliminating overfetching.
* Data engineering will be critical. Not only does the data need to be very clean for good results; giving models unfettered access to it requires the right access controls which, despite regulations like GDPR, are largely non-existent.
* Security in general will need to be upleveled everywhere. Not only can models be tricked, they can trick you into getting compromised, and so there need to be even more guardrails.
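For the context point above, a crude sketch of the kind of thing I mean (invented helper, a regex standing in for real static analysis, but it shows the shape: hand the model a few hundred relevant lines rather than the whole repo or a giant window):

    import { readFileSync } from "node:fs";
    import { dirname, resolve } from "node:path";

    // Gather an entry file plus the local modules it imports, one hop deep.
    // Real tools would use the TypeScript compiler API; a regex is enough
    // to illustrate "the right context beats a bigger context window".
    function gatherContext(entry: string): string {
      const src = readFileSync(entry, "utf8");
      const localImports = [...src.matchAll(/from\s+["'](\.{1,2}\/[^"']+)["']/g)]
        .map((m) => resolve(dirname(entry), m[1] + ".ts"));
      const deps = localImports.map((p) => {
        try { return `// --- ${p} ---\n` + readFileSync(p, "utf8"); }
        catch { return ""; } // unresolved path: skip it, this is only a sketch
      });
      return [`// --- ${entry} ---\n` + src, ...deps].join("\n\n");
    }

    // Usage: paste gatherContext("src/billing/invoice.ts") above the prompt.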
A lot of these are regular engineering work that is being done even today. Only it often isn't prioritized because there are always higher priorities... like increasing shareholder value ;-) But if folks want to leverage the capabilities of AI in their businesses, they'll have to solve all these problems for themselves. This is a ton of work. Good thing we have AI to help out!
I don't think it's possible to understand what people mean by force multiplier re AI until you use it to teach yourself a new domain and then build something with that knowledge.
Building a mental model of a new domain by creating a logical model that interfaces with a domain I'm familiar with lets me test my assumptions and understanding in real time. I can apply previous experience by analogy and verify usefulness/accuracy instantly.
> Upshot: It took me months to figure out what worked for me, but AI enabled me to produce innovative (probably cutting edge) work in domains I had little prior background in. Yes, the hype should trigger your suspicions[...]
Part of the hype problem is that describing my experience sounds like bullshit to anyone who hasn't gone through the same process. The rate that I pick up concepts well enough to do verifiable work with them is literally unbelievable.
AI posts (including this one) are all over his employer’s blog lately, so there’s some stake (fly MCP, https://fly.io/blog/fuckin-robots/, etc).
Almost by definition, one should be skeptical about hype. So we’re all trying to sort out what is being sold to us.
Different people have different weird tendencies in different directions. Some people irrationally assume that things aren’t going to change much. Others see a trend and irrationally assume that it will continue on a trend line.
Synthesis is hard.
Understanding causality is even harder.
Savvy people know that we’re just operating with a bag of models and trying to choose the right combination for the right situation.
This misunderstanding is one reason why doomers, accelerationists, and “normies” talk past each other or (worse) look down on each other. (I’m not trying to claim epistemic equivalence here; some perspectives are based on better information, some are better calibrated than others! I’m just not laying out my personal claims at this point. Instead, I’m focusing on how we talk to each other.)
Another big source of misunderstanding is about differing loci of control. People in positions of influence are naturally inclined to think about what they can do, who they know, and where they want to be. People farther removed feel relatively powerless and tend to hold onto their notions of stability, such as the status quo or their deepest values.
Historically, programmers have been quite willing to learn new technologies, but now we’re seeing widespread examples where people’s plasticity has limits. Many developers cannot (or are unwilling to) wrap their minds around the changing world. So instead of confronting the reality they find ways to deny it, consciously or subconsciously. Our perception itself is shaped by our beliefs, and some people won’t even perceive the threat because it is too strange or disconcerting. Such is human nature: we all do it. Sometimes we’re lucky enough to admit it.
I think "the reality", at least as something involving a new paradigm, has yet to be established. I'll note that I heard plenty of similar talk about how developers just couldn't adapt six months or more ago. Promoters now can admit those tools were in fact pretty bad, because they now have something else to promote, but at the time those not rawdogging LLMs were dinosaurs under a big meteor.
I do of course agree that some people are just refusing to "wrap their minds around the changing world". But anybody with enough experience in tech can count a lot more instances of "the world is about to change" than "the world really changed". The most recent obvious example being cryptocurrencies, but there are plenty of others. [1] So I think there's plenty of room here for legitimate skepticism. And for just waiting until things settle down to see where we ended up.
Fair points.
Generally speaking, I find it suspect when someone points to failed predictions of disruptive changes without acknowledging successful predictions. That is selection bias. Many predicted disruptive changes do occur.
Most importantly, if one wants to be intellectually honest, one has to engage against a set of plausible arguments and scenarios. Debunking one particular company’s hyperbolic vision for the future might be easy, but it probably doesn’t generalize.
It is telling to see how many predictions can seem obvious in retrospect from the right frame of reference. In a sense (or more than that under certain views of physics), the future already exists, the patterns already exist. We just have to find the patterns — find the lens or model that will help the messy world make sense to us.
I do my best to put the hype to the side. I try to pay attention to the fundamentals such as scaling laws, performance over time, etc while noting how people keep moving the goalposts.
Also wrt the cognitive bias aspect: Cryptocurrencies didn’t threaten to apply significant (if any) downward pressure on the software development labor market.
Also, even cryptocurrency proponents knew deep down that it was a chicken-and-egg problem: boosters might have said adoption was happening and maybe even inevitable, but the assumption was right out there in the open. It also had the warning signs of obvious financial fraud, money laundering, currency speculation, and Ponzi scheming.
Adoption of artificial intelligence is different in many notable ways. Most saliently, it is not a chicken and egg problem: it does not require collective action. Anyone who does it well has a competitive advantage. It is a race.
(Like Max Tegmark and others, I view racing towards superintelligence as a suicide race, not an arms race. This is a predictive claim that can be debated by assessing scenarios, understanding human nature, and assigning probabilities.)
> Generally speaking, I find it suspect when someone points to failed predictions of disruptive changes without acknowledging successful predictions.
I specifically said: "But anybody with enough experience in tech can count a lot more instances of 'the world is about to change' than 'the world really changed'." I pretty clearly understand that sometimes the world does change.
Funnily, I find it suspect when people accuse me of failing to do things I did in the very post they're responding to. So I think this is a fine time for us both to find better ways to spend our time.
Sorry, I can see why you might take that the wrong way. In my defense, I consciously wrote "generally speaking" in the hopes you wouldn't think I was referring to you in particular. I wasn't trying to accuse you of anything.
I strive to not criticize people indirectly: my style is usually closer to say New York than San Francisco. If I disagree with something in particular, I try to make that clear without beating around the bush.
> Promoters now can admit those tools were in fact pretty bad
Relative to what came after, which no one could have predicted was guaranteed?
The Model T was in fact pretty bad relative to what came after...
> because they now have something else to promote
something else which is better?
i don't understand the inherent cynicism here.
I’m an amateur coder and I used to rely on Cursor a lot to code when I was actively working on hobby apps about 6 months ago
I picked coding again a couple of days back and I’m blown away by how much things have changed
It was all manual work until a few months back. Suddenly, it’s all agents
> You'll not only never know this, it's IMHO not very useful to think about at all, except as an intellectual exercise.
I think it's very useful if one wants to properly weigh the value of LLMs in a way that gets beyond the hype. Which I do.
My 80-year old dad tells me that when he bought his first car, he could pop open the hood and fiddle with things and maybe get it to work after a breakdown
Now he can't - it's too closed and complicated
Yet, modern cars are way better and almost never break down
Don't see how LLMs are any different than any other tech advancement that obfuscates and abstracts the "fundamentals".
I definitely believe that you don't see it. We just disagree on what that implies.
Oops, this was a reply to somebody else put in the wrong place. Sorry for the confusion.
"nother counterfactual that we'll never know is what kinds of tooling we would have gotten if people had dumped a few billion dollars into code tool improvement without LLMs, but with, say, a lot of more conventional ML tooling. Would the tools we get be much better? Much worse? About the same but different in strengths and weaknesses? Impossible to say."
You'll not only never know this, it's IMHO not very useful to think about at all, except as an intellectual exercise.
I wish i could impress this upon more people.
A friend similarly used to lament/complain that Kotlin sucked in part because we could have probably accomplished its major features in Java, and maybe without tons of work, or migration cost.
This is maybe even true!
as an intellectual exercise, both are interesting to think about. But outside of that, people get caught up in this as if it matters, but it doesn't.
Basically nothing is driven by pure technical merit alone, not just in CS, but in any field. So my point to him was the lesson to take away from this is not "we could have been more effective or done it cheaper or whatever" but "my definition of effectiveness doesn't match how reality decides effectiveness, so i should adjust my definition".
As much as people want the definition to be a meritocracy, it just isn't and honestly, seems unlikely to ever be.
So while it's 100% true that billions of dollars dumped into other tools or approaches or whatever may have generated good, better, maybe even amazing results, they weren't, and more importantly, never would have been. Unknown but maybe infinite ROI is often much more likely to see investment than more known but maybe only 2x ROI.
and like i said, this is not just true in CS, but in lots of fields.
That is arguably quite bad, but also seems unlikely to change.
> You'll not only never know this, it's IMHO not very useful to think about at all, except as an intellectual exercise.
I think it's very useful if one wants to properly weigh the value of LLMs in a way that gets beyond the hype. Which I do.
Sure, and that works in the abstract (ie "what investment would theoretically have made the most sense") but if you are trying to compare in the real world you have to be careful because it assumes the alternative would have ever happened. I doubt it would have.
The better I am at solving a problem, the less I use AI assistants. I use them if I try a new language or framework.
Busy code I need to generate is difficult to do with AI too, because then you need to formalize the necessary context for an AI assistant, which is exhausting and still gives an unsure result. So perhaps it is just simpler to write it yourself quickly.
I understand comments being negative, because there is so much AI hype without too many practical applications yet. Or at least good practical applications. Some of that hype is justified, some of it is not. I enjoyed the image/video/audio synthesis hype more tbh.
Test cases are quite helpful and comments are decent too. But often prompting is more complex than programming something. And you can never be sure if any answer is usable.
> But often prompting is more complex than programming something.
I'd challenge this one; is it more complex, or is all the thinking and decision making concentrated into a single sentence or paragraph? For me, programming something is taking a big, high-level problem and breaking it down into smaller and smaller sections until it's a line of code; the lines of code are relatively low effort / cost little brain power. But in my experience, the problem itself and its nuances are only defined once all code is written. If you have to prompt an AI to write it, you need to define the problem beforehand.
It's more design and more thinking upfront, which is something the development community has moved away from in the past ~20 years with the rise of agile development and open source. Techniques like TDD have shifted more of the problem definition forwards as you have to think about your desired outcomes before writing code, but I'm pretty sure (I have no figures) it's only a minority of developers that have the self-discipline to practice test-driven development consistently.
(disclaimer: I don't use AI much, and my employer isn't yet looking into or paying for agentic coding, so it's chat style or inline code suggestions)
The issue with prompting is English (or any other human language) is nowhere near as rigid or strict a language as a programming language. Almost always an idea can be expressed much more succinctly in code than language.
Combine that with the fact that, when you’re reading the code, it’s often much easier to develop a prototype solution as you go, and you end up with prompting feeling like using 4 men to carry a wheelbarrow instead of having 1 push it.
I think we are going to end up with a common design/code specification language that we use for prompting and testing. There's always going to be a need to convey the exact semantics of what we want. If not for AI, then for the humans who have to grapple with what is made.
Sounds like "Heavy process". "Specifying exact semantics" has been tried and ended up unimaginably badly.
Nah, imagine a programming language optimized for creating specifications.
Feed it to an llm and it implements it. Ideally it can also verify its solution with your specification code. If LLMs don't gain significantly more general capabilities I could see this happening in the longer term. But it's too early to say.
In a sense the llm turns into a compiler.
It's an interesting idea. I get it. Although I wonder... do you really need formal languages anymore now that we have LLMs that can take natural language specifications as input?
I tried running the idea on a programming task I did yesterday. "Create a dialog to edit the contents of THIS data structure." It did actually produce a dialog that worked the first time. Admittedly a very ugly dialog. But all the fields and labels and controls were there in the right order with the right labels, and were all properly bound to props of a React control, that was grudgingly fit for purpose. I suspect I could have corrected some of the layout issues with supplementary prompts. But it worked. I will do it again, with supplementary prompts next time.
Anyway. I next thought about how I would specify the behavior I wanted. The informal specification would be "Open the Looping dialog. Set Start to 1:00, then open the Timebase dialog. Select 'Beats', set the tempo to 120, and press the back button. Verify that the Start text edit now contains '30:1' (the same time expressed in bars and beats). Set it to 10:1, press the back button, and verify that the corresponding 'Loop' <description of storage for that data omitted for clarity> for the currently selected plugin contains 20.0." I can actually see that working (and I plan to see if I can convince an AI to turn that into test code for me).
Any imaginable formal specification for that would be just grim. In fact, I can't imagine a "formal" specification for that. But a natural language specification seems eminently doable. And even if there were such a formal specification, I am 100% positive that I would be using natural language AI prompts to generate the specifications. Which makes me wonder why anyone needs a formal language for that.
And I can't help thinking that "Write test code for the specifications given in the previous prompt" is something I need to try. How to give my AI tooling to get access to UI controls though....
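For what it's worth, the test I'd hope the AI hands back for that spec would look something like this (React Testing Library; "App" and all the label/button names are guesses at your app, so treat it as the shape of the translation rather than working code):

    import { render, screen } from "@testing-library/react";
    import userEvent from "@testing-library/user-event";
    import { App } from "./App"; // hypothetical root component

    test("switching timebase to Beats rewrites Start as bars:beats", async () => {
      const user = userEvent.setup();
      render(<App />);

      await user.click(screen.getByRole("button", { name: "Looping" }));
      await user.clear(screen.getByLabelText("Start"));
      await user.type(screen.getByLabelText("Start"), "1:00");

      await user.click(screen.getByRole("button", { name: "Timebase" }));
      await user.click(screen.getByRole("radio", { name: "Beats" }));
      await user.clear(screen.getByLabelText("Tempo"));
      await user.type(screen.getByLabelText("Tempo"), "120");
      await user.click(screen.getByRole("button", { name: "Back" }));

      // Same moment, re-expressed in bars and beats per the informal spec.
      expect(screen.getByLabelText("Start")).toHaveValue("30:1");
    });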
We've had that for a long, long time. Notably RAD-tooling running on XML.
The main lesson has been that it's actually not much of an enabler and the people doing it end up being specialised and rather expensive consultants.
RAD before transformers was like trying to build an iPhone before capacitive multitouch: a total waste of time.
Things are different now.
I'm not so sure. What can you show me that you think would be convincing?
I think there are enough examples of genuine AI-facilitated rapid application development out there already, honestly. I wouldn't have anything to add to the pile, since I'm not a RAD kind of guy.
Disillusionment seems to spring from expecting the model to be a god or a genie instead of a code generator. Some people are always going to be better at using tools than other people are. I don't see that changing, even though the tools themselves are changing radically.
"Nothing" would have been shorter and more convenient for us both.
That's a straw man. Asking for real examples to back up your claims isn't overt perfectionism.
If you weren't paying attention to what's been happening for the last couple of years, you certainly won't believe anything I have to say.
Trust me on this, at least: I don't need the typing practice.
A big challenge is that programmers all have unique ever changing personal style and vision that they've never had to communicate before. As well they generally "bikeshed" and add undefined unrequested requirements, because you know someday we might need to support 10000x more users than we have. This is all well and good when the programmer implements something themselves but falls apart when it must be communicated to an LLM. Most projects/systems/orgs don't have the necessary level of detail in their documentation, documentation is fragmented across git/jira/confluence/etc/etc/etc., and it's a hodge podge of technologies without a semblance of consistency.
I think we'll find that over the next few years the first really big win will be AI tearing down the mountain of tech & documentation debt. Bringing efficiency to corporate knowledge is likely a key element to AI working within them.
Efficiency to corporate knowledge? Absolutely not, no way. My coworkers are beginning to use AI to write PR descriptions and git commits.
I notice, because the amount of text has been increased tenfold while the amount of information has stayed exactly the same.
This is a torrent of shit coming down on us that we are all going to have to deal with. The vibe coders will be gleefully putting up PRs with 12 paragraphs of "descriptive" text. Thanks, no thanks!
Well I'm certainly not saying that AI should generate more corporate spam. That's part of the problem! And also a strawman argument!
Use an LLM to summarize the PR /j
I design and think upfront but I don't write it down until I start coding. I can do this for pretty large chunks of code at once.
The fastest way I can transcribe a design is with code or pseudocode. Converting it into English can be hard.
It reminds me a bit of the discussion of whether you have an inner monologue. I don't, and turning thoughts into English takes work, especially if you need to be specific with what you want.
I also don't have an inner monologue and can relate somewhat. However I find that natural language (usually) allows me to be more expressive than pseudocode in the same period of time.
There's also an intangible benefit of having someone to "bounce off". If I'm using an LLM, I am tweaking the system prompt to slow it down, make it ask questions and bug me before making changes. Even without that, writing out the idea quickly exposes potential logic or approach flaws - much faster than writing pseudocode, in my experience.
> It's more design and more thinking upfront, which is something the development community has moved away from in the past ~20 years with the rise of agile development and open source.
I agree, but even smaller than thinking in agile is just a tight iteration loop when i'm exploring a design. My ADHD makes upfront design a challenge for me and I am personally much more effective starting with a sketch of what needs to be done and then iterating on it until I get a good result.
The loop of prompt->study->prompt->study... is disruptive to my inner loop for several reasons, but a big one is that the machine doesn't "think" like i do. So the solutions it scaffolds commonly make me say "huh?" and i have to change my thought process to interpret them and then study them for mistakes. My intuition and iteration is, for the time being, more effective than this machine-assisted loop for the really "interesting" code i have to write.
But i will say that AI has been a big time saver for more mundane tasks, especially when I can say "use this example and apply it to the rest of this code/abstraction".
I agree with your points but I'm also reminded of one of my bigger learnings as a manager - the stuff I'm best at is the hardest, but most important, to delegate.
Sure it was easier to do it myself. But putting in the time to train, give context, develop guardrails, learn how to monitor etc ultimately taught me the skills needed to delegate effectively and multiply the team's output massively as we added people.
It's early days but I'm getting the same feeling with LLMs. It's as exhausting as training an overconfident but talented intern, but if you can work through it and somehow get it to produce something as good as you would do yourself, it's a massive multiplier.
I don't totally understand the parallel you're drawing here. As a manager, I assume you're training more junior (in terms of their career or the company) engineers up so they can perform more autonomously in the future.
But you're not training LLMs as you use them really - do you mean that it's best to develop your own skill using LLMs in an area you already understand well?
I'm finding it a bit hard to square your comment about it being exhausting to catherd the LLM with it being a force multiplier.
No, I'm talking about my own skills: how I onboard, structure 1on1s, run meetings, create and reuse certain processes, manage documentation (a form of org memory), check in on status, devise metrics and other indicators of system health. All of these compound and provide leverage even if the person leaves and a new one enters. The 30th person I onboarded and managed was orders of magnitude easier (for both of us) than the first.
With LLMs the better I get at the scaffolding and prompting, the less it feels like catherding (so far at least). Hence the comparison.
Great point.
Humans really like to anthropomorphize things. Loud rumbles in the clouds? There must be a dude on top of a mountain somewhere who's in charge of it. Impressed by that tree? It must have a spirit that's like our spirits.
I think a lot of the reason LLMs are enjoying such a huge hype wave is that they invite that sort of anthropomorphization. It can be really hard to think about them in terms of what they actually are, because both our head-meat and our culture has so much support for casting things as other people.
Do LLMs learn? I had an impression you borrow a pretrained LLM that handles each query starting with the same initial state.
No, LLMs don't learn - each new conversation effectively clears the slate and resets them to their original state.
If you know what you're doing you can still "teach" them though, but it's on you to do that - you need to keep on iterating on things like the system prompt you are using and the context you feed in to the model.
This sounds like trying to glue on supervised learning post-hoc.
Makes me wonder if there had been equal investment into specialized tools which used more fine-tuned statistical methods (like supervised learning), whether we would have something much better than LLMs.
I keep thinking about spell checkers and auto-translators, which have been using machine learning for a while, with pretty impressive results (unless I’m mistaken I think most of those use supervised learning models). I have no doubt we will start seeing companies replacing these proven models with an LLM and a noticeable reduction in quality.
That's mostly, but not completely true. There are various strategies to get LLMs to remember previous conversations. ChatGPT, for example, remembers (for some loose definition of "remembers") all previous conversations you've had with it.
I think if you use a very loose definition of learning (a stimulus that alters subsequent behavior), you can claim this is learning. But if you tell a human to replace the word “is” with “are” in the next two sentences, this could hardly be considered learning; rather, it is just following commands, even though it meets the previous loose definition. This is why in psychology we usually include some timescale for how long the altered behavior must last for it to be considered learning. Short-term altered behavior is usually called priming. But even then I wouldn’t consider “following commands” to be either priming or learning, I would simply call it obeying.
If an LLM learned something when you gave it commands, it would probably be reflected in some adjusted weights in some of its operational matrix. This is true of human learning: we strengthen some neural connection, and when we receive a similar stimulus in a similar situation sometime in the future, the new stimulus will follow a slightly different path along its neural pathway and result in an altered behavior (or at least have a greater probability of an altered behavior). For an LLM to “learn” I would like to see something similar.
I think you have an overly strict definition of what "learning" means. ChatGPT now has memory that lasts beyond the lifetime of its context buffer, and now has at least medium-term memory. (Actually I'm not entirely sure that they are not just using long persistent context buffers, but anyway).
Admittedly, you have to wrap LLMs with stuff to get them to do that. If you want to rewrite the rules to exclude that then I will have to revise my statement that it is "mostly, but not completely true".
:-P
You also have to alter some neural pathways in your brain to follow commands. That doesn’t make it learning. Learned behavior is usually (but not always) reflected in long-term changes to neural pathways outside of the language centers of the brain, and outside of short-term memory. Once you forget the command and still apply the behavior, that is learning.
I think SRS schedulers are a good example of a machine learning algorithm that learns from its previous interactions. If you run the optimizer you will end up with a different weight matrix, and flashcards will be scheduled differently. It has learned how well you retain these cards. But an LLM that is simply following orders has not learned anything, unless you feed the previous interaction back into the system to alter future outcomes, regardless of whether it “remembers” the original interactions. With the SRS, your review history can be completely forgotten about. You could delete it, but the weight matrix keeps the optimized weights. If you delete your chat history with ChatGPT, it will not behave any differently based on the previous interaction.
I'd count ChatGPT memory as a feature of ChatGPT, not of the underlying LLM.
I wrote a bit about that here - I've turned it off: https://simonwillison.net/2025/May/21/chatgpt-new-memory/
Yes, with few-shot prompting: you need to provide at least 2 examples of similar instructions and their corresponding solutions. But when you have to build the few-shot examples every time you prompt, it feels like you're doing the work already.
Edit: grammar
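A rough sketch of what that looks like in practice, using the OpenAI Node SDK (the task and the two worked examples are invented; the pattern is the same with any chat API):

    import OpenAI from "openai";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // Two worked examples ("shots") posed as earlier user/assistant turns,
    // then the real request phrased the same way. Writing these examples is
    // exactly the up-front work described above.
    const completion = await client.chat.completions.create({
      model: "gpt-4o-mini", // illustrative; any chat model works
      messages: [
        { role: "system", content: "Rewrite the SQL to use parameter placeholders." },
        { role: "user", content: "SELECT * FROM users WHERE id = 42" },
        { role: "assistant", content: "SELECT * FROM users WHERE id = $1" },
        { role: "user", content: "DELETE FROM carts WHERE owner = 'bob'" },
        { role: "assistant", content: "DELETE FROM carts WHERE owner = $1" },
        { role: "user", content: "UPDATE orders SET status = 'paid' WHERE id = 7" },
      ],
    });

    console.log(completion.choices[0].message.content);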
But... But... the multiplier isn't NEW!
You just explained how your work was affected by a big multiplier. At the end of training an intern you get a trained intern -- potentially a huge multiplier. ChatGPT is like an intern you can never train and will never get much better.
These are the same people who would no longer create or participate deeply in OSS (+100x multiplier) bragging about the +2x multiplier they got in exchange.
The first person you pass your knowledge onto can pass it onto a second. ChatGPT will not only never build knowledge, it will never turn from the learner to the mentor passing hard-won knowledge on to another learner.
> But often prompting is more complex than programming something.
It may be more complex, but it is in my opinion better long term. We need to get good at communicating with AIs to get the results that we want. Forgive me for assuming that you probably didn't use these assistants long enough to get good at using them. I've been a web developer for 20 years already, and AI tools are multiplying my output even on problems I'm very good at. And they are getting better very quickly.
Yep, it looks like LLMs are used as fast typists, and coincidentally, in webdev typing speed is the most important bottleneck when you need to add cookie consent, spinners, dozens of ad providers, tracking pixels, Twitter metadata, Google metadata, manual rendering, button web components with Material Design and React, hover panels, Font Awesome, reCAPTCHA, and that's only 1% of modern web boilerplate. So it's easy to see how a fast typist can help you.
> The better I am at solving a problem, the less I use AI assistants.
Yes, but you're expensive.
And these models are getting better at solving a lot of business-relevant problems.
Soon all business-relevant problems will be bent to the shape of the LLM because it's cost-effective.
You're forgetting how much money is being burned in keeping these LLMs cheap. Remember when Uber was a fraction of the cost of a cab? Yeah, those days didn't last.
> Remember when Uber was a fraction of the cost of a cab? Yeah, those days didn't last.
They're still much cheaper where I am. But regardless, why not take the Uber while it's cheaper?
There's the argument of the taxi industry collapsing (it hasn't yet). Is your concern some sort of long term knowledge loss from programmers and a rug pull? There are many good LLM options out there, they're getting cheaper and the knowledge loss wouldn't be impactful (and rug pull-able) for at least a decade or so.
Even at 100x the cost (currently $20/month for most of these via subscriptions) it’s still cheaper than an intern, let alone a senior dev.
I have been in this industry since the mid 80s. I can't tell you how many people worry that I can't handle change because as a veteran, I must cling to what was. Meanwhile, of course, the reason I am still in the industry is because of my plasticity. Nothing is as it was for me, and I have changed just about everything about how I work multiple times. But what does stay the same all this time are people and businesses and how we/they behave.
Which brings me to your comment. The comparison to Uber drivers is apt, and to use a fashionable word these days, the threat to people and startups alike is "enshittification." These tools are not sold, they are rented. Should a few behemoths gain effective control of the market, we know from history that we won't see these tools become commodities and nearly free, we'll see the users of these tools (again, both people and businesses) squeezed until their margins are paper-thin.
Back when articles by Joel Spolsky regularly hit the top page of Hacker News, he wrote "Strategy Letter V:" https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
The relevant takeaway was that companies try to commoditize their complements, and for LLM vendors, every startup is a complement. A brick-and-mortar metaphor is that of a retailer in a mall. If you as a retailer are paying more in rent than you're making, you are "working for the landlord," just as if you are making less than 30% of profit on everything you sell or rent through Apple's App Store, you're working for Apple.
I once described that as "Sharecropping in Apple's Orchard," and if I'm hesitant about the direction we're going, it's not anything about clinging to punch cards and ferromagnetic RAM, it's more the worry that it's not just a question of programmers becoming enshittified by their tools, it's also the entire notion of a software business "Sharecropping the LLM vendor's fields."
We spend way too much time talking about programming itself and not enough about whither the software business if its leverage is bound to tools that can only be rented on terms set by vendors.
--------
I don't know for certain where things will go or how we'll get there. I actually like the idea that a solo founder could create a billion-dollar company with no employees in my lifetime. And I have always liked the idea of software being "Wheels for the Mind," and we could be on a path to that, rather than turning humans into "reverse centaurs" that labour for the software rather than the other way around.
Once upon a time, VCs would always ask a startup, "What is your Plan B should you start getting traction and then Microsoft decides to compete with you/commoditize you by giving the same thing away?" That era passed, and Paul Graham celebrated it: https://paulgraham.com/microsoft.html
Then when startups became cheap to launch—thank you increased tech leverage and cheap money and YCombinator industrializing early-stage venture capital—the question became, "What is your moat against three smart kids launching a competitor?"
Now I wonder if the key question will bifurcate:
1. What is your moat against somebody launching competition even more cheaply than smart kids with YCombinator's backing, and;
2. How are you insulated against the cost of load-bearing tooling for everything in your business becoming arbitrarily more expensive?
Actually, I agree. It won't be long before businesses handle software engineering like Google does "support." You know, that robotic system that sends out passive-aggressive mocking emails to people who got screwed over by another robot that locks them out of their digital lives for made up reasons [1]. It saves the suits a ton of cash while letting them dodge any responsibility for the inevitable harm it'll cause to society. Mediocrity will be seen as a feature, and the worst part is, the zealots will wave it like a badge of honor.
I totally agree. The ”hard to control mech suit” is an excellent analogy.
When it works it’s brilliant.
There is a threshold point as part of the learning curve where you realize you are in a pile of spaghetti code and think it actually saves no time to use LLM assistant.
But then you learn to avoid the bad parts - thus they don’t take your time anymore - and the good parts start paying back in heaps of the time spent learning.
They are not zero effort tools.
There is a non-trivial learning cost involved.
The issue is we’re too early in the process to even have a solid education program for using LLMs. I use them all the time and continue to struggle finding an approach that works well. It’s easy to use them for documentation look up. Or filling in boilerplate. Sometimes they nail a transformation/translation task, other times they’re more trouble than they’re worth.
We need to understand what kind of guard rails to put these models on for optimal results.
“we’re too early in the process to even have a solid education program for using LLMs”
We don’t even have a solid education program for software engineering - possibly for the same reason.
The industry loves to run on the bleeding edge, rather than just think for a minute :)
when you stop to think, your fifteen (...thousand) competitors will all attempt a different version of the thing you're thinking about, and one of them will be the thing you'll come up with, except it'll already be built.
it might be ok since what you were thinking about is probably not a good idea in the first place for various reasons, but once in a while stars align to produce the unicorn, which you want to be if you're thinking about building something.
caveat: maybe you just want to build in a niche, it's fine to think hard in such places. usually.
Fwiw a legion of wishful app developers is not “the industry”. It’s fine for individuals to move fast.
Institution scale lack of deep thinking is the main issue.
> We don’t even have a solid education program for software engineering - possibly for the same reason.
There's an entire field called computer science. ACM provides curricular recommendations that it updates every few years. People spend years learning it. The same can't be said about the field of, prompting.
But nobody seems to trust any formally specified education, hence practices like whiteboarding as part of job interviews.
How do we know a software engineer is competent? We can’t tell, and damned if we trust that MSc he holds.
Computer science, while fundamental, is of very little help with the emergent large-scale problems which ”software engineering” tries to tackle.
The key problem is converting capital investment into working software that meets given requirements, and this is quite unpredictable.
We don’t know how to effectively train software engineers so that software projects would be predictable.
We don’t know how to train software engineers so that employers would trust their degrees as a strong signal of competence.
If there is a university program that, for example, FAANGM companies (or whatever letters form the pinnacle of the market) respect as a clear signal of obvious competence as a software engineer, I would like to know what it is.
That says more about the industry than the quality of formal education. After all, it's the very same industry that's hailing mediocre robots as replacements of human software engineers. Even the article has this to say:
> As a mid-late career coder, I’ve come to appreciate mediocrity.
Then there's also the embrace of anti-intellectualism. "But I don't want to spend time learning X!" is a surprisingly common comment on, er, Hacker News.
So yeah, no surprise that formal education is looked down on. Doesn't make it right though.
Also, the agents are actually pretty good at cleaning up spaghetti if you do it one module at a time and use unit tests. And some of the models are smart enough to suggest good organization schemes!
For what it's worth: I'm not dismissive of the idea that these things could be ruinous for the interests of the profession. I don't automatically assume that making applications drastically easier to produce is just going to make way for more opportunities.
I just don't think the interests of the profession control the outcome. The travel agents had interests too!
For a long time there has been back chatter about how to turn programming into a more professional field, more like actual engineering, where companies take security seriously, where people and companies are held accountable when something goes wrong, and where practitioners actually earn their high salaries.
Getting AI to hallucinate its way into secure and better-quality code seems like the antithesis of this. Why don't we have AI and robots working for humanity on the boring menial tasks - mowing lawns, filing taxes, washing dishes, driving cars - instead of attempting to take on our more critical and creative outputs - image generation, movie generation, book writing, and even website building?
The problem with this argument is that it's not what's going to happen. In the trajectory I see for LLM code generation, the security quality of best-practices, well-prompted code (i.e. not creatively prompted, just people with a decent set of Instructions.md or whatever) and of well-trained human coders is going to be a wash. Maybe in 5 years SOTA models will clearly exceed human coders on this, but my premise is that all progress stops and we just stick with what we have today.
But the analysis doesn't stop there, because after the raw quality wash, we have to consider things LLMs can do profoundly better than human coders can. Codebase instrumentation, static analysis, type system tuning, formal analysis: all things humans can do, spottily, on a good day but that empirically across most codebases they do not do. An LLM can just be told to spend an afternoon doing them.
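(To make that concrete, here's a minimal sketch of the kind of mechanical sweep I mean, assuming a Python codebase and off-the-shelf analyzers like ruff and mypy; the specific tools are my illustration, not something the comment above prescribes. An agent can be told to run something like this and then grind through the findings.)

```python
# Hedged sketch: run a couple of standard static analyzers over a repo and
# report failures. Assumes ruff and mypy are installed; substitute whatever
# analyzers your stack actually uses.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],  # linting / common bug patterns
    ["mypy", "."],           # static type checking
]

def run_checks() -> int:
    failures = 0
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
            # An agent would read this output and propose fixes file by file.
            print(result.stdout or result.stderr)
    return failures

if __name__ == "__main__":
    sys.exit(1 if run_checks() else 0)
```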
I'm a security professional before I am anything else (vulnerability research, software security consulting) and my take on LLM codegen is that they're likely to be a profound win for security.
Isn't formal analysis exactly the kind of thing LLMs can't do at all? Or do you mean an LLM invoking a proof assistant or something like that?
Yes, I mean LLMs generating proof specs and invoking assistants, not that they themselves do any formal modeling.
> Why don't we have AI and robots working for humanity with the boring menial tasks - mowing laws, filing taxes, washing dishes, driving cars
I mean, we do have automation for literally all of those things, to varying degrees of effectiveness.
There's an increasing number of little "roomba" style mowers around my neighborhood. I file taxes every year with FreeTaxUSA and while it's still annoying, a lot of menial "form-filling" labor has been taken away from me there. My dishwasher does a better job cleaning my dishes than I would by hand. And though there's been a huge amount of hype-driven BS around 'self-driving', we've undeniably made advances in that direction over the last decade.
As soon as the world realized they don't need a website and can just have a FB/Twitter page, a huge percentage of freelance web development gigs just vanished. We have to get real about what's about to happen. The app economy filled the gap, and the only optimistic case is that the AI app industry is what's going to fill the gap going forward. I just don't know about that. There's a certain end-game vibe I'm getting, because we're talking about self-building and self-healing software. More so, a person can ask the AI to role-play anything, even an app.
Sure. And before the invention of the spreadsheet, the world's most important programming language, individual spreadsheets were something a programmer had to build for a business.
Except that FB/Twitter are rotting platforms. I don't pretend that freelance web dev is a premium gig, but setting up Wordpress sites for local flower shops etc. shouldn't require a higher level of education/sophistication than e.g. making physical signs for the same shops.
Technical? Yes. Hardcore expert premium technical, no. The people who want the service can pay someone with basic to moderate skills a few hundred bucks to spend a day working on it, and that's all good.
Could I get an LLM to do much of the work? Yes, but I could also do much of the work without an LLM. Someone who doesn't understand the first principles of domains, Wordpress, hosting and so on, not so much.
> Except that FB/Twitter are rotting platforms.
They were not rotting platforms when they evaporated those jobs, about 10-15 years ago. There's no universe today where people are making money making websites. A while ago, before Twitter/FB pages, one could easily collect multiple thousands of dollars per month just making websites on the side. There is a long history to web development.
Also, the day of the website has been over for quite a while, so I don't even buy the claim that social media is a rotting platform.
None of the LLM models are self-building, self-healing or even self-thinking or self-teaching. They are static models (+rag, but that's a bolt-on). Did you have a specific tech in mind?
> We have to get real about what's about to happen.
Or maybe we shouldn't enthusiastically repeat the destruction of the open web in favor of billionaire-controlled platforms for surveillance and manipulation.
Start getting to be friends with some billionaire (or... shh... trillionaire) families, Elysium is coming!
It's kind of ironic to me that this is so often the example trotted out. Look at the BLS data sheet for job outlook: https://www.bls.gov/ooh/sales/travel-agents.htm#tab-6
> Employment of travel agents is projected to grow 3 percent from 2023 to 2033, about as fast as the average for all occupations.
The last year there is data for claims 68,800 people employed as travel agents in the US. It's not a boom industry by any means, but it doesn't appear they experienced the apocalypse that Hacker News believes they did, either.
I don't know how to easily find historical data, unfortunately. BLS publishes the excel sheets, but pulling out the specific category would have to be done manually as far as I can tell. There's this, I guess: https://www.travelagewest.com/Industry-Insight/Business-Feat...
It appears at least that what happened is, though it may be easier than ever to plan your own travel, there are so many more people traveling these days than in the past that the demand for travel agents hasn't crashed.
https://www.vice.com/en/article/why-are-travel-agents-still-...
Has some stats. It seems pretty clear the interests of travel agents did not count for much in the face of technological change.
https://fred.stlouisfed.org/series/LEU0254497900A
40% of all travel agent jobs lost between 2001 and 2025. Glad I'm not a travel agent.
500,000 tech R&D jobs lost since 2017... Glad I'm not... Oh. Wait I AM!! Probably due to toxic Trumpian tax changes, though.
Let's be real. Software engineers are skeptical right now not because they believe robots are better than them. Quite the opposite. The suits will replace software engineers despite the robots' mediocrity.
It was just 2 weeks ago that the utter incompetence of these robots was on full public display [1]. But none of that will matter to greedy corporate executives, who will prioritize short-term cost savings. They will hop from company to company, personally reaping the benefits while undermining essential systems that users and society rely on with robot slop. That's part of the reason why the C-suites are overhyping the technology. After all, no rich executive has faced consequences for behaving this way.
It's not just software engineering jobs that will take a hit. Society as a whole will suffer from the greedy recklessness.
The reason I remain in the "skeptical" camp is because I am experiencing the same thing you are - I keep oscillating between being impressed, then disappointed.
Ultimately the thing that impresses me is that LLMs have replaced google search. The thing that disappoints me is that their code is often convincing but wrong.
Coming from a hard-engineering background, anything that is unreliable is categorized as bad. If you come from the move-fast-break-things world of tech, then your tolerance for mistakes is probably a lot higher.
This is a bit tangential, but isn't that partly because google search keeps evolving into a worse resource due to the SEO garbage race?
It is, AI lets you have an ad-free web browsing experience. This is a huge part of it as well.
LLM-generated blogspam is also accelerating this process
And are LLMs immune to that same SEO garbage race?
I have been using Windsurf for a few months and ChatGPT for a couple of years. I don't feel Windsurf is a massive game changer personally. It is good if you are very tired or working in a new area (also good for exploring UI ideas, as the feedback loop is tight), but still not a real game changer over ChatGPT. Waiting around for it to do its thing ("we've encountered an error - no credits used") is boring and flow-destroying. If you know exactly what you are doing, the productivity is probably 0.5x vs just typing the code in yourself. Sorry, I'm not going to bang around in Windsurf all day just to help with the training so that "v2" can be better. They should be paying me for this, realistically.
Of course, in aggregate AI makes me capable in a far broader set of problem domains. It would be tough to live without it at this stage, but needs to be used for what it is actually good at, not what we hope it will be good at.
Have you tried Cursor or Zed? I find they’re both significantly better in their “agent” modes than Windsurf.
I used Cursor before Windsurf but I have not used Zed.
> What fascinates me is how negative these comments are — how many people seem closed off to the possibility that this could be a net positive for software engineers rather than some kind of doomsday.
I tried the latest Claude on a very complex wrapper around the AWS Price APIs, which are not easy to work with. Two thousand lines into a file, I found Claude faking some API returns by creating hard-coded values. A pattern I have seen professional developers get caught on while under pressure to deliver.
This will be a boon to skilled human developers, who will be hired at $900 an hour to fix bugs of a subtlety never seen before.
More or less this. Maybe a job opportunity, but many decision makers won't see the real problem until they get hit by that AWS bill. Ironic if the business can't hire you because it already went out of business.
I mean, that bug doesn't seem very subtle.
I swear this is not me..
"Claude gives up and hardcodes the answer as a solution" - https://www.reddit.com/r/ClaudeAI/comments/1j7tiw1/claude_gi...
I did not want to bend the truthfulness of my story, to make a valid logical argument more convincing... :-)
The arguments seem to come down to tooling. The article suggests that ChatGPT isn't a good way to interact with LLMs but I'm not so sure. If the greatest utility is "rubber ducking" and editing the code yourself is necessary then tools like Cursor go too far in a sense. In my own experience, Windsurf is good for true vibe coding where I just want to explore an idea and throw away the code. It is still annoying though as it takes so long to do things - ruining any kind of flow state you may have. I am conversing with ChatGPT directly much more often.
I haven't tried Claude Code yet, however. Maybe that approach is more on point.
Totally agree with "vibe debt". Letting an LLM off-leash without checks is a fast track to spaghetti. But with tests, clear prompts, and some light editing, I’ve shipped a lot of real stuff faster than I could have otherwise.
I generally agree with the attitude of the original post as well, but I'll push back on one point. It definitely doesn't cost 20 dollars a month. Cursor.ai might, and I don't know how good it is, but Claude Code costs hundreds of dollars a month - still cheaper than a junior dev, though.
> Did Photoshop kill graphic artists? Did film kill theatre?
To a first approximation, the answer to both of these is "yes".
There is still a lot of graphic design work out there (though generative AI will be sucking the marrow out of it soon), but far less than there used to be before the desktop publishing revolution. And the kind of work changed. If "graphic design" to you meant sitting at a drafting table with pencil and paper, those jobs largely evaporated. If that was a kind of work that was rewarding and meaningful to you, that option was removed for you.
Theatre even more so. Yes, there are still some theatres. But the number of people who get to work in theatrical acting, set design, costuming, etc. is a tiny tiny fraction of what it used to be. And those people are barely scraping together a living, and usually working side jobs just to pay their bills.
> it feels a bit like mourning the loss of punch cards when terminals showed up.
I think people deserve the right to mourn the loss of experiences that are meaningful and enjoyable to them, even if those experiences turn out to no longer be maximally economically efficient according to the Great Capitalistic Moral Code.
Does it mean that we should preserve antiquated jobs and suffer the societal effects of inefficiency without bound? Probably not.
But we should remember that the ultimate goal of the economic system is to enable people to live with meaning and dignity. Efficiency is a means to that end.
> But the number of people who get to work in theatrical acting, set design, costuming
I think this ends up being recency bias and terminology hairsplitting, in the end. The number of people working in theatre mask design went to nearly zero quite a while back but we still call the stuff in the centuries after that 'theatre' and 'acting'.
I'm not trying to split hairs.
I think "theatre" is a fairly well-defined term to refer to live performances of works that are not strictly musical. Gather up all of the professions necessary to put those productions on together.
The number of opportunities for those professions today is much smaller than it was a hundred years ago before film ate the world.
There are only so many audience members and a night they spend watching a film or watching TV or playing videogames is a night they don't spend going to a play. The result is much smaller audiences. And with fewer audiences, there are fewer plays.
Maybe I should have been clearer that I'm not including film and video production here. Yes, there are definitely opportunities there, though acting for a camera is not at all the same experience as acting for a live audience.
> I think "theatre" is a fairly well-defined term to refer to live performances of works
Doesn't it mean cinema too? edit: Even though it was clear from context you meant live theatre.
Right, but modern theatre is pretty new itself. The number of people involved in performance for the enjoyment of others has spiked, err, dramatically. My point is that making this type of argument seems to invariably involve picking some narrow thing and elevating it to a true and valuable artform deserving special consideration and mourning. Does it have a non-special-pleading variety?
Well, I didn't pick theatre and Photoshop as narrow things, the parent comment did.
I'm saying an artform that is meaningful to its participants and allows them to make a living wage while enriching the lives of others should not be thoughtlessly discarded in thrall to the almighty god of economic efficiency. It's not special pleading, because I'd apply this to all artforms and all sorts of work that bring people dignity and joy.
I'm not a reactionary luddite saying that we should still be using oil streetlamps so we don't put the lamplighters out of work. But at the same time I don't think we should automatically and carelessly accept the decimation of human meaning and dignity at the altar of shareholder value.
> I'm not a reactionary luddite saying that we should still be using oil streetlamps so we don't put the lamplighters out of work.
No doubt. A few years ago there was some HN post with a video of the completely preposterous process of making diagrams for Crafting Interpreters. I didn't particularly need the book nor do I have room for it but I bought it there and then to support the spirit of all-consuming wankery. So I'm not here from Mitch & Murray & Dark Satanic Mills, Inc either. At the same time, I'm not sold on the idea that niche art is the source of human dignity that needs societal protection, not because I'm some ogre but because I'm not convinced that's how actual art actually arts or provides meaning or evolves.
Like another Thomas put it
Not for the proud man apart
From the raging moon I write
On these spindrift pages
Nor for the towering dead
With their nightingales and psalms
But for the lovers, their arms
Round the griefs of the ages,
Who pay no praise or wages
Nor heed my craft or art.
> the spirit of all-consuming wankery.
Haha, a good way to describe it. :)
> the idea niche art is the source of human dignity that needs societal protection
I mean... have you looked around at the world today? We've got to pick at least some sources of human dignity to protect, because there seem to be fewer and fewer left.
Sitting in a moving car and sitting on a moving horse are both called "riding", but I think we can all appreciate how useless it is to equate the two.
They aren't, broadly speaking, interesting forms of expression so the fact you can draw some trivial string match analogy doesn't seem worth much discussion.
That was my point. The fact that we call people wearing CGI suits hopping around a green-screen room, and people on stage playing a character for a crowd, both "acting" doesn't account for the fact that doing one doesn't mean you can do the other.
> Did Photoshop kill graphic artists?
Desktop publication software killed many jobs. I worked for a publication where I had colleagues that used to typeset, place images, and use a camera to build pages by hand. That required a team of people. Once Quark Xpress and the like hit the scene, one person could do it all, faster.
In terms of illustration, the tools moved from pen and paper to Adobe Illustrator and Aldus / Macromedia Freehand. Which, I'd argue, was more of a sideways move. You still needed an illustrator's skillset to use these tools.
The difference between what I just described and LLM image generation is that the tooling changed to streamline an existing skillset. LLMs replace all of it. Just type something and here's your picture. No art / design skill necessary. Obviously, there's no guarantee that the LLM-generated image will be any good. So I'm not sure the Photoshop analogy works here.
I agree with the potential of AI. I use it daily for coding and other tasks. However, there are two fundamental issues that make this different from the Photoshop comparison.
The models are trained primarily on copyrighted material and code written by the very professionals who now must "upskill" to remain relevant. This raises complex questions about compensation and ownership that didn't exist with traditional tools. Even if current laws permit it, the ethical implications are different from Photoshop-like tools.
Previous innovations created new mediums and opportunities. Photoshop didn't replace artists, because it enabled new art forms. Film reduced theater jobs but created an entirely new industry where skills could mostly transfer. Manufacturing automation made products like cars accessible to everyone.
AI is fundamentally different. It's designed to produce identical output to human workers, just more cheaply and/or faster. Instead of creating new possibilities, it's primarily focused on substitution. Say AI could eliminate 20% of coding jobs and reduce wages by 30%:
The primary outcome appears to be increased profit margins rather than societal advancement. While previous technological revolutions created new industries and democratized access, AI seems focused on optimizing existing processes without providing comparable societal benefits:
- Unlike previous innovations, this won't make software more accessible
- Software already scales essentially for free (build once, used by many)
- Most consumer software is already free (ad-supported)
This isn't an argument against progress, but we should be clear-eyed about how this transition differs from historical parallels, and why it might not repeat the same historical outcomes. I'm not claiming this will be the case, only that you can see some pretty significant differences, and reasons to be skeptical that the same creation of new jobs, or improvement to human lifestyle and capabilities, will emerge as it did with, say, film or Photoshop.
AI can also be used to achieve things we could not otherwise do - that's the good use of AI: things like cancer detection, self-driving cars, and so on. I'm speaking specifically of the use of AI to automate and reduce the cost and time of white-collar work like software development.
> The primary outcome appears to be increased profit margins rather than societal advancement. While previous technological revolutions created new industries and democratized access, AI seems focused on optimizing existing processes without providing comparable societal benefits.
This is the thing that worries me the most about AI.
The author's ramblings dovetail with this a bit in their "but the craft" section. They vaguely attack the idea of code-golfing and coding for the craft as essentially incompatible with the corporate model of programming work. And perhaps they're right. If they are, though, this AI wave/hype being mostly about process-streamlining seems to be a distillation of that fact.
For me this is the "issue" I have with AI. Unlike, say, the internet, mobile, and other tech revolutions, where I could see new use cases or optimisation of existing use cases spring up all the time (new apps, new ways of interacting, more efficiency than physical systems, etc.), AI seems to be focused more on efficiency/substitution of labour than on pushing the frontier of "quality of life". Maybe this will change, but the buzz is around job replacement atm.
It's why it is impacting so many people, yet making very small changes to everyday "quality of life" metrics (e.g. ability to eat, communicate, live somewhere, etc.). It arguably is more about enabling greater inequality and gatekeeping of wealth to capital - a future world where intelligence and merit matter less. For most people it's hard to see where the positives are for them long term in this story; most everyday folks don't believe the utopia story is in any way probable.
Maybe it's like automation that makes webdev accessible to anyone. You take a week long AI coaching course and talk to an AI and let it throw together a website in an hour, then you self host it.
> Did Photoshop kill graphic artists?
No, but AI did.
In actual fact, photoshop did kill graphic arts. There was an entire industry filled with people who had highly-developed skillsets that suddenly became obsolete. Painters for example. Before photoshop, I had to go out of house to get artwork done; now I just do it myself.
No, it didn’t.
It changed the skill set but it didn’t “kill the graphic arts”
Rotoscoping in photoshop is rotoscoping. Superimposing an image on another in photoshop is the same as with film, it’s just faster and cheaper to try again. Digital painting is painting.
AI doesn’t require an artist to make “art”. It doesn’t require skill. It’s different than other tools
Even worse: what is considered artwork nowadays is whatever can be made in some vector-based program. This also stifles creativity, pigeonholing what is considered creative or artistic into something that can be used for machine learning.
Whatever can be replaced by AI will be, because it is easier for business people to deal with than real people.
Most of the vector art I see is minimalism. I can’t see this as anything but an argument that minimalism “stifles creativity”
> vector art pigeonholes art into something that can be used for machine learning
Look around, AI companies are doing just fine with raster art.
The only thing we agree on is that this will hurt workers
This, as the article makes clear, is a concern I am alert and receptive to. Ban production of anything visual from an LLM; I'll vote for it. Just make sure they can still generate Mermaid charts and Graphviz diagrams, so they still apply to developers.
What is unique about graphic design that warrants such extraordinary care? Should we just ban technology that approaches "replacement" territory? What about the people, real or imagined, that earn a living making Graphviz diagrams?
It's more a question of how it does what it does: by making a statistical model out of the work of the humans it now aims to replace.
I think graphic designers would be a lot less angry if AIs were trained on licensed work... that's how the system worked up until now, after all.
I don't think most artists would be any less angry and scared if AI was trained on licensed work. The rhetoric would just shift from mostly "they're breaching copyright!" to more of the "machine art is soulless and lacks true human creativity!" line.
I have a lot of artist friends but I still appreciate that diffusion models are (and will be with further refinement) incredibly useful tools.
What we're seeing is just the commoditisation of an industry in the same way that we have many, many times before through the industrial era, etc.
It actually doesn't matter how they would feel. In the currently accepted copyright framework, if the works were licensed they couldn't do much about it. But right now they can be upset because suddenly the new normal is massive copyright violation. It's very clear that without the massive amount of unlicensed work the LLMs simply wouldn't work well. The AI industry is just trying to run with it, hoping nobody will notice.
It isn't clear at all that there's any infringement going on, except in cases where AI output reproduces copyrighted content, or content that is sufficiently close to copyrighted content to constitute a derivative work. For example, if you told an LLM to write a Harry Potter fanfic, that would be infringement - fanfics are actually infringing derivative works that usually get a pass because nobody wants to sue their fanbase.
It’s very unlikely simply training an LLM on “unlicensed” work constitutes infringement. It could possibly be that the model itself, when published, would represent a derivative work, but it’s unlikely that most output would be unless specifically prompted to be.
I am not sure why you would think so. AFAIK we will see more of what courts think later in 2025, but judging from what was ruled in Delaware in February... it is actually very likely that LLMs' use of material is not "fair use", because besides how transformed the work is, one important part of fair use is that the output does not compete with the initial work. LLMs not only compete - they are specifically sold as replacements for the work they have been trained on.
This is why the lobby now pushes governments not to allow any regulation of AI, even if courts disagree.
IMHO what will happen anyway is that at some point the companies will "solve" the licensing by training models purely on older synthetic LLM output that will be "public research" (which of course will carry the "human" weights, but they will claim it doesn't matter).
What you are describing is the output of the LLM, not the model. Can you link to the case where a model itself was determined to be infringing?
It's important that copyright applies to copying/publishing/distributing - you can do whatever you want to copyrighted works by yourself.
I don't follow. The artists are obviously complaining about the output that LLMs create. If you create an LLM and don't use it, then yeah, nobody would have a problem with it, because nobody would know about it…
In that case, public services can continue to try to fine tune outputs to not generate anything infringing. They can train on any material they want.
Of course, that still won’t make artists happy, because they think things like styles can be copyrighted, which isn’t true.
Any LLM output created with unlicensed sources is tainted. It doesn't matter if the output does not look like anything in the dataset. If you take out the unlicensed sources then you simply won't get the same result. And since the results directly compete with the source, it's not "fair use".
If we believe that authors should be able to decide how their work is used, then they can for sure say no to machine learning. If we don't believe in intellectual property, then anything is up for grabs. I am OK with it, but the corps are not.
That’s not how copyright law works, but it might be how it should work.
I'm interpreting what you described as a derivative work to be something like:
"Create a video of a girl running through a field in the style of Studio Ghibli."
There, someone has specifically prompted the AI to create something visually similar to X.
But would you still consider it a derivative work if you replaced the words "Studio Ghibli" with a few sentences describing their style that ultimately produces the same output?
Derivative work is a legal term. Art styles cannot be copyrighted.
I get where you're coming from, but given that LLMs are trained on every available written word regardless of license, there's no meaningful distinction. Companies training LLMs for programming and writing show the same disregard for copyright as they do for graphic design. Therefore, graphic designers aren't owed special consideration that the author is unwilling to extend to anybody else.
Of course I think the same about text, code, sound, or any other LLM output. The author is wrong if they are unwilling to apply the same measure to everything. The fact that this is the new normal now for everything does not make it right.
I like this argument, but it does somewhat apply to software development as well! The only real difference is that the bulk of the "licensed work" the LLMs are consuming to learn to generate code happened to use some open source license that didn't specifically exclude use of the code as training data for an AI.
For some of the free-er licenses this might mostly be just a lack-of-attribution issue, but in the case of some stronger licenses like GPL/AGPL, I'd argue that training a commercial AI codegen tool (which is then used to generate commercial closed-source code) on licensed code is against the spirit of the license, even if it's not against the letter of the license (probably mostly because the license authors didn't predict this future we live in).
FWIW Adobe makes a lot of noise about how their specific models were indeed trained on only licensed work. Not sure if that really matters however
Yes Adobe and Shutterstock/Getty might be in position to do this.
But there is a reason why nobody cares about Adobe AI and everybody uses midjourney…
The article discusses this.
Does it? It admits at the top that art is special for no given reason, then it claims that programmers don't care about copyright and they deserve what's coming to them, or something..
"Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists"
This feels asserted without any real evidence
LLMs immediately and completely displace the bread-and-butter, replacement-tier illustration and design work that makes up much of that profession, and do so by effectively counterfeiting creative expression. A coding agent writes a SQL join or a tree traversal. The two things are not the same.
Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.
Finally, though I'm not stuck on this: I simply don't agree with the case being made for LLMs violating IPR.
I have had the pleasure, many times over the last 16 years, of expressing my discomfort with nerd piracy culture and the coercive might-makes-right arguments underpinning it. I know how the argument goes over here (like a lead balloon). You can agree with me or disagree. But I've earned my bona fides here. The search bar will avail.
> bread-and-butter replacement-tier
How is creative expression required for such things?
Also, I believe that we're just monkey meat bags and not magical beings and so the whole human creativity thing can easily be reproduced with enough data + a sprinkle of randomness. This is why you see trends in supposedly thought provoking art across many artists.
Artists draw from imagination which is drawn from lived experience and most humans have roughly the same lives on average, cultural/country barriers probably produce more of a difference.
Many of the flourishes any artist may use in their work is also likely used by many other artists.
If I commission "draw a mad scientist, use creative license" from several human artists I'm telling you now that they'll all mostly look the same.
> Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.
I think the case we are making is that there is no such thing as intellectual property to begin with, and the whole thing is a scam created by duct-taping a bunch of different concepts together when they should not be grouped together at all.
That's exactly the point, it's hard to see how someone could hold that view and pillory AI companies for slurping up proprietary code.
You probably don't have those views. But I think Thomas' point is that the profession as a whole has been crying "information wants to be free" for so many years, when what they meant was "information I don't want to pay for wants to be free" - and the hostile response to AI training on private data underlines that.
> LLMs immediately and completely displace the bread-and-butter, replacement-tier illustration and design work that makes up much of that profession, and do so by effectively counterfeiting creative expression. A coding agent writes a SQL join or a tree traversal. The two things are not the same.
In what way are these two not the same? It isn't like icons or ui panels are more original than the code that runs the app.
Or are you saying only artists are creating things of value and it is fine to steal all the work of programmers?
What about ones trained on fully licensed art, like Adobe Firefly (based on their own stock library) or F-Lite by Freepik & Fal (also claimed to be copyright safe)?
> LLMs immediately and completely displace the bread-and-butter replacement-tier illustration and design work that makes up much of that profession
And so what? Tell it to the Graphviz diagram creators, entry level Javascript programmers, horse carriage drivers, etc. What's special?
> .. and does so by effectively counterfeiting creative expression
What does this actually mean, though? ChatGPT isn't claiming to have "creative expression" in this sense. Everybody knows that it's generating an image using mathematics executed on a GPU. It's creating images. Like an LLM creates text. It creates artwork in the same sense that it creates novels.
> Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.
Programmers are very particular about licenses in opposition to your theory. Copyleft licensing leans heavily on enforcing copyright. Besides, I hear artists complain about the duration of copyright frequently. Pointing to some subset of programmers that are against IPR is just nutpicking in any case.
Oh, for sure. Programmers are very particular about licenses. For code.
I get it, you have an axe to grind against some subset of programmers who are "nerds" in a "piracy culture". Artists don't deserve special protections. It sucks for your family members, I really mean that, but they will have to adapt with everybody else.
I disagree with you on this. Artists, writers, and programmers deserve equal protection, and this means that tptacek is right to criticize nerd piracy culture. In other words, we programmers should respect artists and writers too.
To be clear, we're not in disagreement. We should all respect each other. However, it's pretty clear that the cat's out of the bag, and trying to claw back protections for only one group of people is stupid. It really betrays the author's own biases.
Doubt it is the same people. I doubt anyone argues that paintings deserve no protection while code does.
I do have an axe to grind, and that part of the post is axe-grindy (though: it sincerely informs how I think about LLMs), I knew that going into it (unanimous feedback from reviewers!) and I own it.
I generally agree with your post. Many of the arguments against LLMs being thrown around are unserious, unsound, and a made-for-social-media circle jerk that don't survive any serious adversarial scrutiny.
That said, this particular argument you are advancing isn't getting so much heat here because of an unfriendly audience that just doesn't want to hear what you have to say, or one that is defensive because of hypocrisy and past copyright transgressions. It is being torn apart because this argument, that artists deserve protection but software engineers don't, is unsound special pleading of the kind you criticize in your post.
Firstly, the idea that programmers are uniquely hypocritical about IPR is hyperbole unsupported by any evidence you've offered. It is little more than a vibe. As I recall, when Photoshop was sold with a perpetual license, it was widely pirated. By artists.
Secondly, the idea -- that you dance around but don't state outright -- that programmers should be singled out for punishment since "we" put others out of work is absurd and naive. "We" didn't do that. It isn't the capital owners over at Travelocity that are going to pay the price for LLM displacement of software engineers, it is the junior engineer making $140k/year with a mortgage.
Thirdly, if you don't buy into LLM usage as violating IPR, then what exactly is your argument against LLM use for the arts? Just a policy edict that thou shalt not use LLMs to create images because it puts some working artists out of business? Is there a threshold of job destruction that has to occur for you to think we should ban LLMs use case by use case? Are there any other outlaws/scarlet-letter-bearers in addition to programmers that will never receive any policy protection in this area because of real or perceived past transgressions?
Adobe is one of the most successful corporations in the history of commerce; the piracy technologists enabled wrecked most media industries.
Again, the argument I'm making regarding artists is that LLMs are counterfeiting human art. I don't accept the premise that structurally identical solutions in software counterfeit their originals.
> Adobe is one of the most successful corporations in the history of commerce; the piracy technologists enabled wrecked most media industries.
I guess that makes it ok then for artists to pirate Adobe's product. Also, I live in a music industry hub -- Nashville -- you'll have to forgive me if I don't take RIAA at their word that the music industry is in shambles, what with my lying eyes and all.
> Again, the argument I'm making regarding artists is that LLMs are counterfeiting human art. I don't accept the premise that structurally identical solutions in software counterfeit their originals.
I'm aware of the argument you are making. I imagine most of the people here understand the argument you are making. Its just a really asinine argument and is propped up by all manner of special pleading (but art is different, programmers are all naughty pirates that deserve to be punished) and appeals to authority (check my post history - I've established my bona fides.)
There simply is no serious argument to be made that LLMs reproducing one work product and displacing labor is better or worse than an LLM reproducing a different work product and displacing labor. Nobody is going to display some ad graphic from the local botanical garden's flyer for their spring gala at The Met. That's what is getting displaced by LLM. Banksy isn't being put out of business by stable diffusion. The person making the ad for the botanical garden's flyer has market value because they know how to draw things that people like to see in ads. A programmer has value because they know how to write software that a business is willing to pay for. It is as elitist as it is incoherent to say that one person's work product deserves to be protected but another person's does not because of "creativity."
Your argument holds no more water and deserves to be taken no more seriously than some knucklehead on Mastodon or Bluesky harping about how LLMs are going to cause global warming to triple and that no output LLMs produce has any value.
Well, I disagree with you. For the nth time, though, I also don't grant the premise that LLMs are violative of the IPR of programmers. But more importantly than anything else, I just don't want to hear any of this from developers. That's not "your arguments are wrong and I have refuted them". It's "I'm not going to hear them from you".
> For the nth time, though, I also don't grant the premise that LLMs are violative of the IPR of programmers.
I wish you all the best waiting for a future where the legislature and courts decide that LLM output is violative of copyright law only in the visual arts.
> I just don't want to hear any of this from developers.
Well, you seem to have posted about the wrong topic in the wrong forum then. But you’ve heard what you’ve wanted to hear in the discussion related to this post, so maybe that doesn’t really matter.
counterfeiting creative expression
This is the only piece of human work left in the long run, and that’s providing training data on taste. Once we hook up a/b testing on ai creative outputs, the LLM will know how to be creative and not just duplicative. The ai will never have innate taste, but we can feed it taste.
We can also starve it of taste, but that’s impossible because humans can’t stop providing data. In other words, never tell the LLM what looks good and it will never know. A human in the most isolated part of the world can discern what creation is beautiful and what is not.
Everything is derivative, even all human work. I don't think "creativity" is that hard to replicate, for humans it's about lived experience. For a model it would need the data that impacts its decisions. Atm models are trained for a neutral/overall result.
Your premise is an axiom that I don’t think most would accept.
Is The Matrix a ripoff of The Truman Show? Is Oldboy derivative of Oedipus?
Saying everything is derivative is reductive.
Modern flat graphic style has basically zero quality; I drew one myself even though I'm absolutely incompetent at proper drawing.
> This feels asserted without any real evidence
Things like this are expressions of preference. The discussion will typically devolve into restatements of the original preference and appeals to special circumstances.
Hasn't that ship sailed? How would any type of ban work when the user can just redirect the banned query to a model in a different jurisdiction, for example, Deepseek? I don't think this genie is going back into the bottle, we're going to have to learn to live with it.
Why not the same for texts? Why is shitty visual art worth more than the best texts from beloved authors? And what about cooking robots? Should we not protect the culinary arts?
> Ban production of anything visual from an LLM
That's a bit beside the point, which is that AI will not be just another tool, it will take ALL the jobs, one after another.
I do agree it's absolutely great though, and being against it is dumb, unless you want to actually ban it, which is impossible.
On the other hand, it can revive dead artists. How about AI-generated content going GPL 100 days after release?
Well, this is only partially true. My optimistic take is that it will redefine the field. There is still a future for resourceful, attentive, and prepared graphic artists.
AI didn't kill creativity or intuition. It rather lacks those things completely. Artists can make use of AI, but they can't make themselves obsolete just yet.
With AI anyone can be an artist, and this is a good thing.
Prompting Midjourney or ChatGPT to make an image does not make you an artist.
Using AI makes you an artist about as much as commissioning someone else to make art for you does. Sure, you provided the description of what needed to be done, and likely gave some input along the way, but the real work was done by someone else. There are faster iteration times with AI, but you are still not the one making the art. That is what differentiates generative models from other kinds of tools.
AI can’t make anyone a painter. It can generate a digital painting for you but it can’t give you the skills to transfer an image from your mind into the real world.
AI currently can’t reliably make 3d objects so AI can’t make you a sculptor.
We now have wall printers based on UV paint.
3D models can be generated quite well already. Good enough for a sculpture.
> AI didn't kill creativity or intuition. It rather lacks those things completely
Quite the opposite, I'd say that it's what it has most. What are "hallucinations" if not just a display of immense creativity and intuition? "Here, I'll make up this API call that's I haven't read about anywhere but sounds right".
I disagree. AI is good at pattern recognition, but it still struggles to grasp causal relationships. These made-up API calls are just a pattern in the large data set. Don't confuse that with creativity.
I would definitely confuse that with "intuition", which I would describe as seeing and using weak, unstated relationships, aka patterns. That's my intuition, at least.
As to creativity, that's something I know too little about to define it, but it seems reasonable that it's even more "fuzzy" than intuition. On the opposite, causal relationships are closer to hard logic, which is what LLMs struggle with- as humans do, too.
A lot of art is about pattern recognition. We represent or infer objects or ideas through some indirection or abstraction. The viewer or listener's brain (depending on their level of sophistication) fills in the gaps, and the greater the level of indirection (or complexity of pattern recognition required) the greater the emotional payoff. This also applies to humour.
It will not.
I'm an engineer through and through. I can ask an LLM to generate images just fine, but for a given target audience for a certain purpose? I would have no clue. None what so ever. Ask me to generate an image to use in advertisement for Nuka Cola, targeting tired parents? I genuinely have no idea of where to even start. I have absolutely no understanding of the advertisement domain, and I don't know what tired parents find visually pleasing, or what they would "vibe" with.
My feeble attempts would be absolute trash compared to a professional artist who uses AI to express their vision. The artist would be able to prompt so much more effectively and correct the things that they know from experience will not work.
It's the exact same as with coding with an AI - it will be trash unless you understand the hows and the whys.
> Ask me to generate an image to use in advertisement for Nuka Cola, targeting tired parents? I genuinely have no idea of where to even start.
I believe you, did you try asking ChatGPT or Claude though?
You can ask them a list of highest-level themes and requirements and further refine from there.
Have you seen modern advertisements lmao? Most of the time the ad has nothing to do with the actual product, it's an absolute shitshow.
I have seen a few American TV ads before, though - that shit's basically radioactively coloured, same as your fizzy drinks.
The key is that manual coding for a normal task takes one to two weeks, whereas if you configure all your prompts/agents correctly you could do it in a couple of hours. As you highlighted, it brings many new issues (code quality, lack of tests, tech debt) and you need to carefully create prompts and review the code to tackle those. But in the end, you can save significant time.
I disagree. I think this notion comes from the idea that creating software is about coding. Automating/improving coding => you have software at the end.
This might be how one looks at it in the beginning, when having no experience or no idea about coding. With time, one realizes it's more about creating the correct mental model of the problem at hand than about the activity of coding itself.
Once this is realized, AI can't "save" you days of work, as coding is the least time-consuming part of creating software.
The actual most time-consuming parts of creating software (I think) is reading documentation for the APIs and libraries you're using. Probably the biggest productivity boost I get from my coding assistant is attributable to that.
e.g. MUI, TypeScript:

// make the checkbox label appear before the checkbox

Tab. Done. Delete the comment.

vs. about 2 minutes wading through the perfectly excellent but very verbose online documentation to find that I need to set the "labelPlacement" attribute to "start".
Or the tedious minutiae that I am perfectly capable of doing, but that are time-consuming and error-prone:

// execute a SQL update

Tab tab tab tab... Done, with all bindings and fields filled in, based on the structure that's passed as a parameter to the method and the tables and field names that were created in source code above the current line. (Love that one.)
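(For the curious, a rough illustration in Python of the kind of boilerplate being described; the table and column names are made up for the example, and building column names from a dict is only safe when those keys come from your own code, not user input.)

```python
# Hypothetical sketch: build a parameterized UPDATE from a dict of fields,
# the sort of bindings-and-fields busywork the assistant tab-completes.
import sqlite3

def update_customer(conn: sqlite3.Connection, customer_id: int, fields: dict) -> None:
    """Update the given columns of one row in a (made-up) customers table."""
    # Column names must come from trusted code; only values are parameterized.
    assignments = ", ".join(f"{column} = ?" for column in fields)
    sql = f"UPDATE customers SET {assignments} WHERE id = ?"
    conn.execute(sql, (*fields.values(), customer_id))
    conn.commit()

# Usage (hypothetical schema):
# conn = sqlite3.connect("example.db")
# update_customer(conn, 42, {"name": "Ada", "email": "ada@example.com"})
```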
Yes, I currently lean skeptical but agentic LLMs excel at this sort of task. I had a great use just yesterday.
I have an older Mediawiki install that's been overrun by spam. It's on a server I have root access on. With Claude, I was able to rapidly get some Python scripts that work against the wiki database directly and can clean spam in various ways, by article ID, title regex, certain other patterns. Then I wanted to delete all spam users - defined here as users registered after a certain date whose only edit is to their own user page - and Claude made a script for that very quickly. It even deployed with scp when I told it where to.
Looking at the SQL that ended up in the code, there are non-obvious things, such as user pages being pages where page_namespace = 2. The query involves the user, page, actor and revision tables. I checked afterwards: MediaWiki has good documentation for its database tables. Sure, I could have written the SQL myself based on that documentation, but certainly not had the query wrapped in Python and ready to run in under a minute.
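(Not the actual script, but a hedged sketch of the kind of read-only query and wrapper involved; the table and column names follow the MediaWiki schema as I remember it - user, actor, revision, page, with user pages in namespace 2 - and the connection details are placeholders. Verify against your wiki's schema version before acting on the results, let alone deleting anything.)

```python
# Hedged sketch: list users registered after a cutoff whose only edits are to
# their own user page. Schema assumptions: user/actor/revision/page tables as
# in recent MediaWiki versions; timestamps stored as YYYYMMDDHHMMSS strings.
import pymysql

CUTOFF = "20240101000000"  # registration cutoff, MediaWiki timestamp format

QUERY = """
SELECT u.user_id, u.user_name, COUNT(r.rev_id) AS edits
FROM user u
JOIN actor a    ON a.actor_user = u.user_id
JOIN revision r ON r.rev_actor  = a.actor_id
JOIN page p     ON p.page_id    = r.rev_page
WHERE u.user_registration > %s
GROUP BY u.user_id, u.user_name
HAVING SUM(p.page_namespace = 2
           AND p.page_title = REPLACE(u.user_name, ' ', '_')) = COUNT(*)
"""

def main() -> None:
    # Placeholder credentials; point these at your wiki's database.
    conn = pymysql.connect(host="localhost", user="wikiuser",
                           password="changeme", database="wikidb")
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY, (CUTOFF,))
            for user_id, user_name, edits in cur.fetchall():
                print(f"{user_id}\t{user_name}\t{edits} edit(s), all to own user page")
    finally:
        conn.close()

if __name__ == "__main__":
    main()
```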
what are you using for this? one thing I can't wrap my head around is how anyone's idea of fun is poking at an LLM until it generates something possibly passable and then figuring out what the hell it did and why, but this sounds like something i'd actually use.
Yes. Vscode/Copilot/Claude Sonnet 4. The choice of AI may make a significant difference. It used to. The GPT AIs, particularly, were useless. I haven't tried GPT 4.1 yet.
vscode?
vscode comes with that out of the box?
pretty much, the plugin is called Copilot. click a button and install it
that's a whole different piece of software that it doesn't come with out of the box lol.
Copilot was what i was looking for, thank you. I have it installed in Webstorm already but I haven't messed with this side of it.
idk i click 'copilot' and it adds it. it took maybe 2 minutes.
And if you don't add it, VSCode nags you incessantly until you do. :-P
I think for some folks their job really is just about coding. For me that was rarely true. I've written very little code in my career. Mostly design work, analyzing broken code, making targeted fixes...
I think these days coding is 20% of my job, maybe less. But HN is a diverse audience. You have the full range of web programmers and data scientists all the way to systems engineers and people writing for bare metal. Someone cranking out one-off Python and Javascript is going to have a different opinion on AI coding vs a C/C++ systems engineer and they're going to yell at each other in comments until they realize they don't have the same job, the same goals or the same experiences.
Would you have any standard prompts you could share which ask it to make a draft with what you'd want (e.g. unit tests etc.)?
Run it through Grok (https://grok.com/):

```
C++, Linux: write an audio processing loop for ALSA reading audio input, processing it, and then outputting audio on ALSA devices. Include code to open and close the ALSA devices. Wrap the code up in a class. Use CamelCase naming for C++ methods. Skip the explanations.
```

When I ACTUALLY wrote that code the first time, it took me about two weeks to get it right. (Horrifying documentation set, with inadequate sample code.)
Typically, I'll edit code like this from top to bottom in order to get it to conform to my preferred coding idioms. And I will, of course, submit the code to the same sort of review that I would give my own first-cut code. And the way initialization parameters are passed in needs work. (A follow-on prompt would probably fix that). This is not a fire and forget sort of activity. Hard to say whether that code is right or not; but even if it's not, it would have saved me at least 12 days of effort.
Why did I choose that prompt? Because I have learned through use that AIs do well with these sorts of coding tasks. I'm still learning, and making new discoveries every day. Today's discovery: it is SO easy to implement a SQLite database in C++ using an AI when you go at it the right way!
It relies heavily on your mental model of ALSA to write a prompt like that. For example, I believe the macOS audio stack is node-based, like pipewire. For someone who is knowledgeable about the domain, it's easy enough to get some base output to review and iterate upon, especially if there was enough training data or you constrain the output with the context. So there's no actual time saving, because you have to take into account the time you spent learning about the domain.
That is why some people don't find AI that essential: if you have the knowledge, you already know how to find the specific part of the documentation to refresh the semantics, and the time saved is minuscule.
Fer goodness sake. Eyeroll.
Run it through Grok. I'd actually use VSCode Copilot with Claude Sonnet 4; Grok is being used so that people who do not have access to a coding AI can see what they would get if they did. The prompt: "Write an audio processing loop for pipewire. Wrap the code up in a C++ class. Read audio data, process it and output through an output port. Skip the explanations. Use CamelCase names for methods. Bundle all the configuration options up into a single structure."
I'd use that code as a starting point despite having zero knowledge of pipewire. And probably fill in other bits using AI as the need arises. "Read the audio data, process it, output it" is hardly deep domain knowledge.
Results with gemini
A 5-second search on DDG ("easyeffects") and a 10-second navigation on GitHub.
https://github.com/wwmm/easyeffects/blob/master/src/plugin_b...
But that is GPL 3.0 and a lot of people want to use the license laundering LLM machine.
N.B. I already know about easyeffects from when I was searching for a software equalizer.
EDIT
Another 30 seconds of exploration ("pipewire" on DDG, finding the main site, then going to the documentation page and the tutorial section).
https://docs.pipewire.org/audio-dsp-filter_8c-example.html
There are a lot of ways to find truthful information without playing Russian roulette with an LLM.
The question is: can I self-host this "mech suit"? If not, I would much rather not use some API hosted by another party.
SaaS just seems very much like a terminator-seed situation in the end.
"Mech suit" is apt. Gonna use that now.
Having plenty of initial discussion and distilling it into requirements documents aimed at modularized components, which can all be easily tackled separately, is key.
This is my experience as well.
I'd add that Excel didn't kill the engineering field. It made engineers more effective, and maybe companies will need fewer of them. But it also means more startups and smaller shops can make use of an engineer. The change is hard, and an equilibrium will be reached.
Yes! It needs and seems to want the human to be a deep collaborator. If you take that approach, it is actually a second senior developer you can work with. You need to push it, and explain the complexities in detail to get fuller rewards. And get it to document everything important it learns from each session's context. It wants to collaborate to make you a 10X coder, not to do your work for you while you laze. That is the biggest breakthrough I have found. They basically react like human brains, with the same kind of motives. Their output can vary dramatically based on the input you provide.
> Then something shifted. I started experimenting. I stopped giving it orders and began using it more like a virtual rubber duck. That made a huge difference.
This is how I use it mostly. I also use it for boilerplate, like "What would a database model look like that handles the following?" You never want it to do everything, though there are tools that can and will, and they're impressive; but then when you have a true production issue, your inability to respond quickly will be a barrier.
It's great news that if you know how to use an LLM, it works wonders for you. But LLMs are changing so fast; can it really be sustainable for me to "learn" one, only for it to change and go backwards the next month? (I am thinking about how terrible Google became.)
I’m learning live how to use these things better, and I haven’t seen practical guides like:
- Split things into small files; today's model harnesses struggle with massive files
- Write lots of tests. When the language model messes up the code (it will), it can use the tests to climb out. Tests are the best way to communicate behavior (see the sketch below)
- Write guides and documentation for complex tasks in complex codebases. Use a language model for the first pass if you’re too lazy. Useful for both humans and LLMs
It’s really: make your codebase welcoming for junior engineers
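On the tests point, here is a tiny hand-written example of what "tests communicate behavior" means in practice. MakeSlug is a hypothetical function, the C++/plain-assert choice is arbitrary, and any test framework works the same way:

    #include <cassert>
    #include <cctype>
    #include <string>

    // Hypothetical function under test: turn an article title into a URL slug.
    std::string MakeSlug(const std::string& title) {
        std::string out;
        for (unsigned char c : title) {
            if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
            else if (!out.empty() && out.back() != '-') out += '-';
        }
        while (!out.empty() && out.back() == '-') out.pop_back();
        return out;
    }

    int main() {
        // Each assert pins one behavior in plain terms, so a human reviewer and a
        // coding agent can both see exactly what "correct" means and what broke.
        assert(MakeSlug("Hello World") == "hello-world");
        assert(MakeSlug("  trim me  ") == "trim-me");     // surrounding whitespace dropped
        assert(MakeSlug("C++ & Rust!") == "c-rust");      // punctuation collapsed
        assert(MakeSlug("") == "");                       // edge case: empty input
        return 0;
    }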
> it can use the tests to climb out
Or not. I watched Copilot's agent mode get stuck in a loop for most of an hour (to be fair, I was letting it continue to see how it handles this failure case) trying to make a test pass.
Yeah! When that happens I usually stop it and tap in a bigger model to “think” and get out of the loop (or fix it myself)
I’m impressed with this latest generation of models: they reward hack a lot less. Previously they’d change a failing unit test, but now they just look for reasonable but easy ways out in the code.
I call it reward hacking, and laziness is not the right word, but “knowing what needs to be done and not doing it” is the general issue here. I see it in junior engineers occasionally, too.
i love your views and the way you express them, spot on. i feel similar in some ways. i hated ai, loved ai, hated it again and love it again. i still feel the code is unusable for my main problems, but i realize better that it's my arrogance that causes it. i can't formulate solutions eloquently enough and blame the AI for bad code.
AI has helped me pick up my pencil and paper again and realize my flawed knowledge, skills, and even flawed approach to AI.
Now i've instructed it to never give me code :). not because the code is bad, but my attempts to extract code from it are more based in laziness than efficiency. they are easy to confuse after all ;(....
I have tons of fun learning with AI, exploring. going on adventures into new topics. Then when i want to really do something, i try to use it for the things i know i am bad at due to laziness, not lack of knowledge. the thing i fell for first...
it helps me explore a space, then i think or am inspired for some creation, and it helps me structure and plan. when i ask it from laziness to give me the code, it helps me overcome my laziness by explaining what i need to do to be able to see why asking for the code was the wrong approach in the first place.
now, that might be different for you. but i have learned i am not some god tier hacker from the sprawl, so i realized i need to learn and get better. perhaps you are at the level where you can ask it for code and it just works. hats off in that case ;) (i do hope you tested well!)
> Did film kill theatre?
Relatively speaking, I would say that film and TV did kill theater
I am pretty sure this comment is also AI generated. Just a guess, but that many em-dashes is suspicious. And the overall structure of how it tries to convince feels uncanny.
If this is true, can you share the initial draft that you asked the AI to rewrite? Am I not right that the initial draft is more concise and better conveys your actual thoughts, even though it's not as convincing?
I agree, this article is basically what I've been thinking as I play with these things over time. They've gotten a ton better but the hot takes are still from 6-12 months ago.
One thing I wish he would have talked about though is maintenance. My only real qualm with my LLM agent buddy is the tendency to just keep adding code if the first pass didn't work. Eventually, it works, sometimes with my manual help. But the resulting code is harder to read and reason about, which makes maintenance and adding features or behavior changes harder. Until you're ready to just hand off the code to the LLM and not do your own changes to it, it's definitely something to keep in mind at minimum.
Photoshop etc. are still just tools. They can't beat us at what has always set us apart: thinking. LLMs are the closest, and while they're not close, they're directionally correct. They're general purpose, not like chess engines. And they improve. It's hard to predict a year out, never mind ten.
> Did Photoshop kill graphic artists? Did film kill theatre? Not really. Things changed, sure. Was it “better”?
My obligatory comment on how analogies are not good for arguments: there is already discussion here that film (etc.) may have killed theatre.
I think the key is also this: don't call it AI, because it's not. It's LLM-assisted query parsing and code generation. Semantically, if you call it AI, the public expects a cognitive equivalent to a human, which this is not, and from what @tptacek describes, is not meant to be; the reasoning and other code bits to create agents and such seem to be developed specifically for code generation, programming assist, and other tasks thereof. Viewed through that lens, the article is correct: it is by all means a major step forward.
I agree but that battle is lost. Someone was calling Zapier workflows AI on X.
AGI vs. AI is how people separate the two these days.
The irony of the ChatGPT em dashes ;3
The entire comment feels way too long, structured, and convincing in a way that can only be written by an AI. I just hope that once the em-dashes are "fixed", we will still be able to detect such text. I fear for a future when human text is sparse, even here at HN. It is depressing to see such a comment take the top spot.
Lol -- it even reads with the same exact tone as AI. For those who use it often, it's so easy to spot now. The luddites on HN who fear AI end up affected the most, because they have no idea how to see it.
I use LLMs daily. From helping me write technical reports (not 100%, mostly making things sound better after I have a first draft) to mapping APIs (documentation, etc).
I can only imagine what this technology will be like in 10 years. But I do know that it's not going anywhere and it's best to get familiar with it now.
- [deleted]
I treat AI as my digital partner in pair programming. I've learned how to give it specific and well-defined tasks to do, and it gets them done. The narrower the scope and the more specific the task, the more success you'll have.
there’s a sweet spot in there, it’s not “as narrow as possible” - the most productive thing is to assign the largest possible tasks that are just short of the limit where the agents become stupid. this is hard to hit, and a moving target!
Exactly. When you get a new tool or a new model, ask it for things the previous one failed at until you find the new ceiling.
Love all of this.
Most importantly, I'll embrace the change and hope for the possible abundance.
- [deleted]
LLMs are self-limiting rather than self-reinforcing, and that's the big reason why they're not the thing, good or bad, that some people think they are.
"Garbage in, garbage out" is still the rule for LLMs. If you don't spend billions training them, or if you let them feed on their own tail too much, they produce nonsense. For example, some LLMs currently produce better general search results than Google. This is mainly a product of many billions being spent on expert trainers for those LLMs, while Google neglects (or actively enshittifies) its search algorithms shamefully. It's humans, not LLMs, producing these results. How good will LLMs be at search once the money has moved somewhere else and neglect sets in?
LLMs aren't going to take everyone's jobs and trigger a singularity, precisely because they fall apart if they try to feed on their own output. They need human input at every stage. They are going to take some people's jobs and create new ones for others, although it will probably be more of the former than the latter, or billionaires wouldn't be betting on them.
Yes, film killed theatre.
> Then I actually read the code.
This is my experience in general. People seem to be impressed by the LLM output until they actually comprehend it.
The fastest way to have someone break out of this illusion is to tell them to chat with the LLM about their own expertise. They will quickly start to notice errors in the output.
You know who does that also? Humans. I read shitty, broken, amazing, useful code every day, but you don't see me complaining online that people who earn 100-200k salaries don't produce ideal output right away. And believe me, I spend way more time fixing their shit than fixing LLM output.
If I can reduce this even by 10% for 20 dollars it’s a bargain.
But no one is hyping the fact that Bob the mediocre coder is going to replace us.
what no one is reckoning with right here
The AI skeptics are mostly correctly reacting to the AI hypists, who are usually shitty LinkedIn influencer type dudes crowing about how they never have to pay anyone again. It's very natural, even intelligent, to not trust this now that it's filling the same bubble as NFTs a few years ago. I think it's okay to stay skeptical and see where the chips fall in a few years at this point.
But Bob isn’t getting better every 6 months
If you’re not getting any better you are indeed in trouble.
I’ve definitely improved at a faster rate than LLMs over the last 6 months
Let’s see the evals
https://the-decoder.com/openai-quietly-funded-independent-ma...
You mean these?
I use AI every day, but you've got hundreds of billions of dollars and Scam Altman (known for having no morals and playing dirty) et al. on "your" side. The only thing AI skeptics have is anecdotes and time. Having a principled argument isn't really possible.
[flagged]
Offshoring / nearshoring has been here for decades!
/0
[flagged]
If that was true why would we be told llms would replace us? Would us having already been replaced mean we weren’t working?
Also this is kind of racist.
Do you know what the word most means?
That has not been my experience at all with networking and cryptography.
Your comment is ambiguous; what exactly do you refer to by "that"?
- [deleted]
[flagged]
You put people into nice little boxes: the skeptics and the non-skeptics. It is reductive and, most of all, polarizing. This is what US politics has become, and we should avoid it here.
Yeah, putting labels on people is not very nice.
[flagged]
10 month old account talking like that to the village elder
In fairness, the article is a lot more condescending and insulting to its readers than the comment you're replying to.
An LLM is essentially the world's information packed into a very compact format. It is the modern equivalent of the Library of Alexandria.
Claiming that your own knowledge is better than all the compressed consensus of the books of the universe is very optimistic.
If you are not sure about the result given by a LLM, it is your task as a human to cross-verify the information. The exact same way that information in books is not 100% accurate, and that Google results are not always telling the truth.
>LLM is essentially the world information packed into a very compact format.
No, it's world information distilled to various parts and details that training deemed important. Do not pretend for one second that it's not an incredibly lossy compression method, which is why LLMs hallucinate constantly.
This is why training is only useful for teaching the LLM how to string words together to convey hard data. That hard data should always be retrieved via RAG with an independent model/code verifying that the contents of the response are correct as per the hard data. Even 4o hallucinates constantly if it doesn't do a web search and sometimes even when it does.
Well, let's not forget that it's an opinionated source. There is also the point that if you ask it about a topic, it will (often) give you the answer that has the most content about it (or the most easily accessible information).
Agree.
I find that, for many, LLMs are addictive, a magnet, because they offer to do your work for you, or so it appears. Resisting this temptation is impossibly hard for children, for example, and many adults succumb.
A good way to maintain a healthy dose of skepticism about its output and keep on checking this output, is asking the LLM about something that happened after the training cut off.
For example, I asked if lidar could damage phone lenses, and the LLM very convincingly argued it was highly improbable. That's because the risk to phone cameras only recently made the news and wasn't part of the training data.
This helps me stay sane and resist the temptation of just accepting LLM output =)
On a side note, the kagi assistant is nice for kids I feel because it links to its sources.
LIDAR damaging the lens is extremely unlikely. A lens is mostly glass.
What it can damage is the sensor, which is actually not at all the same thing as a lens.
When asking questions it's important to ask the right question.
Sorry, I meant the sensor
I asked ChatGPT o3 if lidar could damage phone sensors and it said yes https://chatgpt.com/share/683ee007-7338-800e-a6a4-cebc293c46...
I also asked Gemini 2.5 pro preview and it said yes. https://g.co/gemini/share/0aeded9b8220
I find it interesting to always test for myself when someone suggests to me that a "LLM" failed at a task.
I should have been more specific, but you missed my point I believe.
I tested this at the time on Claude 3.7 Sonnet, which has an earlier cutoff date, and I just tested again with this prompt: "Can the lidar of a self driving car damage a phone camera sensor?" The answer is still wrong in my test.
I believe the issue is the training cutoff date; that's my point. LLMs seem smart, but they have limits, and when asked about something discovered after the training cutoff date, they will sometimes be confidently wrong.
I didn't miss your point; rather, I wanted you to realize some deeper points I was trying to make:
- Not all LLMs are the same, and not identifying your tool is problematic, because "LLMs can't do a thing" is very different from "The particular model I used failed at this thing". I demonstrated that by showing that many LLMs get the answer right. Leaving the tool unnamed puts the onus of correctness entirely on the category of technology, and not on the tool used or the skill of the tool user.
- Training data cutoffs are only one part of the equation: tool use by LLMs allows them to search the internet and run arbitrary code (amongst many other things).
In both of my cases, the training data did not include the results either. Both used a tool call to search the internet for data.
Not realizing that modern AI tools are more than an LLM with training data (they have tool calling and full internet access, and can access and reason about a wide variety of up-to-date data sources) demonstrates a fundamental misunderstanding of modern AI tools.
Having said that:
Claude Sonnet 4.0 says "yes": https://claude.ai/share/001e16f8-20ea-4941-a181-48311252bca0
Personally, I don't use Claude for this kind of thing, because while it's proven to be very good at being a coding assistant and interacting with my IDE in an "agentic" manner, it's clearly not designed to be a deep research assistant that broadly searches the internet and other data sources to provide accurate and up-to-date information. (This would mean that AI/model selection is a skill issue and getting good results from AI tools is a skill, which is borne out by the fact that I get the right answer every time I try, and you can't get the right answer once.)
Still not getting it I think.
My point is: LLMs sound very plausible and very confident when they are wrong.
That’s it. And I was just offering a trick to help remembering this, to keep checking their output – nothing else.
- [deleted]
This is pre-Covid HN thread on work from home:
https://news.ycombinator.com/item?id=22221507
It's eerie. It's historical. These threads from the past two years about what the future of AI will be will read like ghost stories, like Rose having flashbacks of the Titanic. It's worth documenting. We honestly could be having the most ominous discussion of what's to come.
We sit around and complain about dips in hiring, that’s nothing. The iceberg just hit. We’ve got 6 hours left.
> We sit around and complain about dips in hiring, that’s nothing. The iceberg just hit. We’ve got 6 hours left.
At least we've got hacker news to ourselves, have we not ... https://news.ycombinator.com/item?id=44130743
Partially OT:
Yesterday I asked ChatGPT which Japanese city is the twin city of Venice (Italy). This was just a quick offhand question because I needed the answer for a post on IG, so not exactly a life or death situation.
Answer: Kagoshima. It also added that the "twin status" was officially set in 1965, and that Kagoshima was the starting point for the Jesuit missionary Alessandro Valignano in his attempt to proselytize the Japanese people (to Catholicism, and also about European culture).
I had never heard of Kagoshima, so I googled it, and discovered it is the twin city of Naples :/
So I then googled "Venice Japanese twin city" and got: Hiroshima. I double-checked this, then went back to ChatGPT and wrote:
"Kagoshima is the Twin City for Neaples.".
This triggered a websearch and finally it wrote back:
"You are right, Kagoshima is Twin City of Neaples since 1960."
Then it added "Regarding Venice instead, the twin city is Hiroshima, since 2023".
So yeah, a Library of Alexandria that you can count on as long as you have another couple of libraries to double-check whatever you get from it. Note also that this was a very straightforward question; there is nothing to "analyze" or "interpret" or "reason about". And yet the answer was completely wrong: the first date was incorrect even for Naples (the ceremony was actually in May 1960), and the extra bits about Alessandro Valignano are not reported anywhere else. Valignano was indeed a Jesuit and he visited Japan multiple times, but Kagoshima is never mentioned when you google him or check his Wikipedia page.
You may understand why I remain quite skeptical about any application which I consider "more important than an IG title".
Claude 4 Opus:
> Venice, Italy does not appear to have a Japanese twin city or sister city. While several Japanese cities have earned the nickname "Venice of Japan" for their canal systems or waterfront architecture, there is no formal sister city relationship between Venice and any Japanese city that I could find in the available information
I think GPT-4o got it wrong in your case because it searched Bing, and then read only fragments of the page ( https://en.wikipedia.org/wiki/List_of_twin_towns_and_sister_... ) to save costs for processing "large" context
I am Italian, and I have some interest in Japanese history/culture.
So when I saw a completely unknown city, I googled it because I was wondering what it actually had in common with Venice (I mean, a Japanese version of Venice would be a cool place to visit next time I go to Japan, no?).
If I wanted to know, I dunno, "What is the Chinese Twin City for Buenos Aires" (to mention two countries I do not really know much about, and do not plan to visit in the future) should I trust the answer? Or should I go looking it up somewhere else? Or maybe ask someone from Argentina?
My point is that even as a "digital equivalent of the Library of Alexandria" LLM seem to be extremely unreliable. Therefore - at least for now - I am wary about using them for work, or for any other area where I really care for the quality of the result.
If I want facts that I would expect the top 10 Google results to have, I turn search on. If I want a broader view of a well known area, I turn it off. Sometimes I do both and compare. I don’t rely on model training memory for facts that the internet wouldn’t have a lot of material for.
4o for quick. 4o plus search for facts. o4-mini-high plus search for "mini deep research", where it'll hit more pages, structure, and summarise.
And I still check the facts and sources, to be honest. But it's not valueless. I've searched an area for a year and then had deep research find things I hadn't.
o3 totally nailed it first shot. Hiroshima since 2023. Provides authoritative source (Venetian city press release): https://chatgpt.com/share/683f638a-3ce0-8005-91d6-3eb1df9f19...
What model?
People often say "I asked ChatGPT something and it was wrong", and then you ask them the model and they say "huh?"
The default model is 4.1o-mini, which is much worse than 4.1o and much much worse than o3 at many tasks.
Yup. The difference is particularly apparent with o3, which does bursts of web searches on its own whenever it feels it'll be helpful in solving a problem, and uses the results to inform its own next steps (as opposed to just picking out parts to quote in a reply).
(It works surprisingly well, and feels mid-way between Perplexity's search and OpenAI's Deep Research.)
I asked "What version/model are you running, atm" (I have a for-free account on OpenAI, what I have seen so far will not justify a 20$ monthly fee - IN MY CASE).
Answer: "gpt-4-turbo".
HTH.
Don't ask the model, just look at the model selection drop-down (wherever that may be in your UI)
>I have a free account on OpenAI; what I have seen so far does not justify a $20 monthly fee - IN MY CASE
4.1o-mini definitely is not worth $20/month. o3 probably is (and is available in the $20/month plan) for many people.
No, don't think libraries, think "the Internet."
The Internet thinks all kinds of things that are not true.
Just like books then, except the internet can be updated
We all remember those teachers who said that the internet cannot be trusted, and that the only source of truth is in books.
Even if this were true (it is not; that’s not how LLMs work), well, there was a lot of complete nonsense in the Library of Alexandria.
It's a compressed statistical representation of text patterns, so it is absolutely true. You lose information during the process, but the quality is similar to the source data; sometimes it's even better, as there is consensus when information is repeated across multiple sources.
It's amazing how it goes from all the knowledge in the world to ** terms and conditions apply, all answers are subject to market risks, please read the offer documents carefully.........
As someone who has followed Thomas' writing on HN for a long time... this is the funniest thing I've ever read here! You clearly have no idea about him at all.
Especially coming from you I appreciate that impulse, but I had the experience of running across someone else the Internet (or Bsky, at least) believed I had no business not knowing about, and I did not enjoy it, so I'm now an activist for the cause of "people don't need to know who I am". I should have written more clearly above.
That is a very good cause!
One would hope the experience leads to the position, and not vice-versa.
... you think tptacek has no expertise in cryptography?
That is no different from pretty much any other person in the world. If I interview people to catch them on mistakes, I will be able to do exactly that. Sure, there are some exceptions, like if you were to interview Linus about Linux. Other than that, you'll always be able to find a gap in someone's knowledge.
None of this makes me 'snap out' of anything. Accepting that LLMs aren't perfect means you can just keep that in mind. For me, they're still a knowledge multiplier, and they allow me to be more productive in many areas of life.
Not at all. Useful or not, LLMs will almost never say "I don't know". They'll happily call a function from a library that never existed. They'll tell you "Incredible idea! You're on the correct path! And you can easily do that with such-and-such software", and you'll be like "wait, what, that software doesn't do that", and they'll answer "Ah, yeah, you're right, of course."
TFA says hallucinations are why "gyms" will be important: language tooling (compiler, linter, language server, domain-specific static analyses, etc.) that feeds back into the agent, so it knows to redo the work.
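The shape of that feedback loop is roughly the sketch below. AskAgentToFix is a hypothetical stand-in for whatever model call you use, the "language tooling" here is just the compiler run with -fsyntax-only, and nothing in it is taken from TFA:

    #include <array>
    #include <cstdio>
    #include <fstream>
    #include <iostream>
    #include <string>

    // Stand-in for the model call. A real version would send `source` plus the
    // diagnostics to your coding agent and return its revised file; this stub just
    // hands the source back so the sketch stays self-contained.
    std::string AskAgentToFix(const std::string& source, const std::string& diagnostics) {
        (void)diagnostics;
        return source;
    }

    // Run a shell command and capture its combined stdout/stderr (POSIX popen).
    std::string Capture(const std::string& cmd) {
        std::array<char, 256> buf{};
        std::string out;
        FILE* pipe = popen((cmd + " 2>&1").c_str(), "r");
        if (!pipe) return out;
        while (fgets(buf.data(), static_cast<int>(buf.size()), pipe)) out += buf.data();
        pclose(pipe);
        return out;
    }

    // Write the candidate, run the compiler front end, and report its complaints.
    bool TryBuild(const std::string& source, std::string* diagnostics) {
        std::ofstream("candidate.cpp") << source;
        *diagnostics = Capture("c++ -fsyntax-only -Wall -Werror candidate.cpp");
        return diagnostics->empty();   // silence from -fsyntax-only means it parsed cleanly
    }

    int main() {
        std::string source = "int main() { return 0 }\n";   // deliberately broken seed
        std::string diagnostics;
        for (int attempt = 0; attempt < 5; ++attempt) {
            if (TryBuild(source, &diagnostics)) {
                std::cout << "toolchain satisfied after " << attempt << " fixes\n";
                return 0;
            }
            // The point of the "gym": the compiler's complaints become the next
            // prompt, so the agent is corrected by ground truth rather than vibes.
            source = AskAgentToFix(source, diagnostics);
        }
        std::cout << "giving up; last diagnostics were:\n" << diagnostics;
        return 1;
    }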
Sometimes asking in a loop: "are you sure? think step-by-step", "are you sure? think step-by-step", "are you sure? think step-by-step", "are you sure? think step-by-step", "verify the result", or similar, you may end up with "I'm sure, yes", and then you know you have a quality answer.
No, there are many techniques now to curb hallucinations. Not perfect, but no longer so egregiously overconfident.
…such as?
The most infuriating are the emojis everywhere
That proves nothing with respect to the LLM's usefulness; all it means is that you are still useful.
- [deleted]
[flagged]
[flagged]
- [deleted]
- [deleted]
And in the meantime, the people you are competing with in the job market have already become 2x more productive.
Oh no! I'll get left behind! I can't miss out! I need to pay $100/month to an AI company or I'll be out of a job!
Hype hype hype! Data please.
What kind of "data" do you want? I'm submitting twice as many PRs with twice as much complexity. Changes that would take weeks take minutes. I generated 50 new endpoints for our API in a week. Last time we added 20 it took a month.
What kind of data would you want?
I'm 100x as productive now that I eat breakfast while reciting the declaration of independence.
You really should try it. Don't give up, it works if you do it just right.
Edit: I realize this line of argument doesn't really lead anywhere. I leave you with this Carl Sagan quote:
> Extraordinary claims require extraordinary evidence
(or perhaps we can just agree that claims require evidence, and not the anecdotal kind)
This is an immature bad faith interpretation.
The thing is there are loads of people making these claims and they aren't that extraordinary.
The claims for coding productivity increases were extraordinary 12 months ago; now they are fully realized and in use every day.
Please refrain from ad hominem attacks.
I'm not sure which part of my comment used an aspect of you to undermine the credibility of your argument.
So many people get ad hominem wrong.
I'm intrigued that someone who made a bad faith comparison like you would be so aghast at an ad hominem.
That's not what ad hominem means
Calling someone "immature" can be considered an ad hominem attack if it is used as a way to dismiss or undermine their argument without addressing the substance of what they are saying.
> Calling someone "immature"
That didn't actually happen in this thread, though. A comment was characterized as "immature". That might be insulting, but it's not an ad hominem attack.
Saying "this argument sucks" just isn't attacking an argument by attacking the credibility of the arguer. Using terms that can describe cognitive temperament or ability to directly characterize an argument might seem more ambiguous, but they don't really qualify either, for two reasons:
1. Humans are fallible, so it doesn't take an immature person to make an immature argument; all people are capable of becoming impatient or flippant. (The same is true for characterizations of arguments in terms of intelligence or wisdom, whatever.)
2. More crucially, even if you do read it as implying a personal insult, a direct assertion that an argument, claim, or comment is <insert insulting term here> is inverted from an ad hominem attack: the argument itself is asserted to evince the negative trait, rather than the negative trait, having been purportedly established about the person, being used to undermine the argument without any connection asserted between them other than that both the argument and the negative trait belong to the arguer.
You might call this a baseless insult, if you think the person saying it fails to show or suggest why the comment in question is "immature" or whatever, but it's not an ad hominem attack, because that's just a different rhetorical strategy.
But it's pretty clear that it's more a form of scolding (and maybe even pleading with someone to "step up" in some sense) than something primarily intended to insult. Maybe that's useless, maybe it's needlessly insulting, maybe it involves an unargued assertion, but "ad hominem" just ain't what it is.
--
Fwiw, after writing all that, I kinda regret it even though I stand by the contents. Talk of fallacies here has the problems of being technical and impersonal, and perhaps invites tangents like this.
I said your interpretation was immature, not you.
This is not a mature thing to say: " I eat breakfast while reciting the declaration of independence."
Unless you're being paid on a fixed bid, it doesn't matter. Someone else is reaping the rewards of your productivity, not you. You're being paid the same as your half-as-productive coworkers.
I'm a manager of software engineers. I have absolutely no intention of paying engineer B the same as half-as-productive engineer A.
Figuring out the best way to determine productivity is still a hard problem, but I think it's a category error to think that productivity gains go exclusively to the company.
If all (or even most) of the engineers on your team who were previously your equal become durably twice as productive and your productivity remains unchanged, your income prospects will go down, quite possibly to zero.
I think it matters very much if the person next to me is 2x as productive and I'm not.
Isn't this the case regardless of whether you use AI or not? Is this an argument for being less productive?
[flagged]
Really? I feel like the article pointedly skirted my biggest complaint.
> ## but the code is shitty, like that of a junior developer
> Does an intern cost $20/month? Because that’s what Cursor.ai costs.
> Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a skill and an engineering project all its own, of prompts, indices, and (especially) tooling. LLMs only produce shitty code if you let them.
I hate pair-programming with junior devs. I hate it. I want to take the keyboard away from them and do it all myself, but I can't, or they'll never learn.
Why would I want a tool that replicates that experience without the benefit of actually helping anyone?
You are helping the companies train better LLMs, both by just paying their expenses and by providing training data they will use. That may or may not be something one considers a worthwhile contribution. Certainly it is less valuable than helping a person grow their intellectual capacity.
> Does an intern cost $20/month? Because that’s what Cursor.ai costs.
This stuck out to me. How long will it continue to be so cheap? I would assume some of the low cost is subsidized by VC money which will dry up eventually. Am I wrong here?
Prices have been dropping like a stone over the past two years, due to a combination of new efficiencies being developed for serving plus competition from many vendors: https://simonwillison.net/2024/Dec/31/llms-in-2024/#llm-pric...
I'm not seeing any evidence yet of that trend stopping or reversing.
Training frontier models is expensive. Running inference on them is pretty cheap and, for solving the same level of problem, will continue to get cheaper. The more use they get out of a model, the less overhead we need to pay on that inference to subsidize the training.
At the rate they're going it'll just get cheaper. The cost per token continues to drop while the models get better. Hardware is also getting more specialized.
Maybe the current batch of startups will run out of money but the technology itself should only get cheaper.
The article provides no solid evidence that "AI is working" for the author.
At the end of the day, this article is nothing but another piece of conjecture on Hacker News.
Actually assessing the usefulness of AI would require measurements and controls. Nothing has been proven or disproven here.
the irony with AI sceptics is that their opinions usually sound like they've been stolen from someone else