This article leans on the computational complexity hammer way too hard and discounts huge progress in every field of AI outside the hot trend of transformers and LLMs. Nobody is saying the future of AI is autoregressive, and the article pretty much ignores any of the research that has been posted here around diffusion-based text generation or how it can be combined with autoregressive methods… It discounts multi-modal models entirely, and it pretty much dismisses everything that has happened with AlphaFold, AlphaGo, reinforcement learning, etc.
The argument that computational complexity has something to do with this could have merit, but the article certainly doesn't give any indication as to why. Is the brain NP-complete? Maybe, maybe not. I could see many arguments for why modern research will fail to create AGI, but just hand-waving "reality is NP-hard" is not enough.
The fact is: something fundamental has changed that enables a computer to pretty effectively understand natural language. That's a discovery on the scale of the internet or Google Search and shouldn't be discounted… and usage proves it. In two years a platform has reached billions of users. On top of that, huge fields of new research are making leaps and bounds with novel methods utilizing AI for chemistry, computational geometry, biology, etc.
It’s a paradigm shift.
I agree with everything you wrote. The technology is unbelievable, and six years ago, maybe even 3.1 years ago, it would have been considered magic.
A steel man argument for why winter might be coming is all the dumb stuff companies are pushing AI for. On one hand (and I believe this) we argue it’s the most consequential technology in generations. On the other, everybody is using it for nonsense like helping you write an email that makes you sound like an empty suit, or providing a summary you didn’t ask for.
There's still a ton of product work to cross whatever that valley is called between concept and product, and if that doesn't happen, money is going to start disappearing. The valuation isn't justified by the dumb stuff we do with it; it needs product-market fit.
> maybe even 3.1 years would have been considered magic
I vividly remember sitting there for literally hours talking to a computer on launch day; it was a very short night. This feeling of living in the future has not left me since. It's gotten quieter, but it's still there. It's still magic after those three years, perhaps even more so. It wasn't supposed to work this well for decades! And yet.
I think you're missing the point of "AI winter". It's not about how good the products are now. It's about how quickly the products are improving and creating the potential for profit. That's what drives investment.
3 things we know about the AI revolution in 2025:
- LLMs are amazing, but they have reached a plateau. AGI is not within reach.
- LLM investment has sunk many hundreds of billions of dollars, much of it from the world's pension funds.
- There is no credible path to a high-margin LLM product. Margins will be razor-thin positive at best once the free trial of the century starts to run out of steam.
This all adds up to a rather nasty crunch.
The thing about winter, though, is that it's eventually followed by another summer.
Many technologies plateau, but we don't say they all have winters. Terrestrial radio winter? Television winter? Automobile winter? Air travel winter? Nuclear power comes close in terms of its tarnished image and reluctance to reinvest.
I personally believe contemporary AI is over-hyped, but I cannot say with confidence that it is going to lead to a winter like the last one. It seems like today's products satisfy enough users to remain a significant area, even if it doesn't greatly expand...
The only way I could see it fizzling as a product category is if it turns out it is not economically feasible to operate a sustainable service. Will users pay a fair price to keep the LLM datacenters running, without speculative investment subsidies?
The other aspect of the winter is government investment, rather than commercial. What could the next cycle of academic AI research look like? E.g. exploration that needs to happen in grant-funded university labs instead of venture-funded companies?
The federal funding picture seems unclear, but that's true across the board right now for reasons that have nothing to do with AI per se.
I think of LLMs like brains or CPUs. They're the core that does the processing, but they need to be embedded in a bigger system to be useful. Even if LLMs plateau, there will be a lot of development and improvement in the systems that use these LLMs. We will be seeing a lot of innovation going forward, especially in systems that will be able to monetize these LLMs.
[flagged]
> I agree with everything you wrote, the technology is unbelievable and 6 years ago, maybe even 3.1 years would have been considered magic.
People said the same thing about ELIZA in 1967.
> The argument that computational complexity has something to do with this could have merit but the article certainly doesn’t give indication as to why.
The OP says it is because next-token prediction can be correct or not, yet the output always looks plausible, because plausibility is what the model actually calculates. Therefore it is dangerous and cannot be fixed, because that is how it works in essence.
I just want to point out a random anecdote.
Literally yesterday ChatGPT hallucinated an entire feature of a mod for a video game I am playing, including making up a fake console command.
The feature just straight up doesn't exist; it just seemed like a relatively plausible thing to exist.
This is still happening. It never stopped happening. I don’t even see a real slowdown in how often it happens.
It sometimes feels like the only thing saving LLMs is when they're forced to tap into a better system, like running a search engine query.
Another anecdote. I've got a personal benchmark that I try out on these systems every time there's a new release. It is an academic math question which could be understood by an undergraduate, and which seems easy enough to solve if I were just to hammer it out over a few weeks. My prompt includes a big list of mistakes it is likely to fall into and which it should avoid. The models haven't ever made any useful progress on this question. They usually spin their wheels for a while and then output one of the errors I said to avoid.
My hit rate when using these models for academic questions is low, but non-trivial. I've definitely learned new math because of using them, but it's really just an indulgence because they make stuff up so frequently.
I get generally good results from prompts asking for something I know definitely exists or is definitely possible, like an ffmpeg command I know I've used in the past but can't remember. Recently I asked how to do something in ImageMagick which I'd not done before but which felt like the kind of thing ImageMagick should be able to do. It made up a feature that doesn't exist.
Maybe I should have asked it to write a patch that implements that feature.
To take a different perspective on the same event:
The model expected a feature to exist because it fitted with the overall structure of the interface.
This in itself can be a valuable form of feedback. I currently don't know of anyone doing it, but testing interfaces by getting LLMs to use them could be an excellent resource. If the AI runs into trouble, it might be worth checking your designs to see if you have any inconsistencies, redundancies, or other confusion-causing issues.
One would assume that a consistent user interface would be easier for both AI and humans. Fixing the issues would improve it for both.
That failure could be leveraged into an automated process that identifies areas to improve.
When asking questions I use ChatGPT only as a turbo search engine. Having it double-check its sources and citations has helped tremendously.
There is no difference between "hallucination" and "soberness"; it's just a database you can't trust.
The response to your query might not be what you needed, similar to interacting with an RDBMS and mistyping a table name and getting data from another table or misremembering which tables exist and getting an error. We would not call such faults "hallucinations", and shouldn't when the database is a pile of eldritch vectors either. If we persist in doing so we'll teach other people to develop dangerous and absurd expectations.
No, it's absolutely not. One of these is a generative stochastic process with no guarantee at all that it will produce correct data; in fact you can make the OPPOSITE guarantee: you are guaranteed to sometimes get incorrect data. The other is a deterministic process of data access. I could perhaps only agree with you in the sense that such faults are not uniquely hallucinatory; all outputs from an LLM are.
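To make that contrast concrete, here is a minimal sketch (Python's built-in sqlite3; the table and data are made up) of the deterministic failure mode being described: a mistyped table name fails loudly and reproducibly, instead of returning plausible-looking fabricated rows.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")

    # Correct query: deterministic data access, same rows every time.
    print(conn.execute("SELECT name FROM users").fetchall())  # [('alice',)]

    # Mistyped table name: the database refuses and says exactly why,
    # rather than generating something that merely looks plausible.
    try:
        conn.execute("SELECT name FROM usres")
    except sqlite3.OperationalError as e:
        print("error:", e)  # error: no such table: usres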
I don't agree with the theoretical boundaries you provide. Any database can appear to lack determinism, because data might get deleted, corrupted, or mutated, and the hardware and software involved might fail intermittently.
The illusion of determinism in RDBMS systems is just that, an illusion. I used those particular examples of failures because most experienced developers are familiar with those situations and can relate to them, while the reader is less likely to have experienced a truer apparent indeterminism.
LLMs can provide an illusion of determinism as well; some are quite capable of repeating themselves, e.g. through overfitting, intentional or otherwise.
Yep. All these models do is "hallucinate". It's hard to work the hallucinations out of the system, because hallucinating is the entire thing it does. Sometimes the hallucinations just happen to be useful.
This seems unnecessarily pedantic. We know how the system works, we just use "hallucination" colloquially when the system produces wrong output.
Other people do not, hence the danger and the responsibility of not giving them the wrong impression of what they're dealing with.
Sorry, I'm failing to see the danger of this choice of language? People who aren't really technical don't care about these nuances. It's not going to sway their opinion one way or another.
If the information it gives is wrong, but is grammatically correct, then the "AI" has fulfilled its purpose. So it isn't really "wrong output" because that is what the system was designed to do. The problem is when people use "AI" and expect it will produce truthful responses - it was never designed to do that.
You are preaching to the choir.
But the point is that everyone uses the phrase "hallucinations" and language is just how people use it. In this forum at least, I expect everyone to understand that it is simply the result of next token generation and not an edge case failure mode.
"Eldritch vectors" is a perfect descriptor, thank you.
I like asking it about my great-great-grandparents (without mentioning that they were my great-great-grandparents, just giving their names, jobs, and places of birth).
It hallucinates whole lives out of nothing but stereotypes.
> It sometimes feels like the only thing saving LLMs are when they’re forced to tap into a better system like running a search engine query.
This is actually very profound. All free models are only reasonable if they scrape 100 web pages (according to their own output) before answering. Even then they usually have multiple errors in their output.
[flagged]
Responding with "skill issue" in a discussion is itself a skill issue. Maybe invest in some conversational skills and learn to be constructive rather than parroting a useless meme.
First of all, there is no such thing as "prompt engineering". Engineering, by definition, is a matter of applying scientific principles to solve practical problems. There are no clear scientific principles here. Writing better prompts is more a matter of heuristics, intuition, and empiricism. And there's nothing wrong with that — it can generate a lot of business value — but don't presume to call it engineering.
Writing better prompts can reduce the frequency of hallucinations, but hallucinations still occur frequently even with the latest frontier LLMs, regardless of prompt quality.
So you are saying the acceptable customer experience for these systems is that we need to explicitly tell them to accept defeat when they can't find any training content or web search results that match my query closely enough?
Why don't they have any concept of having a percentage of confidence in their answer?
It isn't 2022 anymore; this is supposed to be a mature product.
Why am I even using this thing rather than using the game’s own mod database search tool? Or the wiki documentation?
What value is this system adding for me if I’m supposed to be a prompt engineer?
> What value is this system adding....
[dead]
Is this supposed to be some kind of mic drop?
Technologically, I believe you're right. On the other hand, the previous AI winters happened despite novel technologies, some of which proved extremely useful and actually changed the world of software. They happened because of overhype, and then investors moving on to the next opportunity.
Here, the investors are investing in LLMs, not in AlphaFold, AlphaGo, neurosymbolic approaches, focus learning, etc. If (when) LLMs prove insufficient for the insane level of hype, and if (when) experience shows that there is only so much money you can make with LLMs, it's possible that the money will move on to other types of AI, but there's a chance it will actually go to something entirely different, perhaps quantum, leaving AI in winter.
> that enables a computer to pretty effectively understand natural language
I'd argue that it pretty effectively mimics natural language. I don't think it really understands anything; it is just the best Mad Libs generator that the world has ever seen.
For many tasks, this is accurate 99+% of the time, and the failure cases may not matter. Most humans don't perform any better, and arguably regurgitate words without understanding as well.
But if the failure cases matter, then there is no actual understanding: the language the model generates is never getting "marked to market/reality", because there is no mental world model to check it against. That isn't usable if there are real-world consequences of the LLM getting things wrong, and these models can wind up making very basic mistakes that humans wouldn't make, because we innately understand how the world works and aren't just stringing together words that sound good.
I don't think anybody expects AI development to stop. A winter is defined by a relative drying-up of investment and, importantly, it's almost certain that any winter will eventually be followed by another summer.
The pace of investment in the last 2 years has been so insane that even Altman has claimed that it's a bubble.
> I could see many arguments about why modern research will fail to create AGI
Why is AGI even necessary? Consider the loop between teaching the AI something and it being able to repeat similar enough tasks: if that loop becomes short enough, days or hours instead of months, who cares whether some ill-defined bar of AGI is met?
> something fundamental has changed that enables a computer to pretty effectively understand natural language.
You understand how the tech works, right? It's statistics and tokens. The computer understands nothing. Creating "understanding" would be a breakthrough.
Edit: I wasn't trying to be a jerk. I sincerely wasn't. I don't "understand" how LLMs "understand" anything. I'd be super pumped to learn that bit. I don't have an agenda.
It astonishes me how people can make categorical judgements on things as hard to define as 'understanding'.
I would ask: apart from the observable and testable performance, what else can you say about understanding?
It is a fact that LLMs are getting better at many tasks. From their performance, they seem to have an understanding of, say, Python.
The mechanistic way this understanding arises is different than humans.
How can you then say it is 'not real' without invoking the hard problem of consciousness, at which point we've hit a completely open question?
- [deleted]
To be fair, it can be hard to define “chair” to the satisfaction of an unsympathetic judge.
“Do chairs exist?”:
I think it is fair to say that AIs do not yet "understand" what they say or what we ask them.
When I ask it to use a specific MCP to complete a certain task, and it proceeds to not use that MCP, this indicates a clear lack of understanding.
You might say that the fault was mine, that I didn't setup or initialize the MCP tool properly, but wouldn't an understanding AI recognize that it didn't have access to the MCP and tell me that it cannot satisfy my request, rather than blindly carrying on without it?
LLMs consistently prove that they lack the ability to evaluate statements for truth. They lack, as well, an awareness of their unknowing, because they are not trying to understand; their job is to generate (to hallucinate).
It astonishes me that people can be so blind to this weakness of the tool. And when we raise concerns, people always say
"How can you define what 'thinking' is?" "How can you define 'understanding'?"
These philosophical questions are missing the point. When we say it doesn't "understand", we mean that it doesn't do what we ask. It isn't reliable. It isn't as useful to us as perhaps it has been to you.
- [deleted]
As someone who was an engineer on the original Copilot team, yes, I understand how the tech works.
You don’t know how your own mind “understands” something. No one on the planet can even describe how human understanding works.
Yes, LLMs are vast statistical engines but that doesn’t mean something interesting isn’t going on.
At this point I’d argue that humans “hallucinate” and/or provide wrong answers far more often than SOTA LLMs.
I expect to see responses like yours on Reddit, not HN.
Before one may begin to understand something, one must first be able to estimate the level of certainty. Our robot friends, while really helpful and polite, seem to be lacking in that department. They take the things we've written on the internet, in books, academic papers, court documents, newspapers, etc. to be actually true. And where humans aren't omniscient, the model fills in the blanks with nonsense.
[dead]
> As someone who was an engineer on the original Copilot team
Right, so "as someone who is a sociopath completely devoid of ethics" you were one of the cogs in the machine who said "fuck your license, we're training our llm on your code whether you like it or otherwise".
> that doesn’t mean something interesting isn’t going on
Wow! Such double negative. Much science. So rigor.
> At this point I’d argue that humans “hallucinate” and/or provide wrong answers far more often than SOTA LLMs.
Yikes, may the Corpo Fascist Gods protect any pitiable humans still in your life.
> I expect to see responses like yours on Reddit, not HN.
I suppose that says something about both of us.
[dead]
We could use a little more kindness in this discussion. I think the commenter has a very solid understanding of how computers work. "Understanding" is somewhat complex, and I do agree with you that we are not there yet. I do think, though, that the paradigm shift is more about the fact that we can now interact with the computer in a new way.
You understand how the brain works right? It's probability distributions mapped to sodium ion channels. The human understands nothing.
I've heard that this human brain is rigged to find what it wants to find.
That's how the brain works, not how the mind works. We understand the hardware, not the software.
Are we even sure we understand the hardware? My understanding is that even that is contested; see, for example, orchestrated objective reduction, holonomic brain theory, or GVF theory.
The end effect certainly gives off an "understanding" vibe, even if the method of achieving it is different. The commenter obviously didn't mean the way a human brain understands.
Birds and planes operate using somewhat different mechanics, but they do both achieve flight.
Birds and planes are very similar other than the propulsion, landing gear, and construction materials. Maybe bird vs. helicopter, or bird vs. rocket, would be a closer comparison.
"I don't "understand" how LLMs "understand" anything."
Why does the LLM need to understand anything? What today's chatbots have achieved is a software engineering feat. They have taken a stateless token generation machine that has compressed the entire internet's vocabulary to predict the next token, and they have 'hacked' a whole state management machinery around it. The end result is a product that feels like another human conversing with you and remembering your last birthday.
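A minimal sketch of that state management, with a placeholder generate_reply function standing in for whatever model or API call a real product uses: the model call itself is stateless, and all the "memory" lives in the transcript the application re-sends on every turn.

    # Sketch only: generate_reply is a stand-in for a real next-token model call.
    def generate_reply(transcript: str) -> str:
        return "(model output conditioned on everything in the transcript)"

    history = []  # the application's state, not the model's

    def chat(user_message: str) -> str:
        history.append(f"User: {user_message}")
        transcript = "\n".join(history)      # the entire conversation so far
        reply = generate_reply(transcript)   # model sees it all, remembers none of it
        history.append(f"Assistant: {reply}")
        return reply

    chat("My birthday is March 3rd.")
    print(chat("When is my birthday?"))  # answerable only because the transcript was re-sent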
Engineering will surely get better, and while purists can argue that a new research perspective is needed, the current growth trajectory of chatbots, agents, and code generation tools will carry the torch forward for years to come.
If you ask me, this new AI winter will thaw in the atmosphere even before it settles on the ground.
Every time I see comments like these I think about this research from anthropic: https://www.anthropic.com/research/mapping-mind-language-mod...
LLMs activate similar neurons for similar concepts not only across languages, but also across input types. I'd like to know whether you'd consider that a good representation of "understanding", and if not, how you would define it.
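It isn't the feature-level analysis in Anthropic's paper, but a crude way to poke at cross-lingual concept sharing yourself is to compare sentence embeddings across languages; the sketch below assumes the sentence-transformers package and one of its multilingual checkpoints.

    # Crude proxy for "similar internal representations for similar concepts":
    # the same idea in English and German lands close together in embedding
    # space, while an unrelated sentence does not.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    emb = model.encode([
        "The cat is sleeping on the sofa.",       # English
        "Die Katze schläft auf dem Sofa.",        # German, same concept
        "Interest rates rose sharply last year.", # unrelated concept
    ])

    print(util.cos_sim(emb[0], emb[1]))  # high similarity: same concept, different language
    print(util.cos_sim(emb[0], emb[2]))  # noticeably lower: different concept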
Anthropic is pretty notorious for peddling hype. This is a marketing article - it has not undergone peer-review and should not be mistaken for scientific research.
It has a proper paper attached right at the beginning of the article.
It’s not peer-reviewed, and was never accepted by a scientific journal. It’s a marketing paper masquerading as science.
If I could understand what the brain scans actually meant, I would consider it a good representation. I don't think we know yet what they mean. I saw some headline the other day about a person with "low brain activity", and said person was in complete denial about it; I would be too.
As I said then, and probably echoing what other commenters are saying: what do you mean by understanding when you say computers understand nothing? Do humans understand anything? If so, how?
“You understand how the brain works right? It’s neurons and electrical charges. The brain understands nothing.”
I’m always struck by how confidently people assert stuff like this, as if the fact that we can easily comprehend the low-level structure somehow invalidates the reality of the higher-level structures. As if we know concretely that the human mind is something other than emergent complexity arising from simpler mechanics.
I’m not necessarily saying these machines are “thinking”. I wish I could say for sure that they’re not, but that would be dishonest: I feel like they aren’t thinking, but I have no evidence to back that up, and I haven’t seen non-self-referential evidence from anyone else.
- [deleted]
You don't understand how the tech works, then.
LLMs aren't as good as humans at understanding, but it's not just statistics. The stochastic parrot meme is wrong. The networks create symbolic representations in training, with huge multidimensional correlations between patterns in the data, whether it's temporal or semantic. The models "understand" concepts like emotions, text, physics, arbitrary social rules and phenomena, and anything else present in the data and context in the same fundamental way that humans do. We're just better, with representations a few orders of magnitude higher in resolution, much wider redundancy, and multi-million-node parallelism with asynchronous operation that silicon can't quite match yet.
In some cases, AI is superhuman, and uses better constructs than humans are capable of, in other cases, it uses hacks and shortcuts in representations, mimics where it falls short, and in some cases fails entirely, and has a suite of failure modes that aren't anywhere in the human taxonomy of operation.
LLMs and AI aren't identical to human cognition, but there's a hell of a lot of overlap, and the stochastic parrot "ItS jUsT sTaTiStIcS!11!!" meme should be regarded as an embarrassing opinion to hold.
"Thinking" models that cycle context and systems of problem solving also don't do it the same way humans think, but overlap in some of the important pieces of how we operate. We are many orders of magnitude beyond old ALICE bots and MEgaHAL markov chains - you'd need computers the size of solar systems to run a markov chain equivalent to the effective equivalent 40B LLM, let alone one of the frontier models, and those performance gains are objectively within the domain of "intelligence." We're pushing the theory and practice of AI and ML squarely into the domain of architectures and behaviors that qualify biological intelligence, and the state of the art models clearly demonstrate their capabilities accordingly.
For any definition of understanding you care to lay down, there's significant overlap between the way human brains do it and the way LLMs do it. LLMs are specifically designed to model constructs from data, and to model the systems that produce the data they're trained on, and the data they model comes from humans and human processes.
You appear to be a proper alchemist, but you can't support an argument for understanding if there is no definition of understanding that isn't circular. If you want to believe the friendly voice really understands you, we have a word for that: faith. The skeptic sees interactions with a chatbot as a statistical game that shows how uninteresting (i.e., predictable) humans and our stupid language are. There are useful gimmicks coming out, like natural language processing for low-risk applications, but this form of AI pseudoscience isn't going to survive; it will just take some time for research to catch up and describe the falsehoods of contemporary AI toys.
Understanding is the thing that happens when your neurons coalesce into a network of signaling and processing such that it empowers successful prediction of what happens next. This powers things like extrapolation, filling in missing parts of perceived patterns, temporal projection, and modeling hidden variables.
Understanding is the construction of a valid model. In biological brains, it's a vast parallelized network of columns and neuron clusters in coordinated asynchronous operation, orchestrated to ingest millions of data points both internal and external, which results in a complex and sophisticated construct comprising the entirety of our subjective experience.
LLMs don't have the subjective experience module, explicitly. They're able to emulate the bits that are relevant to being good at predicting things, so it's possible that every individual token inference process produces a novel "flash" of subjective experience, but absent the explicit construct and a persistent and coherent self construct, it's not mapping the understanding to the larger context of its understanding of its self in the same way humans do it. The only place where the algorithmic qualities needed for subjective experience reside in LLMs is the test-time process slice, and because the weights themselves are unchanged in relation to any novel understanding which arises, there's no imprint left behind by the sensory stream (text, image, audio, etc.) Absent the imprint mechanism, there's no possibility to perpetuate the construct we think of as conscious experience, so for LLMs, there can never be more than individual flashes of subjectivity, and those would be limited to very low resolution correlations a degree or more of separation away from the direct experience of any sensory inputs, whereas in humans the streams are tightly coupled to processing, update in real-time, and persist through the lifetime of the mind.
The pieces being modeled are the ones that are useful. The utility of consciousness has been underexplored; it's possible that it might be useful in coordination and orchestration of the bits and pieces of "minds" that are needed to operate intelligently over arbitrarily long horizon planning, abstract generalization out of distribution, intuitive leaps between domains that only relate across multiple degrees of separation between abstract principles, and so on. It could be that consciousness will arise as an epiphenomenological outcome from the successful linking together of systems that solve the problems LLMs currently face, and the things which overcome the jagged capabilities differential are the things that make persons out of human minds.
It might also be possible to orchestrate and coordinate those capabilities without bringing a new mind along for the ride, which would be ideal. It's probably very important that we figure out what the case is, and not carelessly summon a tortured soul into existence.
It could very well be that statistics and tokens is how our brains work at the computational level too. Just that our algorithms have slightly better heuristics due to all those millennia of A/B testing of our ancestors.
Except we know for a fact that the brain doesn’t work that way. You’re ignoring the entire history of neuroscience.
I think it's a disingenuous read to assume the original commenter means "understanding" in the literal sense. When we talk about LLM "understanding", we usually mean it in a practical sense: if you give an input to the computer and it gives you an expected output, then colloquially the computer "understood" your input.
What do you mean by “understand”? Do you mean conscious?
Understand just means "parse language" and is highly subjective. If I talk to someone African in Chinese, they do not understand me, but they are still conscious.
If I talk to an LLM in Chinese, it will understand me, but that doesn't mean it is conscious.
If I talk about physics to a kindergartner, they will not understand, but that doesn't mean they don't understand anything.
Do you see where I am going?
[dead]
GOFAI was also a paradigm shift, regardless of that winter. For example, banks started automating assessments of creditworthiness.
What we didn't get was what had been expected, namely things like expert systems that were actual experts, so-called 'general intelligence', and war waged through 'blackboard systems'.
We've had voice-controlled electronics for a long time. On the other hand, machine vision applications have improved massively in certain niches, and have also allowed for new forms of intense tyranny and surveillance, where errors are actually considered a feature rather than a bug since they erode civil liberties and human rights, yet are still broadly accepted because 'computer says'.
While you could likely argue "leaps and bounds with novel methods utilizing AI for chemistry, computational geometry, biology etc." by downplaying the first part or clarifying that it is mainly an expectation, I think most people are going to, for the foreseeable future, keep seeing "AI" as more or less synonymous with synthetic infantile chatbot personalities that substitute for human contact.