Somewhat related, but here's my take on superintelligence or AGI. I have worked with CNNs, GNNs, and other old-school AI methods, but I don't have the resources to build a real SOTA LLM, though I do use and tinker with LLMs occasionally.
If AGI or SI (superintelligence) is possible, and that is an if... I don't think LLMs are going to be the silver bullet solution. Just as the real world has people dedicated to a single task in their field, like lawyers, construction workers, doctors, and brain surgeons, I see the current best path forward as a "mixture of experts". We know LLMs are pretty good at what I've seen some refer to as NLP problems, where the model input is a tokenized string. However, I would argue an LLM will never build a trained model like Stockfish or DeepSeek. Certain model types seem suited to certain kinds of problems or inputs. True AGI or SI would stop trying to be a grandmaster of everything and instead know which method/model should be applied to a given problem.

We still do not know if it is possible to combine the knowledge of different types of neural networks like LLMs, convolutional neural networks, and other deep learning approaches... and while it's certainly worth exploring, it is foolish to throw all hope at a single solution approach. I think the first step would be to create a new type of model that, given a problem of any type, knows the best method to solve it, and that doesn't rely on itself but rather on a mixture of agents or experts. They don't even have to be LLMs; they could be anything.
Where this would really explode is if the AI were able to identify a problem it can't solve and invent a new approach, or multiple approaches, because then we wouldn't have to be the ones who develop every expert.
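To make that "router over experts" idea concrete, here is a minimal sketch, assuming a hand-written classifier and placeholder experts (all names here are hypothetical): the coordinator's only job is to decide which specialist gets the problem, and a specialist could be an LLM, a chess engine, or any other kind of model.

```python
# Minimal sketch of a "router over experts" loop (all names hypothetical).
# The router only decides WHICH solver to use; it never tries to be the
# grandmaster of everything itself.

from typing import Callable, Dict

# Each expert is just a callable; it could wrap an LLM, Stockfish,
# a convolutional net, or a plain algorithm.
Expert = Callable[[str], str]

def solve_with_llm(problem: str) -> str:
    return f"[LLM answer to: {problem}]"           # placeholder

def solve_with_chess_engine(problem: str) -> str:
    return f"[engine line for: {problem}]"         # placeholder

def solve_with_symbolic_math(problem: str) -> str:
    return f"[closed-form result for: {problem}]"  # placeholder

EXPERTS: Dict[str, Expert] = {
    "nlp": solve_with_llm,
    "chess": solve_with_chess_engine,
    "math": solve_with_symbolic_math,
}

def classify(problem: str) -> str:
    """Toy classifier; a real router would itself be a learned model."""
    text = problem.lower()
    if "mate in" in text or "e4" in text:
        return "chess"
    if any(ch in text for ch in "+-*/=") or "integral" in text:
        return "math"
    return "nlp"

def route(problem: str) -> str:
    return EXPERTS[classify(problem)](problem)

if __name__ == "__main__":
    print(route("Summarize this contract in plain terms."))
    print(route("White to play, mate in 2."))
```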
Totally agree. An LLM won't be an AGI.
It could be part of an AGI, specifically the human interface part. That's what an LLM is good at. The rest (knowledge oracle, reasoning, etc.) are just things that kinda work as a side effect. Other types of AI models are going to be better at that.
It's just that since the masses found that they can talk to an AI like a human, they think that it's got human capabilities too. But it's more like fake it till you make it :) An LLM is a professional bullshitter.
> It's just that since the masses found that they can talk to an AI like a human
In a way it's worse: Even the "talking to" part is an illusion, and unfortunately a lot of technical people have trouble remembering it too.
In truth, the LLM is an idiot-savant which dreams up "fitting" additions to a given document. Some humans have prepared a document in the form of a theater play or a turn-based chat transcript, with a pre-written character that is often described as a helpful robot. Then the humans launch some code that "acts out" any text that looks like it came from that fictional character, and inserts whatever the real human user types as dialogue for the document's human character.
There's zero reason to believe that the LLM is "recognizing itself" in the story, or that it is choosing to self-insert as one of the characters. It's not having a conversation. It's not interacting with the world. It's just coded to Make Document Bigger Somehow.
> they think that it's got human capabilities too
Yeah, we easily confuse the character with the author. If I write an obviously-dumb algorithm which slaps together a story, it's still a dumb algorithm no matter how smart the robot in the story is.
Just wanted to point out that the notion of a "document" is also an illusion to the LLM. It's processing a sequence of low dimensional spaces into another sequence of low dimensional spaces. The input spaces preserve aspects of content similarity based on co-occurrence. The model learns to transform these spaces into higher order spaces based on the outcome of training.
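As a toy illustration of that point (numpy only, made-up vocabulary and dimensions): by the time the network processes anything, the "document" has already been replaced by a sequence of vectors.

```python
# Toy illustration: the "document" the model processes is just a sequence
# of embedding vectors, not text. Vocabulary and numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

vocab = {"the": 0, "sky": 1, "is": 2, "blue": 3}
d_model = 8                                  # tiny embedding dimension
embedding_table = rng.normal(size=(len(vocab), d_model))

tokens = ["the", "sky", "is", "blue"]
token_ids = [vocab[t] for t in tokens]       # text -> integer ids
x = embedding_table[token_ids]               # ids -> (4, 8) matrix of vectors

# From here on, every layer only ever sees and produces arrays like `x`;
# nothing downstream knows these rows once were words in a "document".
print(x.shape)   # (4, 8)
```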
You couldn't say that the model has a singular sense of self, but it certainly has been trained on data that allows it to mimic it in short spurts, and mimicry is what humans do to learn more complex/abstract tasks. The training goal is not to learn how to "be", but rather to learn how to "do" the parts necessary to continue existing.
"Fake it till you make it" is really all that's required to exist in the world.
For humans, the notion of “self” is also an illusion. We’re just atoms dancing to physics. But that’s not the right level of abstraction from which to do psychology.
Is the notion of an LLM operating on a document useful for building an intuition about how the LLM operates? That’s the more interesting question.
I’d say it is, although it’s not really a document but a stream of tokens. The intuition is that the model is able to build abstractions from that stream. Some of the abstractions are related to goal states of the training metric. However, it’s easy to imagine how some of the goal states pertain to the model’s survival in the “wild”, and part of the abstraction that the model captures for this goal is a reflection of itself.
> In truth, the LLM is an idiot-savant which dreams up "fitting" additions to a given document.
Tbh I'm not too sure that my brain works fundamentally differently. I'm an idiot-savant who responds to stimuli.
Do you know the number of atoms in my coffee cup right now?
If you say "No, I don't" then you're already way ahead of any LLM, even the biggest ones. Remember, I asked about how many are in there right now. The only correct answer is "I don't know", because I haven't even told you whether my mug is full or empty.
Here's copilot's answer:
If I wanted an explanation of how to approximate it, that's what I would have asked for. You could very probably figure that much out.

> how many hydrogen atoms are there in my coffee cup right now

Copilot said: That's an interesting question! The number of hydrogen atoms in your coffee cup depends on the volume of coffee and its composition. Let's assume you have a standard coffee cup that holds about 240 milliliters (8 ounces) of coffee. Coffee is mostly water, so we can approximate the number of hydrogen atoms based on the water content. Water (H₂O) has 2 hydrogen atoms per molecule. The molar mass of water is approximately 18 grams per mole, and 1 mole of water contains Avogadro's number of molecules, which is about 6.022 × 10^23. 1 milliliter of water weighs about 1 gram, so 240 milliliters of water weighs 240 grams, which is approximately 240 / 18 ≈ 13.33 moles of water. Therefore, the number of water molecules in 240 milliliters of water is 13.33 × 6.022 × 10^23 ≈ 8.02 × 10^24 molecules. Since each water molecule has 2 hydrogen atoms, the total number of hydrogen atoms is 2 × 8.02 × 10^24 ≈ 1.60 × 10^25. So, there are approximately 1.60 × 10^25 hydrogen atoms in a standard coffee cup of 240 milliliters. Keep in mind this is a rough estimate, as the actual number can vary based on the exact composition of your coffee.
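(For what it's worth, the arithmetic in that reply is at least internally consistent; a quick check, assuming the same 240 ml of pure water:)

```python
# Quick check of the arithmetic in the pasted answer (240 ml cup assumed,
# coffee approximated as pure water).
AVOGADRO = 6.022e23          # molecules per mole
MOLAR_MASS_WATER = 18.0      # grams per mole
grams = 240.0                # 240 ml of water ~ 240 g

moles = grams / MOLAR_MASS_WATER            # ~13.33 mol
molecules = moles * AVOGADRO                # ~8.03e24 water molecules
hydrogen_atoms = 2 * molecules              # ~1.6e25 hydrogen atoms
print(f"{hydrogen_atoms:.2e}")              # 1.61e+25
```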
No, this is the correct answer. This is the answer I'd be looking for if I was asking that question. That you failed to specify the question is a failure on your part.
Even stupid models understand that if I ask them the population of Denmark they only need to give a rough approximation.
I am not sure what you mean by LLM when you say they are professional bullshitters. While that was certainly true for models based on transformers just doing inference, recent models have progressed significantly.
> I am not sure what you mean by LLM when you say they are professional bullshitters.
Not the parent poster, but an LLM is a tool for extending a document by choosing whatever statistically seems right based on other documents, and it does so with no consideration of worldly facts and no modeling of logical propositions or contradictions. (Which also relates to math problems.) If it has been fed documents with logic puzzles and prior tests, it may give plausible answers, but tweaking the test to avoid the pattern-matching can still reveal that it was a sham.
The word "bullshit" is appropriate because human bullshitter is someone who picks whatever "seems right" with no particular relation to facts or logical consistency. It just doesn't matter to them. Meanwhile, a "liar" can actually have a harder job, since they must track what is/isn't true and craft a story that is as internally-consistent as possible.
Adding more parts around an LLM won't change that: even if you add some external sensors, a calculator, a SAT solver, etc. to create a document with facts in it, once you ask the LLM to make the document bigger, it's going to be bullshitting the additions.
I think the problem is the way you are phrasing your argument implies the LLM is always wrong. Consider a simple prompt: "Write a hello world in Python."
Every LLM I've tested gets this correct. In my mind, it can't be both bullshit and correct.
I would argue that the amount of real bullshit returned from an LLM is correlated to the amount of bullshit you give it. Garbage in, garbage out.
In the end, it's irrelevant whether it's a statistical engine or whatever semantics we want to use (glorified autocomplete). If it solved my problem in less time than I perceive I would have solved it without it, bullshit isn't the word I would use to describe the outputs.
In all fairness though, I do get some bullshit responses.
It only gives you the statistically most likely way a conversation would evolve after one party says "Write a hello world in Python." It just happens to be the correct one.
If I ask a 5yo "42 * 21 equals...?" and the kid replies with a random number, say, "882", and gets it right, it does not mean that the kid knows what multiplication is or how it works.
ChatGPT can use a stateful python environment to do math. It isn’t confabulating the answers, it’s using a calculator.
I mean, that's just confabulating the next token with extra steps... IME it does get those wrong sometimes. I imagine there's an extra internal step to validate the syntax there.
I'm not arguing for or against anything specifically; I just want to note that in practice I assume that, to the LLM, it's just a bunch of repeated prompts with the entire conversation, and after it outputs special 'signifier' tokens, the LLM just suddenly gets a prompt that contains the results of the program that was executed in an environment. For all we know, various prompts were involved in setting up that environment too, but I suspect not.
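A rough sketch of what that loop might look like from the harness's side; the marker format, function names, and model behavior here are all made up, not any vendor's actual protocol:

```python
# Hypothetical sketch of a code-execution tool loop around an LLM.
# The "TOOL:"/"RESULT:" markers and the model stub are invented for
# illustration; real systems have their own internal protocols.

def call_model(conversation: str) -> str:
    """Stand-in for an LLM call; returns either prose or a tool request."""
    if "RESULT:" not in conversation:
        return "TOOL:python\nprint(42 * 21)"   # model asks for the calculator
    return "42 * 21 is 882."                   # model answers using the result

def run_tool(code: str) -> str:
    """Stand-in for the sandboxed interpreter the harness controls."""
    import io, contextlib
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})                         # real harnesses sandbox this
    return buf.getvalue().strip()

conversation = "USER: what is 42 * 21?"
while True:
    reply = call_model(conversation)
    if reply.startswith("TOOL:python"):
        code = reply.split("\n", 1)[1]
        result = run_tool(code)
        # From the model's point of view this is just more prompt text.
        conversation += f"\n{reply}\nRESULT: {result}"
    else:
        print(reply)                           # "42 * 21 is 882."
        break
```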
> In my mind, it can't be both bullshit and correct.
It's easy for bullshitters to say some true things, but it doesn't change the nature of the process that got the results. Ex:
________
Person A: "The ghost of my dead gerbil whispers unto me the secrets of the universe, and I am hearing that the local volcano will not erupt today."
Person B: "Bullshit."
[24 hours later]
Person A: "See? I was correct! I demand an apology for your unjustified comment."
> it does so with no consideration of worldly facts
Why don't you consider its training set (usually the entire internet, basically) worldly facts? It's true that the training set can contain contradictory facts, but usually an LLM can recognize these contradictions and provide analysis of the different viewpoints. I don't see how this is much different from what humans can do with documents.
The difference is that humans can do their own experiments and observations in the real world to verify or dismiss things they read. Providing an LLM with tools can, in a limited way, allow an LLM to do the same.
Ultimately its knowledge is limited by its training set and the 'external' observations it can make, but this is true of all agents, no?
LLMs are trained on data which may contain both truthful and false information.
But at inference time it’s not referring to that data at all. Some of the data is aliased and encoded in the model’s weights, but we’re not sure exactly what’s encoded.
It may very well be that vague concepts (like man, woman, animal, unhealthy) are encoded, but not details themselves.
Further, at inference time, there is no kind of “referencing” step. We’ve just seen that they can sometimes repeat text they were trained on, but sometimes they just don’t.
The LLM-based systems you're probably using do some RAG work to insert relevant information into the LLM's context. This context still is not being referred to per se. An LLM might have a document that says the sky is red, but still insist that it's blue (or vice versa).
So while the info an LLM may have available is limited by its training data and the RAG system around it, none of that is guaranteed at inference time.
There’s always a significant chance for the LLM to make up bullshit.
> The word "bullshit" is appropriate because human bullshitter is someone who picks whatever "seems right" with no particular relation to facts or logical consistency.
Not quite true - this is true for your random bullshitter, but professional bullshitters do, in fact, care for the impression of logical consistency and do have a grip on basic facts (if only so they can handwave them more effectively). As such, LLMs are definitely not yet pros at bullshitting :)
Tell me you haven’t used the latest models, without telling me you haven’t used the latest models?
They do hallucinate at times, but you’re missing a lot of real utility by claiming they are basically bullshit engines.
They can now use tools, and maintain internal consistency over long context windows (with both text and video). They can iterate fully autonomously on software development by building, testing, and bug fixing on real world problems producing usable & functioning code.
There’s a reason Microsoft is putting $80 billion dollars on the line to run LLMs. It’s not because they are full of shit!
Meta put $45 billion into the Metaverse... so how much virtual real estate do you own?
It's true, they're very convincing bullshitters ;)
There's a _lot_ of smoke and mirrors. Paste a sudoku into ChatGPT and ask it to solve it. Amazing, it does it perfectly! Of course, that's because it ran a sudoku-solving program that it pulled off GitHub.
Now ask it to solve step by step by pure reasoning. You'll get a really intelligent sounding response that sounds correct, but on closer inspection makes absolutely no sense, every step has ridiculous errors like "we start with options {1, 7} but eliminate 2, leaving only option 3", and then at the end it just throws all that out and says "and therefore ..." and gives you the original answer.
That tells me there's essentially zero reasoning ability in these things, and anything that looks like reasoning has been largely hand-baked into it. All they do on their own is complete sentences with statistically-likely words. So yeah, as much as people talk about it, I don't see us as being remotely close to AGI at this point. Just don't tell the investors.
On the other side of the coin, I think people also underestimate how much of human thinking and intelligence is just completing statistically likely words. Most actions, and certainly most reactions, people perform every day involve very little reasoning. Instead they're just following the most used neuron.
Human vision works this way. To fix the latency problem (the actual event happening vs. the signal being transmitted to your brain), human vision is constantly predicting what you should see; your brain tells you that the prediction is what you saw, and then the brain does reconciliation after the fact. Your brain will scramble for coherency when prediction and reality do not match. This trickery is why it seems like you see events in real time, when there is actually a significant delay between event and perception.
Though there are error-correction mechanisms, systems for validation, and a coherent underlying model of the world that is used by the brain.
FWIW, it is likely the most used set of neuron connections, with sets of millions in play and their interconnections being the important part. That subset is one of billions of others, with thousands of connections between each neuron. Keep in mind it is not the set of neurons firing that matters, but the set of connections firing, and the number of possible sets of connections is vastly larger.
Like, if you have three neurons, your brain can encode 10 data points. Call them A, B, C. A single neuron firing and terminating is one (so three, one per neuron), each edge, e.g. A to B, is another three, each set of two edges, e.g. A to B to C, is three more, and all three edges together make one more. Then keep in mind you have billions of neurons, each interconnected by the thousands.
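A toy enumeration of that counting, treating a pattern as either a single neuron firing alone or any non-empty set of edges firing:

```python
# Toy count of the three-neuron example above: distinct "patterns" are
# single neurons firing alone plus every non-empty subset of the edges.
from itertools import combinations

neurons = ["A", "B", "C"]
edges = list(combinations(neurons, 2))        # [('A','B'), ('A','C'), ('B','C')]

single_firings = [(n,) for n in neurons]      # A, B, or C firing and terminating
edge_patterns = [c for r in range(1, len(edges) + 1)
                 for c in combinations(edges, r)]   # 1-, 2-, and 3-edge sets

print(len(single_firings) + len(edge_patterns))     # 3 + 7 = 10
```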
True, and this even happens in a brain the size of a flea's.
Which makes one wonder, what is it that makes processing and reconciling millions of visual signals per second "easy", but reasoning through a simple sudoku near impossible?
Are you sure about fleas? I thought we know this does not happen with frogs as far as we can tell https://courses.csail.mit.edu/6.803/pdf/lettvin.pdf
Oh. No, not sure.
I do not know how many times that frog-type experiment has been repeated on other species or branches of animal life, but that one study upended my preconceived notion of how vision could work within a brain. Most insects have a tiny brain, speculatively capable of only certain automaton-type tasks, though wasps IIRC have very small brains relative to other insects and yet exhibit social behavior, so who knows.
Citation needed. The word reasoning isn't describing everything that the brain does, and "just following the most used neuron" is not even wrong.
> However, I would argue an LLM will never build a trained model like Stockfish or DeepSeek.
It doesn't have to, the LLM just needs access to a computer. Then it can write the code for Stockfish and execute it. Or just download it, the same way you or I would.
> True AGI or SI would stop trying to be a grandmaster of everything and instead know which method/model should be applied to a given problem.
Yep, but I don't see how that relates to LLMs not reaching AGI. They can already write basic Python scripts to answer questions, they just need (vastly) more advanced scripting capabilities.
I don't see why a mixture of experts couldn't be distilled into a single model and a unified latent space.
You could, but in many cases you wouldn't want to. You will get superior results with a fixed compute budget by relying on external tool use (where "tool" is defined liberally, and can include smaller narrow neural nets like GraphCast & AlphaGo) rather than stuffing all tools into a monolithic model.
Isn't that what the original ResNet project disproved? Rather than trying to hand-engineer what the NN should look for, just make it deep enough and give it enough training data, and it'll figure things out on its own, even better than if we told it what to look out for.
Of course, cost-wise and training time wise, we're probably a long way off from being able to replicate that in a general purpose NN. But in theory, given enough money and time, presumably it's possible, and conceivably would produce better results.
I'm not proposing hand-engineering anything, though. I'm proposing giving the AI tools, like a calculator API, a code interpreter, search, and perhaps a suite of narrow AIs that are superhuman in niche domains. The AI with tool use should outperform a competitor AI that doesn't have access to these tools, all else equal. The reason should be intuitive: the AI with tool use can dedicate more of its compute to the reasoning that is not addressed by the available tools. I don't think my views here are inconsistent with The Bitter Lesson.
Exactly what DeepSeek3 is doing.
But the G in AGI stands for General. I think the hope is that there is some as-yet-undiscovered algorithm for general intelligence. While I agree that deferring to a subsystem that is an expert in that type of problem is the best way to handle problems, I would hope that that central coordinator could not just delegate but also design new subsystems as needed. Otherwise what happens when you run out of types of expert problem solvers to use (and still haven't solved the problem well)?
One might argue maybe a mixture of experts is just the best that can be done - and that it's unlikely the AGI be able to design new experts itself. However where do the limited existing expert problem solvers come from? Well - we invented them. Human intelligences. So to argue that an AGI could NOT come up with its own novel expert problem solvers implies there is something ineffable about human general intelligence that can't be replicated by machine intelligence (which I don't agree with).
Once I was high and thought of hallucinations as "noise in the output". From that perspective, and given that LLMs are probabilistic machines, halving the noise would probably involve 4x the computation. Which seems to track with what I observe: models are getting MUCH larger, but performance is practically at a standstill.
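That intuition lines up with Monte-Carlo-style error scaling, if (big if) hallucination behaves like sampling noise; purely a back-of-the-envelope analogy, not a claim about how LLM error actually scales:

```latex
% Assumption: output noise falls off like sampling error in the compute C.
\sigma(C) \propto \frac{1}{\sqrt{C}}
\quad\Longrightarrow\quad
\sigma(4C) = \tfrac{1}{2}\,\sigma(C)
% i.e. halving the noise costs roughly 4x the compute.
```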
"If AGI ... is possible"
I don't get this line of thinking. AGI already exists - it's in our heads!
So then the question is: is what's in our heads magic, or can we build it? If you think it's magic, fine - no point arguing. But if not, we will build it one day.
The brain is such an intractable web of connections that it has been really difficult to properly make sense of it.
We can't really say much about the differences between the intelligence of a dog and the intelligence of a human in real terms. It seems as though humans might have more connections and different types of cells, but then again, there are species out there that have types of neurons we don't have, and denser regions in areas of the brain than we do.
And on top of that, dive into a single neuron and you will find a world of complexity. The reason a neuron might fire or not given a stimulus is an extremely complicated and often stochastic process; that's actually one of the reasons we use non-linearities in the neural networks we create. But how much of that nuance are we really capturing?
The way we do mathematics has well-studied neurological patterns behind it; we come out of the box with understandings of the world. And many animals do too; similar neurological patterns are found in different species.
It's incredible to think of the precision and complexity of the tasks a fly undertakes during its life, and we have actually mapped the entire brain (if we can call it that; I would) of a fly: every neuron and every connection the fly has. There have been experiments where we've tried to imitate these with neural networks (the brain of a fly has fewer parameters [number of nodes and edges] than modern LLMs), with very interesting results. But can we say we understand them? Not really.
And finally, I want to bring up something that's not usually considered when it comes to these things: there are a lot of processes at the molecular level in our cells that actually make use of quantum mechanics, and there's a whole field of biology dedicated to studying them. So yeah, maybe we can build it, but first we need to understand what's going on and why, I believe.
What processes in our cells make use of quantum mechanics? (I mean in some sense everything is quantum mechanics, but cells are quite big in a quantum mechanics sense. I’d imagine they are mostly classical).
https://en.wikipedia.org/wiki/Quantum_biology
> Mitochondria have been demonstrated to utilize quantum tunneling in their function as the powerhouse of eukaryotic cells.
Expert-beginner problem. If you can count a grain of sand and measure a distance of one centimeter, then surely you can measure the exact length of a coastline and count the exact number of grains of sand! (The length and the number of grains go to infinity as you get more detailed.)
It is less magic and more just insanely complicated. We therefore very well might not build it one day. Your claim that we will solve it one day is not obvious and needs solid evidence. Some cryptographic problems require millions of years of compute to solve; why can't it be the case that AGI requires petayears of compute? A billion-fold increase in compute still won't do it; hence, maybe not ever. Four billion years and a trillion-fold increase in compute might not be enough. (Assuming we have that long. Dawkins was most concerned about humanity surviving the next 500 years.)
GI is in our heads. The A is artificial which means built by humans. They are asking the same question you are.
> GI is in our heads. The A is artificial which means built by humans.
Humans aren’t built by humans? Where do humans come from, then?
They say the kids aren’t having sex anymore, but I didn’t realize it was because they aren’t aware of the function.
Indeed! That's what I have been thinking for a while, but I never had the occasion or breath to write it down, and you explained it concisely. Finally some 'confirmation' 'bias'...