I'll probably get flamed to death for saying this, but I like Jürgen. I mean, I don't know him in person (never met him) but I've seen a lot of his written work and interviews and what-not and he seems like an alright guy to me. Yes, I get it... there's that whole "ooooh, Jürgen is always trying to claim credit for everything" thing and all. But really, to me, it doesn't exactly come off that way. Note that he's often pointing out the lack of credit assigned even to people who lived and died centuries before him.
His "shtick" to me isn't just about him saying "people didn't give me credit" but it seems more "AI people in general haven't credited the history of the field properly." And in many cases he seems to have a point.
I think you sum up my feelings about him as well. He's a bit much sometimes but it's hard to deny that he's made monumental contributions to the field.
It's also funny that we laugh at him when we also have a joke that in AI we just reinvent what people did in the 80's. He's just the person being more specific as to what and who.
Ironically, I think the problem is that we care too much about credit. It ends up getting hoarded rather than shared. We then oversell our contributions, because if you make the incremental improvements that literally everyone makes, your work gets rejected for being incremental.
I don't know what it is about CS specifically, but we have a culture problem around attribution and hype. We build on open source, where it's libraries all the way down, yet act like we did it all alone. We jump on bandwagons as if there's one right and immutable way to do certain things, until the bubble pops and we laugh at how stupid anyone was to ever do such a thing. We don't contribute back to the projects that are our foundation, we laugh at the "theory" we stand on, and we listen to the same hype-train people who got it wrong last time instead of turning to those who got it right. Why? It runs directly counter to the ideals of a group that loves to claim rationalism, "working from first principles", and "I care what works".
> we laugh at "theory" which we stand on
This aspect of the industry really annoys me to no end. People in this field are so allergic to theory (which is ironic, because CS, of all fields, is probably one of those in which theoretical investigations are most directly applicable) that they'll smugly proclaim their own intelligence and genius while showing you a pet implementation of ideas that have been around since the 70s or earlier. Sure, most of the time they implement it in a new context, but this leads to a fragmented language in which the same core ideas are implemented N times under everyone's own idiosyncratic terminology (see, for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages).
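To pick on that example, here's a quick Python sketch (my own illustration) of two primitives that go by different names elsewhere: "fold"/"reduce" appears as foldl/foldr in Haskell, inject in Ruby, Aggregate in C#, and std::accumulate in C++.

    from functools import reduce

    nums = [1, 2, 3, 4]

    # "map": apply a function to every element
    squares = list(map(lambda x: x * x, nums))         # [1, 4, 9, 16]

    # "fold" / "reduce": combine elements with a binary operation
    total = reduce(lambda acc, x: acc + x, nums, 0)     # 10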
My favorite Knuth quote[0]:

> If you find that you're spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you're spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice.

But yeah, in general I hate how people treat theory, acting as if it has no economic value. Certainly both matter, no one is denying that. But there's a strong bias against theory and I'm not sure why. Let's ask ourselves: what is the economic impact of Calculus? What about just the work of Leibniz or Newton? I'm pretty confident it's significantly north of billions of dollars a year. And we what... want to do less of this type of impactful work? A handful of examples like that more than covers any money wasted on research that has failed (or "failed").
The problem I see with our field, which leads to a lot of the hype, is the belief that everything is simple. This just creates "yes men" and people who do not think, which I think ends up with people hearing "no" when someone is just acting as an engineer. The job of an engineer is to problem solve, and that means you have to identify problems! Identifying them and presenting solutions is not "no", it is "yes". But for some reason it gets interpreted as "no".
> see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages

Don't get me started... but if a PL person goes on a rant here, just know, yes, I upvoted you ;)
[0] You can probably tell I came to CS from "outside". I have a PhD in CS (ML) but undergrad was Physics. I liked experimental physics because I came to the same conclusion as Knuth: Theory and practice drive one another.
I get weird looks sometimes lately when I point out that "agents" are not a new thing, and that they date back at least to the 1980's and - depending on how you interpret certain things[1] - possibly back to the 1970's.
People at work have, I think, gotten tired of my rant about how people who are ignorant of the history of their field have a tendency to either re-invent things that already exist, or to be snowed by other people who are re-inventing things that already exist.
I suppose my own belief in the importance of understanding and acknowledging history is one reason I tend to be somewhat sympathetic to Schmidhuber's stance.
Another interesting thing I see is how people will refuse to learn history thinking it will harm their creativity[0].
The problem with these types of interpretations is that they're fundamentally authoritarian, where research itself is fundamentally anti-authoritarian. To elaborate: trust, but verify. You trust the results of others, but you replicate and verify. You dig deep and get to the bottom of things (progressive knowledge necessitates higher orders of complexity). If you never challenge or question results then yes, I'd agree, knowledge harms. But if you're willing to say "okay, it worked in that exact setting, but what about this change?" then there is no problem[1]. In that setting, more reading helps.
I just find these mindsets baffling... Aren't we trying to understand things? You can really only brute-force your way to new and better things if you are unable to understand them. We can make so much more impact and work so much faster when we let understanding drive as much as outcome.
[0] https://bsky.app/profile/chrisoffner3d.bsky.social/post/3liy...
[1] Other than Reviewer #2
> Aren't we trying to understand things?
Unfortunately, for most of us, no. We are trying to deliver business units to increase shareholder value
I think you should have continued reading from where you quoted.
>> Aren't we trying to understand things? ***You can really only brute-force your way to new and better things if you are unable to understand them. We can make so much more impact and work so much faster when we let understanding drive as much as outcome.***

I'm arguing that if you want to "deliver business units to increase shareholder value", that is well aligned with "trying to understand things".
Think about it this way:

If you understand things: you can directly address shareholder concerns and adapt readily to market demands. You do not have to search, you already understand the solution space.

If you do not understand things: you cannot directly address shareholder concerns, and must search over the solution space to meet market demands.

Which is more efficient? It is hard to argue that search through an unknown solution space is easier than path optimization over a known solution space. Obviously this is the highly idealized case, but this is why I'm arguing that the two are aligned. If you're in the latter situation you advantage yourself by trying to get to the former; otherwise you are just blindly searching. In that case technical debt becomes inevitable and compounds significantly unless you get lucky, and it becomes extremely difficult to pivot as the environment naturally changes around you. You are only advantaged by understanding, never harmed. Until we realize this we're going to continue to be extremely wasteful, resulting in significantly lower returns for shareholders or any other measure of value.
I'm in the same boat. At least there's a couple of us that think this way. I'm always amazed when I run into people who think neural nets are a relatively recent thing, and not something that emerged back in the 1940s-50s. People seem to tend to implicitly equate the emergence of modern applications of ideas with the emergence of the ideas themselves.
I wonder at times if it stems back to flaws in the CS pedagogy. I studied philosophy and literature in which tracing the history of thought is basically the entire game. I wonder if STEM fields, since they have far greater operational emphasis, lose out on some of this.
> people who think neural nets are a relatively recent thing, and not something that emerged back in the 1940s-50s
And to bring this full circle... if you really (really) buy into Schmidhuber's argument, then we should consider the genesis of neural networks to date back to around 1800! I think it's fair to say that that might be a little bit of a stretch, but maybe not that much so.
Tbf, he literally says that in the interview
> Around 1800, Carl Friedrich Gauss and Adrien-Marie Legendre introduced what we now call a linear neural network, though they called it the “least squares method.” They had training data consisting of inputs and desired outputs, and minimized training set errors through adjusting weights, to generalize on unseen test data: linear neural nets!
... except linear neural nets have a very low ceiling on complexity no matter how big the network is, because without a nonlinearity a stack of linear layers collapses to a single linear map, and they never introduced one. They tried, but it destroys the statistical reasoning, so they threw it out. Also, I don't envy anyone doing that calculation on paper; least squares by hand is already going to suck bad enough.
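To make the collapse point concrete, here's a minimal numpy sketch (my own illustration): stacked linear layers compute exactly one linear map, and a single linear layer trained by gradient descent on squared error just walks to the same answer the least squares closed form gives you.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                       # inputs
    true_w = rng.normal(size=(5, 1))
    y = X @ true_w + 0.1 * rng.normal(size=(200, 1))    # noisy targets

    # "Deep" linear network: two stacked linear layers, no nonlinearity.
    W1 = rng.normal(size=(5, 8))
    W2 = rng.normal(size=(8, 1))
    W_collapsed = W1 @ W2                               # the composition is itself one linear map
    assert np.allclose((X @ W1) @ W2, X @ W_collapsed)

    # Gauss/Legendre's least squares: the closed-form fit of a single linear layer.
    w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Train that single linear layer by gradient descent on squared error;
    # it converges (approximately) to the same closed-form solution.
    w = np.zeros((5, 1))
    for _ in range(2000):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w -= 0.05 * grad
    print(np.allclose(w, w_ls, atol=1e-3))              # True, up to optimization tolerance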
Until you introduce that nonlinearity, the method is a version of a Taylor series, and the only real advantage is the statistical connection between the outcome and what you're doing (and if you're evil, you might point out that while that statistical connection gives reassurance that what you're doing is correct, it can, despite being a proof, point you in the wrong direction).
And if you want to go down that path, SVM kernel-based networks do it better than current neural networks. Neural networks throw out the statistical guarantees again.
If you really want to go back far with neural networks, there's backprop! Newton, perhaps, although I guess Leibniz and his chain rule would make him a very good candidate.
There have been non-linear, least-squares-like methods for quite some time.
It's a clash of cultures.
He is an academic that cares for understanding where ideas came from. His detractors need to be the smartest people in the room to get paid millions and raise billions.
It's not very sexy to say 'Oh yes, we are just using an old Soviet learning algorithm on better hardware. Turns out we would have lost the Cold War if the USSR had access to a 5090.', which won't get you the billions you need to build the supercomputers that push the state of the art today.
It seems his "detractors" (or at least his foes) are also academics - i.e. the same culture - they just cite Hinton and LeCun instead of Schmidhuber.
It helps your career to cite the head of AI at Facebook and the former head of Google. Not so much some academician who worked in the 1970s in the Soviet Socialist Republic of Kazakhstan.
I believe the Schmidhuber-ignoring (according to him) began before those two were at Google/Meta. But, I suppose NYU/Bell Labs and U-Toronto will be more likely to be cited than somewhere in Munich or Switzerland...
The location where scientists reside should be irrelevant - at most, the venue where the work was published should matter (I can understand that some obscure journals in other languages may be skipped in favor of more well-known (ACM/IEEE/ACL/...) English-language conferences or journals), and it's not as if Schmidhuber was hiding from the mainstream outlets.
In the age of globalization, it's getting harder to justify ignoring anything out there.
Yeah I have the same feeling. I also think it’s weird to say he’s clearly wrong. I mean, it’s a very subtle question exactly where you cross the line that comes with an obligation to credit others. All ideas build on each other and are to some extent mashups of ideas that came before them. It’s not something you can be clearly wrong about in an objective sense, and Jürgen’s view seems consistent.
Looking at how credit and attribution work in science today (google "citation rings", for example) I can honestly say that I'd much prefer to live in a world where Jürgen did invent most of AI, rather than the one we're in now.
> it’s a very subtle question exactly where you cross the line that comes with an obligation to credit others
What he means is there are clusters that systematically avoid citing the true origins of important concepts related to computational learning, and prefer to only cite their own "cluster members", a phenomenon that in scientometrics is known as a citation cartel:
I'm Schmidhuber-neutral, but the word on the street is that he is a major asshole and sometimes impossible to work with. His research might be more solid than that of the Turing award winners, but his personality has truly held him back.