One of the perennial questions about proof automation has been the utility of proofs that cannot be understood by humans.
Generally, most computer scientists using proof automation don't care about the proof itself -- they care that one exists. It can contain as many lemmas and steps as needed. They're unlikely to ever read it.
It seems to me that LLMs would be decent at generating proofs this way: so long as they can keep submitting tactics to the proof checker until a proof is found, they can generate whatever is needed.
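In rough terms, the loop I have in mind is just guess-and-check. A Python sketch (propose_tactics and check are hypothetical stand-ins for an LLM call and a proof-checker call, not any real API):

    # Hedged sketch of LLM-driven proof search: sample candidate tactic
    # scripts and keep only what the checker accepts. Any accepted proof
    # will do, however ugly.
    def search_proof(goal, propose_tactics, check, max_attempts=1000):
        for _ in range(max_attempts):
            candidate = propose_tactics(goal)  # hypothetical LLM call
            if check(goal, candidate):         # hypothetical kernel call
                return candidate
        return None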
However, mathematicians (a distinguished group of which I am not a member) seem to appreciate qualities in proofs such as elegance and simplicity. Many mathematicians I've heard respond to the initial question believe that a proof generated by some future AI system will not be useful to humans if they cannot understand and appreciate it. The existence of a proof is not enough.
Now that we're getting close to having algorithms that can generate proofs, the question becomes a bit more urgent, I think. What use is a proof that isn't elegant? Are proofs written for a particular audience, or are they written for the result?
Mathematician here (trained as pure, working as applied). Non-elegant proofs are useful if the result is important. E.g. people would still be excited by an ugly proof of the Riemann hypothesis.^1 Whether it is true or not matters to a lot of other theorems. However, if the result is less central, you won't get a lot of interest.
Part of it is, I think, that "elegance" is flowery language that hides what mathematicians really want: not so much new proofs as new proof techniques and frameworks. An "elegant" proof can, with some modification, prove a lot more than its literal statement. That way, even if you don't care much about the specific result, you may still be interested because it can be altered to solve a problem you _were_ interested in.
1: It doesn't have to be as big of a deal as this.
Then again, even an 'elegant' proof can be surprisingly inflexible. I've recently been working through Apéry's proof that ζ(3) is irrational. It's so simple that even a clueless dabbler like me can understand all the details. Yet no one has been able to make his construction work directly for anything else (that hasn't already been proven irrational). C'est la vie, I suppose.
There was a post yesterday about a Quanta article: https://news.ycombinator.com/item?id=42644896.
The article explains that two mathematicians were able to place Apéry's proof that ζ(3) is irrational into a much wider (and hence more powerful) framework. I doubt that framework is as easy to understand as the original proof. But in the end something with wider applicability did come out of the proof.
Yeah, many of the fancy analytic methods are beyond my level of dabbling. I've been trying to learn more about them, so I can solve the myriad exercises left to the reader in all the Diophantine approximation papers.
Still, the newer methods publicized in the Quanta article definitely get more involved, and at least from my perspective they don't establish things as elegantly as Apéry's ζ(2) and ζ(3) arguments do. Hopefully they turn out to be powerful in practice, to make up for it.
> Part of it is, I think, that "elegance" is flowery language that hides what mathematicians really want: not so much new proofs as new proof techniques and frameworks. An "elegant" proof can, with some modification, prove a lot more than its literal statement. That way, even if you don't care much about the specific result, you may still be interested because it can be altered to solve a problem you _were_ interested in.
Do you feel this could be a matter of framing? If you view the "proof" as being the theorem prover itself, plus the proof that it is correct, plus the assumptions, then whatever capability it gains that lets it prove your desired result probably is generalizable to other things you were interested in. It would seem like a loss if they're dismissed simply because their scratch work is inscrutable.
And of course if you come across an inelegant proof you suddenly have the opportunity to think about it and see if you can make it more elegant!
"It doesn't have to be as big of a deal as this."
Agreed. It is good to know that the four-colour theorem is true, although there is not yet any human-readable proof.
I feel like the four-color theorem's automated proof is much more 'human-readable' than proofs done with automated theorem provers. With the four-color theorem, there is a human-readable proof that says "if this finite set of cases are all colorable, then all planar graphs are colorable". And then there is some rather concrete code that generates all the finite cases and finds a coloring for them. Every step in there makes sense, and is fully understandable. The fact that the exhaustive checking wasn't done by hand doesn't mean it's hard to understand how the proof works, or what is 'actually going on'.
For a general theorem prover, reading the code doesn't explain anything insightful about why the theorem is true. For the four-color theorem, the code that proved it actually gives insight into how the proof works.
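To make that concrete, here is the flavor of the checking half in miniature (a toy Python sketch; the real proof searches a curated list of "reducible configurations", not complete graphs):

    # Try to 4-color a graph by backtracking; edges are pairs (u, v), u < v.
    def four_colorable(n, edges):
        colors = [None] * n
        def assign(v):
            if v == n:
                return True  # every vertex got one of 4 colors
            for c in range(4):
                if all(colors[u] != c for u in range(v) if (u, v) in edges):
                    colors[v] = c
                    if assign(v + 1):
                        return True
            colors[v] = None  # backtrack
            return False
        return assign(0)

    k4 = {(i, j) for i in range(4) for j in range(i + 1, 4)}  # 4-colorable
    k5 = {(i, j) for i in range(5) for j in range(i + 1, 5)}  # not 4-colorable
    print(four_colorable(4, k4), four_colorable(5, k5))  # True False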
One thing that many mathematicians today don’t think about is how deeply intertwined the field has historically been with theology. This goes back to the Pythagoreans at least.
That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness. Which, incidentally, are directly related to logic, aesthetics, and ethics.
The value of truth in a proof is most obvious.
The value of aesthetics is harder to explain, but there's no denying that it is in fact observably valued by mathematicians.
As for ethics, remember that human morality is a proper subset thereof. Ethics concerns itself with what is good. It may feel like a stretch, but it's perfectly reasonable to say that for two equally true proofs of the same thing, the one that is more beautiful is also more good. Also, obviously, given two equally beautiful proofs, if only one is true then it is also more good.
> That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness
As a non-mathematician, I've noticed this as well, and I have a suspicion the historical "culture" is holding the field back. Gödel proved there are an infinite number of true arithmetic statements unprovable within any (consistent, sufficiently powerful) formal system. But our "gold standard" formal system, ZFC, has about as many axioms as we have fingers — why is finding more axioms not the absolute highest priority of the field?
We struggle to prove facts about Turing machines with only six states, and it's not obvious to me that ZFC is even capable of resolving all questions about the behavior of six state Turing machines (well, specifically just ZF, as C has no bearing on these questions).
Yet Turing machines are about as far from abstract mathematics as one can get, because you can actually build these things in our physical universe and observe their behavior over time (except for the whole "infinite tape" part). If we can't predict the behavior of the majority of tiny, deterministic systems with ZFC, what does that say about our ability to understand and predict real world data, particularly considering that this data likely has an underlying algorithmic structure vastly more complex than that of a six state Turing machine?
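To make "observe their behavior" concrete: simulating one takes a few lines (a toy Python sketch; the machine below is an arbitrary 2-state example, nothing like the hard six-state ones):

    # Run a Turing machine given as {(state, symbol): (write, move, next)}.
    def run(delta, steps, state="A", pos=0):
        tape = {}  # blank cells read as 0
        for t in range(steps):
            if state == "HALT":
                return f"halted after {t} steps", tape
            write, move, state = delta[(state, tape.get(pos, 0))]
            tape[pos] = write
            pos += 1 if move == "R" else -1
        return "still running", tape

    toy = {("A", 0): (1, "R", "B"), ("B", 0): (1, "L", "A"),
           ("A", 1): (1, "L", "HALT"), ("B", 1): (1, "R", "HALT")}
    print(run(toy, 100))  # ('halted after 3 steps', {0: 1, 1: 1})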
More formally, my complaint with the culture of mathematics is:
1) We know that for any string of data, I(data : ZFC) ≤ min(K(data), K(ZFC)) + O(1)
2) K(ZFC) is likely no more than a few bytes. I think the best current upper bound is the description length of a Turing machine with a few hundred states, but I suspect the true value of K(ZFC) is far lower than that
3) Thus K(data) - K(data | ZFC) ≤ "a few bytes"
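Spelling out the step from (1) and (2) to (3), via the symmetry of algorithmic information (a sketch; the equality holds only up to logarithmic terms):

    K(data) - K(data | ZFC) = I(data : ZFC) + O(log)   [symmetry of information]
                            ≤ K(ZFC) + O(log)          [by (1)]
                            ≈ "a few bytes"            [by (2)]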
Consider the massive amounts of data that we collect to train LLMs. The totality of modern mathematics can provide no more than a few bytes of insight into the "essence" of this data (i.e., the maximally compressed version of the data). Which directly translates to limited predictability of the data via Solomonoff induction. And that's in principle — this doesn't even consider the amount of time involved. If we want to do better, we need more axioms, full stop.
One might counter, "well sure, but mathematicians don't necessarily care about real-world problems". Ok, just apply the same argument to the set of all arithmetic truths. Or the set of unprovable statements in the language of a formal system (that are true within some model). That's some interesting data. Surely ZFC can discover most "deep" mathematical truths? Not very likely. The deeper truths tend to occur at higher levels of the arithmetic hierarchy. The higher in the hierarchy, the more interesting the statement. And these are tiny statements too: ∀x ∃y ∀z [...]. Well, we're already in trouble, because ZFC can only decide a small fraction of the Π_2 statements that can fit on a napkin, and it drops off very quickly at the levels above that. Again, we need more axioms.
I can’t tell if this is crazy or brilliant. Math has been working diligently for a long time to reduce the axioms. Most of the obvious Gödel sentences are stupid things, like "there is a number that is the proof of itself". The whole project is to derive all of the structure of mathematics, with its high information complexity, from basic axioms but also from complex definitions. I think the definitions (natural numbers as sets, integers as equivalence classes of pairs of natural numbers, etc.) pump up the information complexity from the axioms. Like the initial state of Life allowing arbitrary computation from the simple Life rules.
The idea that there might be more axioms that would let one deduce more about computable complexity classes or the like seems pretty unlikely.
The number of provable statements and unprovable statements is countably infinite and we aren’t lacking the ability to prove things due to obviously true missing axioms.
> > That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness
> As a non-mathematician, I've noticed this as well, and I have a suspicion the historical "culture" is holding the field back.
Holding the field back from what? If the goal of the practitioners of the field is to seek mathematical beauty, then, well, that is what they will focus on.
Besides that, I don't really follow your argument that Gödel plus information theory implies that adding more axioms is the key to moving math forward. In the vast majority of cases, the difficulty in finding a proof of a statement is not that the statement isn't provable under a given formal system; it's that we simply can't find the proof. But maybe I misunderstand you?
> Yet Turing machines are about as far from abstract mathematics as one can get, because you can actually build these things in our physical universe and observe their behavior over time (except for the whole "infinite tape" part)
The infinite tape part isn't some minor detail, it's the source of all the difficulty. A "finite-tape Turing machine" is just a DFA.
> is just a DFA
Oh, is that all? If resource-bounded Kolmogorov complexity is that simple, we should have solved P vs NP by now!
I debated adding a bunch of disclaimers to that parenthetical about when the infinite tape starts to matter, but thought, nah, surely that won’t be the contention of the larger discussion point here haha
It’s an LBA, a Linear Bounded Automaton.
I believe knowing a proof exists will bring us closer to elegant human proofs.
I wanted to justify this with the "Roger Bannister Effect". The thought is that we're held back psychologically by the idea of the impossible. It takes one person to do it, and then everyone can do it, freed from the mind trap. But further reading shows we were incrementally approaching what Roger Bannister did first: the 4-minute mile. And the pause before that record was likely not psychological but physical, caused by World War Two. [0] This jibes with TFA, where Mr. Wolfram writes about a quarter of a century not yielding a human interpretation of his computer's output.
All I’m left with is my anecdotes. I had a math professor in college who assigned homework every class. Since it was his first time teaching, he came up with the questions live. I’d come to class red in the face after struggling with questions all night. Then the professor would sheepishly reveal some of his statements were false. That unknown sapped a lot of motivation. Dead ends felt more conclusive. Falsehood was an easy scapegoat.
[0] https://www.scienceofrunning.com/2017/05/the-roger-bannister...
I think there is something to this idea. There have been cases where person A was working on proving a result but struggled, then person B announced a proof of the result, and then person A was inspired to finish their proof. (Sadly, I don't remember the specifics.)
With formal verification, what you want is a correct statement of the invariants / things to prove. As long as those are understandable, the readability of the proof doesn't matter much; it's the program's job to verify that the proof is correct.
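For instance, in Lean (a minimal sketch, assuming a recent Lean 4 toolchain where the omega tactic is available), the statement is the part a human has to read and trust; the proof can be opaque automation:

    -- The *statement* is the human-facing contract; the proof term that
    -- `omega` generates is machine-checked and need never be read.
    theorem swap_sum (a b : Nat) : a + b = b + a := by
      omega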
An ugly proof is super useful. It turns a statement into a theorem.
There is a famous quote by Riemann: "If only I had the theorems! Then I should find the proofs easily enough."
Once you have a proof, simplifying it should be much easier, even for computers.
Hypothesis: an LLM capable of generating a correct proof in a formal language, not through random chance but through whatever passes for “reasoning,” should also be capable of describing the proof in a way meaningful to humans. Because LLMs have a limited context window and are trained on human behavior, they will generate solutions similar to what humans would generate.
We have already accepted some proofs we cannot fully understand, such as the proof of the four color theorem that used computational methods to explore a large solution space and demonstrate that no possible special-case combinations violate the theorem. But that was just one part of the proof.
I wonder what we know about proof space generally, and if we had an ASI that reasoned in a substantially different way than humans, what types of proofs it would be likely to generate. Do most proofs contain structural components that humans find pleasing? Do most devolve into convoluted case analyses? Is there a simplest form that a set of correct proofs could be reduced to?
To me this seems obvious. Copilot might generate wrong things, but what I've seen tends to be human-readable. My experience with Lean is that it feels very much like a functional programming language like Scala, so I'd have to assume that a coding assistant that also knows Lean syntax/libraries would work just like any other programming language.
There will perhaps need to be a transition period where we might need to look at basic type theory augmenting or replacing material in introductory proof classes. Instead of truth tables and ZFC, teach math-as-programming. Propositions are types, implications are functions, etc. If you have the right foundation, I think the stuff ends up being quite legible. Mathlib is very abstract which makes it harder to approach as a beginner, but you could imagine a sort of literate programming approach where we walk students through building their own personal Mathlib, refactoring it to use higher abstractions as they build it up, etc. In a sense this is what a math education is today, but with a kind of half-formal, half-natural language and with a bunch of implicit assumptions about what techniques the student really knows (since details are generally omitted) vs. letting them use whatever macros/tactics/theorems they've learned/created in other courses.
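To make "propositions are types, implications are functions" concrete, here is a tiny Lean 4 sketch (the theorem names are my own toy examples):

    -- An implication P → Q is literally a function type; modus ponens
    -- is just function application.
    theorem modus_ponens (P Q : Prop) (h : P → Q) (hp : P) : Q :=
      h hp

    -- A proof of P ∧ Q is a pair; swapping the conjuncts is swapping a pair.
    theorem and_swap (P Q : Prop) : P ∧ Q → Q ∧ P :=
      fun ⟨hp, hq⟩ => ⟨hq, hp⟩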
This could all be especially powerful if the objects you're working with have good Widgets[0][1] so you could visualize and interact with various intermediate expressions. I see tons of potential here. The Lean games[2] also show the potential for a bright future here that's kind of along the lines of "build your own library around topic X" (the NN game has been posted here a few times, but it's actually a framework with other games too!).
[0] https://lean-lang.org/lean4/doc/examples/widgets.lean.html
[1] https://github.com/leanprover-community/ProofWidgets4/blob/m...
>One of the perennial questions about proof automation has been the utility of proofs that cannot be understood by humans.
I'm skeptical of an in-principle impossibility of understanding complex proofs. I think computers likely will have, or already do have, a capability for explaining proofs piecemeal, with explanations that bridge from the proof itself to familiar intuitions.
Needing to understand via increasingly sophisticated bridge explanations may mean we are getting further removed from immediate understanding, but I don't think it crosses a magical threshold or anything that fundamentally calls into question how operational/useful the proofs are.
It depends.
A proven conjecture is IMO better than an unproven one.
But usually, a new proof sheds new light or builds bridges between concepts that were previously unrelated.
And in that sense, a proof not understandable by humans is disappointing, because it doesn't really fulfill the need to understand why it's true.
I would imagine a proof has several "uses": 1) the proof itself is useful for some other result or proof, and 2) the proof uses a novel technique or novel maths, or links previously unlinked fields, and it's not the proof's result itself that is useful but the technique developed. This technique can then be applied in other areas to produce other kinds of proofs or knowledge.
I suspect it is the latter that will suffer with automated proofs: if we don't understand the techniques, or the technique is not really novel but just tedious, much of that value is lost.
I think mathematicians want something more basic, though elegance and simplicity are appreciated. They want to know why something is true, in a way that they can understand. People will write new proofs of existing results if they think they get at the "why" better, or even collect multiple proofs of the same result if each gets at the "why" in a different way.
In my limited experience it seems like the "why" of a proof requires a theory of mind from the perspective of the author of the proof.
What I mean is that one could choose a particular tactic for proving a lemma, or even rely on a certain lemma, where that choice is intended for the audience the author has in mind, in order to help them understand the steps better.
The context an LLM uses is limited (though growing). More importantly, it lacks the ability to model the mind of the interlocutor and the proof they would expect or find helpful.
"Why this proof?" also has a more global context as well. It seems there are have been styles and shifts in cultural norms over the years in proofs. Even though the Elements stood up until the 19th century we don't necessarily write proofs in the style of Euclid even though we may refer to those themes and styles, on purpose, in order to communicate a subtle point.
This isn't a new conundrum. It was a very contentious question at the end of the 19th century, when French mathematicians clashed with German mathematicians. Poincaré is known for describing proofs as texts intended to convince other mathematicians that something is the case, whereas Hilbert believed that automation was the way to go (i.e. have a "proof machine", plug in the question, get the answer, and be done with it).
Temporarily, the Germans won.
Personally, I don't think that proofs that cannot be understood have no value. We rely on such proofs all the time in our day-to-day interpretation of the world around us, our ability to navigate it and anticipate it. I.e. there's some sort of an automated proof tool in our brains that takes the visual input, feeling of muscle tonus, feeling of the force exerted on our body etc. and then gives an answer as to whether we are able to take the next step, pick up a rock and so on.
But mathematicians also want proofs that explain the nature of the thing in question. Because another thing we want to do with tasks like picking up rocks is make them more efficient, build inanimate systems that can pick up rocks, etc.
----
NB. I'm not entirely sure how much LLMs can contribute in this field. The first successes of AI were precisely in the field of automated proofs, and that's where symbolic AI seems to work great. But I'm not at all an expert on LLMs. Maybe there's some way I can't think of in which they would be better at this task, but on the face of it they just aren't.
From what I have heard when talking to the people behind formal analysis of protocol security, the main problem currently with using LLMs to 'interact with the theorem prover for you' is that there are nowhere near enough proofs out there for the LLMs to learn to generalize from them.