I think the cost of some formal methods has long been over-stated.
Model checking, for example, is not as hard as it sounds and not nearly as expensive or time consuming as building the wrong thing. Most programmers I’ve taught don’t seem to have too much difficulty learning TLA+ or Alloy. The concepts themselves are fundamentally rather simple.
Theorem proving does take some expertise. But it’s also not an insurmountable obstacle.
However, I agree that there’s a spectrum of cost: the further up the assurance chain you go, the more effort and time it takes. We do often price the cost of errors into the holistic process of software (e.g. SRE-style error budgets, iterative development, etc.).
One angle that works for me when deciding whether to use a more substantial FM is the cost of getting it wrong (or, conversely, how important it is that we get some key thing right).
If I’m building a financial database, can my business afford to lose customers’ money? Do we have that liability built into our insurance?
Sometimes you need to go slow at first in order to go fast later.
And some properties are simply too hard to trust to some boxes and arrows and a prayer.
Specifying a system correctly can be hard with the previous generation of tools. For instance, using LTL to describe system properties is not necessarily easy. I remember there used to be a pattern library for model checking and temporal logic. For something as simple as checking bounded existence, one has to write an LTL formula like the one below. That is certainly beyond most people's interest. Fortunately, tools have improved a lot, and in many cases engineers no longer need to study temporal logic deeply.
```
[]((Q & <>R) -> ((!P & !R) U (R | ((P & !R) U (R | ((!P & !R) U (R | ((P & !R) U (R | (!P U R))))))))))
```
Dwyer patterns https://matthewbdwyer.github.io/psp/patterns.html
Thanks! I used to use the same pattern library, hosted at either CMU or PSU, IIRC. Glad that it has a new home.
I believe you're way understating the difficulty of learning things like Alloy and doing theorem proving.
Let's say I want to teach them non-formal, high-assurance development. That will involve safe subsets of their language, a functional style of programming from inputs to outputs, isolation of side effects, and code reviews. They can do this without any extra or unusual knowledge.
The basics of testing would likewise build on their existing knowledge. If their language supports it, static analyzers might work with the push of a button. Basic contracts might knock out many type or range errors without much thought. Basic tests go from their intuition to specific code implementing it.
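To make "basic contracts" concrete, here's a minimal sketch (Dafny syntax only because it comes up later in this thread; the method name and percentage scenario are made up, and the same idea works as plain run-time asserts or contract macros in most mainstream languages):

```
// Hypothetical range contract: the pre/postconditions alone rule out
// the usual out-of-range bugs, with no deeper formal machinery.
method ClampPercent(x: int) returns (y: int)
  ensures 0 <= y <= 100              // result is always a valid percentage
  ensures 0 <= x <= 100 ==> y == x   // in-range inputs pass through unchanged
{
  if x < 0 {
    y := 0;
  } else if x > 100 {
    y := 100;
  } else {
    y := x;
  }
}

method Caller() {
  var p := ClampPercent(150);
  assert 0 <= p <= 100;   // follows from the contract alone, not the body
}
```

The point is just that a reviewer (or a checker) only has to look at the contract, not the implementation.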
OK, now let's build models using relationships expressed in Alloy or TLA+. Now they have to learn to think in a totally different way. They'll need new notations, new techniques. They'll have to put time into these without knowing the payoff. They'll also have far fewer (or zero) resources like StackOverflow or GeeksForGeeks that can solve the whole problem for them.
Moving into theorem provers, they now have to know math or logic. They have to think formally. They have to specify the problem that way, keep it in sync with the English, do lots of work that might just be about limitations of the formal tool, and eventually they succeed (or don't) at the proof. It's also a very slow process compared to informal methods of correctness checking.
So, it's quite difficult to learn these things versus alternative methods. They also might have a low or slow payoff while the others pay off quickly and often. That's why I push using the alternatives first. Then, learn the others over time to gradually add them to your toolbox for when they make sense: data structures, interface specification, highly critical routines, etc.
It’s not a free skill, for sure.
There’s a spectrum of problems one can apply FMs to. Galois is probably selecting for the harder end of projects that do require PhDs with scary sounding degrees.
In industry there are a lot of problems that don’t require that level of skill but can still benefit a lot from applying FMs.
To support your position, I still think the best ROI is interface specs with test generation and code-level proofs. Also, detection of common problems like a type changing or a missing increment in a while loop (a quick sketch of the latter below). That would've caught many problems I've had in the last few months.
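For the "no increment in a while loop" case specifically, here's a hedged toy example of how a verification-aware tool catches it (Dafny again, invented names): the loop only passes because the decreases clause can be shown to shrink; delete the increment and the termination check fails at verification time instead of the program hanging at run time.

```
// Toy example: sum 0 + 1 + ... + (n-1).
method SumTo(n: nat) returns (s: int)
{
  s := 0;
  var i := 0;
  while i < n
    invariant 0 <= i <= n
    decreases n - i        // must strictly decrease every iteration
  {
    s := s + i;
    i := i + 1;            // remove this line and verification fails, loudly
  }
}
```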
The biggest problem I see with that is how to write those specs in the first place, especially contracts. I'd like to see a free collection of specs for common properties: ranges, ordering, not-a-value, invariants, and confidentiality, along the lines of the sketch below. If searchable, that might already be enough for many situations.
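As a sketch of what such a shared collection could look like (hypothetical names, Dafny for consistency with the rest of the thread): a handful of predicates for ranges, ordering, and "not a value" that contracts then just compose.

```
// Hypothetical reusable property specs.
predicate InRange(x: int, lo: int, hi: int) {
  lo <= x <= hi                                        // range
}
predicate Ascending(s: seq<int>) {
  forall i, j :: 0 <= i < j < |s| ==> s[i] <= s[j]     // ordering
}
predicate Avoids(s: seq<int>, bad: int) {
  forall i :: 0 <= i < |s| ==> s[i] != bad             // "not a value"
}

// A contract written almost entirely by composing the shared predicates.
method Latest(readings: seq<int>) returns (r: int)
  requires |readings| > 0
  requires Ascending(readings)
  requires Avoids(readings, -1)   // -1 reserved as an error sentinel
  ensures InRange(r, readings[0], readings[|readings|-1])
{
  r := readings[|readings|-1];
}
```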
From there, we can develop tools to generate them for people. The tools might parse the code, ask some questions, and generate the specs. Alternatively, LLMs trained on such things might generate specs. They might review them and suggest things. Over time, we'll also have more components that are already spec'd, which specialists might build on.
We just really need ways for developers to go from their situation to specs with no work or little work. If it's as easy as code and comments, we'll see more adoption.
> I think the cost of some formal methods have long been over-stated.
how are you measuring cost? the cost of anything in this space (or any other rarefied space) is the cost of a "principal/staff" (maybe 1 person-year) to build it and then a team of very, very strong seniors to maintain it. who can afford that?
> Most programmers I’ve taught don’t seem to have too much difficulty learning TLA+ or Alloy. The concepts themselves are fundamentally rather simple.
this is like saying building a prod LLM pipeline is no big deal because learning the basics of pytorch is easy.
more relevantly: what are "most programmers" going to do when they hit an undecidable clause?
here's a razor for highly technical projects (a kind of efficient market hypothesis for software, if you will): if it's easy to do then it's worthless (highly commodified, marginal value, etc.).
Formal methods can be understood by interns. This is the weird thing about this industry. Make people run gauntlets then label them and tie their hands behind their backs.
Formal methods are just another tool. Do you need a principal engineer with 15yoe for Haskell or Rust code?
In today's AI and low-head-count world, you want someone who is really good at that thing (regardless of experience).
Needing a principal who can go deep on Kafka queues in a design interview isn't what is required.
Most organisations measure impact narratives, not skill, when it comes to getting up the org chart.
This might be your echo chamber of only working with PhDs? I have been doing FM for almost 40 years and, while it has become easier, I can count the people who actually understand what they are doing when programming on one hand, seniors included. Interns definitely don't understand and generally cannot learn to do formal methods.

We see the inside of many companies, since we do troubleshooting for large (Fortune 500) down to mid-sized ones, and these projects are short and urgent. We see a lot of code and, contrary to what you are saying or what seems to be the norm on HN, most people are really, really bad at everything they do; most we see cannot explain how the code actually works, and indeed it has often just been beaten on by trial and error until it is a facsimile of what it is supposed to do. There are often no peer reviews and no version control, and these systems run in massive hospitals and manage insurance, taxes or pensions for millions of people, and so on. You are grossly overestimating the norm.

If people did get a uni degree, they generally look disgusted when you mention something like Haskell as 'that useless stuff we had to do'. When I taught Prolog at a uni in NL, the students asked all the time why, and why not C++ or Java, which would get them a job and let them make games. No one knows what formal methods even are.
While I agree with the gist of your argument, you should also note that "organizations that hire outside consultancies for urgent issues" is a self-selecting set of orgs with terrible coding practices. There are probably many more such orgs than GP would imagine, but they are not necessarily the norm, even though they are likely to represent the vast majority of what someone working for such a consultancy would see.
I agree with that; however, I think it is the majority. Being stuck inside the largest companies in the world, seeing the worst code, makes me quite confident the majority sucks, and the idea that this is not the case is just a huge-outlier view consumed by HN because of FAANG companies.
I'm not sure these people aren't capable of understanding; I think it's more a product of what you said: their projects were also short and urgent, and that sort of environment is not conducive to understanding, particularly given high churn, where people are working on code inherited from people who also didn't have time to learn.
I don't know which interns you are talking to. Most interns can hardly understand, say, what `useEffect` means, let alone how to model it.
My intern this summer is teaching me about formal methods.
You are hiring the wrong ones.
Ones that studied CS and got a good grade.
Formal systems may be easier than React to understand (assuming decent effort is put into both), as there is more of a disciplined teaching culture around them. useEffect is a feature of React, a framework made by a tech company, with coders taking up the mantle of teaching it to others.
I'm not joking.
> Formal systems may be easier than React to understand
Some specific formal systems, maybe. I feel like a lot of people could get some mileage out of learning Dafny, and it'd definitely be easier than learning React imo.
Go write something simple like a merge-sort with formal guarantees in Dafny, and tell me it's easier than working on a react project afterwards =P
Merge sort in Dafny:
Imo it's a lot easier to explain than React. Preconditions, postconditions, loop invariants, assertions, termination arguments, and... that's about it, right?

```
predicate sorted(a: seq<int>) {
  forall i, j :: 0 <= i <= j < |a| ==> a[i] <= a[j]
}

predicate merge_invariants(a: seq<int>, b: seq<int>, c: array<int>, i: nat, j: nat)
  reads c
{
  && i <= |a|
  && j <= |b|
  && i + j <= c.Length
  && multiset(c[..i+j]) == multiset(a[..i]) + multiset(b[..j])
  && (forall k1, k2 :: 0 <= k1 < i && j <= k2 < |b| ==> a[k1] <= b[k2])
  && (forall k1, k2 :: 0 <= k1 < j && i <= k2 < |a| ==> b[k1] <= a[k2])
  && sorted(c[..i+j])
  && (forall k1, k2 :: 0 <= k1 < i + j && j <= k2 < |b| ==> c[k1] <= b[k2])
  && (forall k1, k2 :: 0 <= k1 < i + j && i <= k2 < |a| ==> c[k1] <= a[k2])
}

method merge(a: seq<int>, b: seq<int>) returns (out: seq<int>)
  requires sorted(a)
  requires sorted(b)
  ensures multiset(a) + multiset(b) == multiset(out)
  ensures sorted(out)
{
  var i := 0;
  var j := 0;
  var c := new int[|a| + |b|];
  while i < |a| && j < |b|
    decreases |a| + |b| - (i + j)
    invariant merge_invariants(a, b, c, i, j)
  {
    assert a[..i+1] == a[..i] + [a[i]];
    assert b[..j+1] == b[..j] + [b[j]];
    if a[i] <= b[j] {
      c[i + j] := a[i];
      i := i + 1;
    } else {
      c[i + j] := b[j];
      j := j + 1;
    }
  }
  while i < |a|
    invariant merge_invariants(a, b, c, i, j)
  {
    assert a[..i+1] == a[..i] + [a[i]];
    c[i + j] := a[i];
    i := i + 1;
  }
  while j < |b|
    invariant merge_invariants(a, b, c, i, j)
  {
    assert b[..j+1] == b[..j] + [b[j]];
    c[i + j] := b[j];
    j := j + 1;
  }
  assert a[..i] == a;
  assert b[..j] == b;
  assert c[..i+j] == c[..];
  return c[..];
}

method merge_sort(a: seq<int>) returns (out: seq<int>)
  decreases |a|
  ensures multiset(out) == multiset(a)
  ensures sorted(out)
{
  if |a| <= 1 {
    assert sorted(a);
    out := a;
  } else {
    var half := |a| / 2;
    assert a[..half] + a[half..] == a;
    var l := merge_sort(a[..half]);
    var r := merge_sort(a[half..]);
    out := merge(l, r);
  }
}
```
Hah, nice one! I didn't know about multiset when I went about this (some years ago) and ended up proving all sorts of facts about sorted sets and partitions.
I think you've changed my mind, actually, thank you. I still maintain that this requires more mathematical maturity than diving into react, but I may be saying that out of ignorance of react's internals =P
For someone who has never done React or Formal? Probably about the same learning curve.
I suspect we are falling into the classic human trap of measuring a complex topic using a linear scale. But the sibling comment convinced me formal methods are more approachable than I thought, for sure.
This thread (I believe) is entirely about real-world systems. In that context, you don't want to prove merge sort is correct. You want to prove that the connections created inside `useEffect` are properly closed, or something like that.
> Formal methods can be understood by interns
You're just repeating the same claim - yes I agree interns can also understand pytorch. Does that mean they can build prod LLM pipelines?
Ah, I think I see what you mean. For AI pipelines you need top guns anyway, regardless of whether they use reversible or traditional computing.
> reversable or traditional computing
o.O ...
No man I'm saying a very simple and non-controversial thing: just because someone does a tut, has an elementary understanding, knows the basics, doesn't mean I'm going to let them build/maintain/own mission critical systems.
It's not that deep.
I see. But your point is then orthogonal. So it's $400/h not $100/h per SWE as you need experienced good people anyway.
But the question really is: what solution minimises the number of those hours? Formal or informal?
If I can build something in 100 hours paying $100/hour, or 33 hours paying $400/hour, I may well choose the 100-hour version. Which is why we hire interns in the first place, rather than only hiring senior architects for our companies.
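(Working the arithmetic out as a sanity check: 100 h × $100/h = $10,000 versus 33 h × $400/h ≈ $13,200, so the slower option is also the cheaper one in pure dollars; what you trade away is calendar time.)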
Where do you work that you're paying interns $100/hr and seniors $400/hr, and are you hiring?
Some places do run on 2x overhead, which means that if the company is paying $100/hour, that intern is taking home $68k/year... Pretty normal in a lot of places.
I'm just copying the numbers from the above comment.
Unless they are executable and part of the language tooling, like SPARK, Dafny and co., I see little value in coding an abstract model that might be validated in theory, when hardly anyone will be able to cross-check whether the manually written language-ZYX code actually maps to the validated model.
And then there are the code changes: most likely, devs aren't going back to the formal model to validate them before applying the new set of changes.
AWS recently mentioned something called PObserve in an ACM article.
They said it checks logs emitted from the running application against the formal model. Sounds really useful for this exact problem; hopefully one day they'll open source it.
I would need to check the article, although it sounds very fragile: it expects certain logs to be present in a specific format, and to actually tell the truth about what the code is doing.
I agree with you on the first paragraph, but the second one is something you could presumably enforce with tooling and during reviews, no?
In theory. In practice, even enforcing unit tests is a challenge, and most code reviews end up being more about code style than anything else, if they happen at all.
As I understood that article, it seemed to be saying that formal methods often don't provide a good enough cost/benefit tradeoff to be worth it in many situations to many organizations. Right at the end, it basically says that your favorite underappreciated formal method isn't used because it costs too much and/or delivers too little. If an outfit that sells formal methods is saying that, then I suspect that (many) formal methods actually do cost too much for what benefit they deliver.
A key point they made is that formal methods, as currently practiced, have a cost-benefit curve that is very hockey-stick shaped. This doesn't really mean that the cost isn't worth it, but rather that it's hard to demonstrate the cost-effectiveness in a small pilot.
It also means that the pay-off is fragile. Because apparently a lot of things need to come together for the value to show.
Hey, if you have working (shareable) material that you use for teaching interns, I would be interested!