Whether formal verification of software is feasible, as the article acknowledges, depends heavily on the type of software and the development process.
To use formal verification you need to have formal requirements of the behavior of your software. Most software projects and design philosophies are simply incompatible with this.
In software, development and design often fall together, and that makes it uniquely ill-suited for formal methods. If you are developing before you are sure what you even want, then formal methods do not apply.
But I agree that there are certain areas, mostly smaller, safety-critical systems that rely on upfront specifications, which can benefit heavily from formal verification. Aerospace software, for example, relies heavily on verification.
It hasn’t been my experience that it is as niche as this. I believe the “costs” people refer to in these discussions have come way down over the last couple of decades. I’ve taught developers how to use tools like TLA+ and Alloy in a week.
It’s not a skill that requires a PhD and years of research to acquire these days.
Nor does writing a basic, high level specification.
If anything, you will learn something about the system you are modelling by using a model checker. And that can be useful even if the model is only used for documentation or teaching.
The fundamental power of formal methods is that they force you to think things through.
All too often I find software developers eager to believe that they can implement concurrent algorithms armed with their own wit, a type checker, and a smattering of unit tests. It can be humbling to find errors in your design and assumptions after using a model checker. And perhaps it’s hubris that keeps programmers from using such tools in more “mundane” and “real world” contexts.
There are a lot more “small” distributed systems than you’d expect, and state spaces are generally larger than you’d anticipate until you formalize your work.
Seems like it would really slow you down though if you adopt it too early. Sometimes you don't even know if the thing you want to do is possible.
My development style is:
- prototype ideas in a quick and dirty dynamic typed language just to gain confidence it's going to work (a lot of ideas die here)
- rewrite in a more performant and safe language with a "type checker and a smattering of unit tests" (usually here I'm "done" and have moved on to the next idea/task. If there's an issue I fix it and add another test case)
I'm trying to imagine where formal verification comes in. I'm imagining something like:
- prototype ideas in a quick and dirty dynamic typed language just to gain confidence it's going to work (a lot of ideas die here)
- Formally model the requirements
- rewrite in a language that can be formally verified and which is hopefully performant and lets me do things like simd and/or cuda if needed
- Never have to fix a bug unless there was a bug in the requirements (?)
To me, it just seems like it would take an order of magnitude longer to develop things this way for not much benefit (I've traded development time and potentially runtime performance in exchange for correctness).
https://adsharma.github.io/agentic-transpilers/
Long way to go. But there is a design.
Use z3/SMT instead of TLA+
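For flavor, here is a minimal sketch (my own illustration, not from the linked project) of what checking a property with z3's Python bindings looks like: assert the negation of the property and ask the solver for a counterexample.

```python
# Minimal z3 sketch: check "for all x, y >= 0: x + y >= x" by asserting the
# negation and confirming that no counterexample exists.
from z3 import Ints, Solver, And, sat

x, y = Ints("x y")
s = Solver()
s.add(And(x >= 0, y >= 0))  # assumptions
s.add(x + y < x)            # negation of the property we want
if s.check() == sat:
    print("counterexample:", s.model())
else:
    print("property holds for all non-negative x, y")
```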
I would say that the "nicheness" depends on how you treat software. Your development process and software architecture are engineering choices you make and these engineering choices affect how well formal specification applies.
I didn't talk about "costs" or about "how hard" it is, but that common practices in software development make using formal methods infeasible. If you want to use formal verification you likely have to change how you develop software. In an environment where there is general uncertainty about architecture and behavior, as is common in agile environments, formal verification is difficult to implement. By its nature, formal verification discourages fast iteration.
> By its nature, formal verification discourages fast iteration.
Not necessarily. You can develop the specification and implementation in parallel. That way, at every point in time, you could have a much more thorough and deep understanding of what you're currently building, as well as how exactly you decide to change it.
> I’ve taught developers how to use tools like TLA+ and Alloy in a week.
TLA+:
- https://en.wikipedia.org/wiki/TLA%2B
- https://lamport.azurewebsites.net/tla/tla.html
Alloy:
- https://en.wikipedia.org/wiki/Alloy_(specification_language)
What's your recommendation on how to rapidly learn TLA+? I spent some time staring at references and the UI a few months ago and came away very defeated. But I'd like to actually level up here.
For distributed systems this makes sense. Most people aren't writing distributed system components, though, but things where the risk usually isn't technical, like business software. I worked on a project where I got into arguments with the PMs because I pushed back on optimizing performance on the main page of our pre-launch product. I argued that we didn't have any real feedback that the page was what was needed, and the PM thought I was insane for doubting the main page was needed. We completely redesigned that page twice, deleted it entirely, and then had to bring it back because the sales team liked it for initial demos.
Every process is a tradeoff, and which choice is best for your team and project always depends on your specific circumstances.
It does depend a lot on circumstance and context.
Is it absolutely important that your system is correct? That raises the question: correct with respect to what? Generally: a specification.
There are lots of situations where you don't know what your system is supposed to do, where testing a few examples is sufficient, or it's not terribly important that you know that it does what you say it ought to. Generating a batch report or storing some customer responses in a database? Trust the framework, write a few tests if you need to, nobody is going to find a formal specification valuable here.
However, what if you need to deploy configuration to a cluster and need to ensure there are at least two nodes in the load balancer group with a version of the configuration that matches the database, at all times during the migration? Need to make sure the migration always completes and never leaves the cluster in a bad state?
Even smaller in scale: need to make sure that references in your allocator don't contain addresses outside of your memory pool? Need to make sure that all locks are eventually released?
It's definitely much faster to iterate on a formal specification first. A model checker executes your model against the entire state space. If you're used to test-driven development or working in a statically typed language, this is useful feedback to get early on in the design process.
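To make that concrete, here is a toy, hand-rolled sketch in Python (a stand-in for what TLC does with a TLA+ spec; the states and transition rules are invented for illustration): enumerate every reachable state of a rolling config migration and check an invariant in each one.

```python
# Toy explicit-state model checker: BFS over every reachable state of a
# three-node rolling config migration, checking an invariant in each state.
from collections import deque

OLD, NEW = "old", "new"
INITIAL = (OLD, OLD, OLD)

def next_states(state):
    # One migration step: upgrade any single node still on the old config.
    for i, cfg in enumerate(state):
        if cfg == OLD:
            yield state[:i] + (NEW,) + state[i + 1:]

def invariant(state):
    # At all times, at least two nodes must agree on a config version.
    return max(state.count(OLD), state.count(NEW)) >= 2

seen, queue = {INITIAL}, deque([INITIAL])
while queue:
    state = queue.popleft()
    assert invariant(state), f"invariant violated in {state}"
    for nxt in next_states(state):
        if nxt not in seen:
            seen.add(nxt)
            queue.append(nxt)
print(f"checked {len(seen)} reachable states")
```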
The scope that is appropriate for using tools like this is quite large, and not as niche as some folks imply. I don't do aerospace engineering, but I've used TLA+ to model deployment scripts and find bugs in OpenStack deployments, as well as to simply learn that the designs of certain async libraries are sound.
Update: more examples.
Could you point to some good resources for either, that enables quick application?
Alloy is a brute force model checker (for rather small models).
Is that the state of the art for formal methods? How do you think about formally verifying systems with floating-point calculations, with randomness, with databases and external environments?
> Is that the state of the art for formal methods?
I think the state of the art is quite broad as there are many tools.
Model checking is, in my estimation, the most useful for industry programmers today. It doesn't require as much training to use as automated theorem proving, and while it doesn't provide the guarantees that mathematical induction and proof do, you get far more bang for your buck than from testing alone.
And model checkers are very flexible, because the language they use, math, is also very flexible. TLA+ has been used to find errors in the Xbox 360 memory controller before it went to launch, as well as in Amazon's S3 [0]: errors so complex that it would have been highly improbable for a human to detect them, and finding and solving those errors saved real businesses a lot of money.
It comes down to your ability to pick good levels of abstraction. You can start with a very high level specification that admits no details about how the file system works or the kernel syscalls involved and still get value out of it. If those details _do_ matter there's a strategy called refinement where you can create another specification that is more specialized and write an expression that implies if the refined specification holds then the original specification does as well.
Tools for randomness and numerical methods exist, and depending on your needs you may want to look at tools other than a model checker. TLA+, for example, isn't good at modelling "amounts", as they quickly create a rather large state space; so users tend to create abstractions that represent these things with a finite number of states.
[0] https://lamport.azurewebsites.net/tla/formal-methods-amazon....
Do you have any resources you could link to - for those that are curious?
For an overview of all that is involved in "Formal Methods", see Understanding Formal Methods by Jean-Francois Monin. The book covers both the various mathematical models and some of the tools implementing them. There is a lot here, and it may seem haphazard, but that is only because we haven't yet grasped the "full picture". I have been reading this for a while but still have a long way to go.
A four-part series, TLA+ in Practice and Theory, by user "pron": https://pron.github.io/posts/tlaplus_part1
Not the OP, but Hillel Wayne’s course/tutorial (https://www.learntla.com/) is fantastic. It’s focused on building practical skills, and helped me build enough competence to write a few (simple, but useful!) specs for some of the systems I work on.
It’s not all or nothing!
I work on a very “product-y” back end that isn’t fully specified, but I have formally specified parts of it.
For instance, I property-based-tested a particularly nasty state machine I wrote to ensure that, no matter what kind of crazy input I called an endpoint with, the underlying state machine never made any invalid transitions. None of the code around the state machine has a formal spec, but because the state machine does, I was able to specify it.
In the process, I found some very subtle bugs I’d have never caught via traditional unit testing.
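The shape of such a test, sketched with made-up states and events using Python's `hypothesis` library (the real state machine was of course nastier):

```python
# Property test: no sequence of events, however crazy, may drive the state
# machine through a transition outside the allowed set.
from hypothesis import given, strategies as st

ALLOWED = {  # the spec: every legal (state, event) -> next state
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "idle",
    ("paused", "stop"): "idle",
}

def step(state, event):
    # Implementation under test: unknown events leave the state unchanged.
    return ALLOWED.get((state, event), state)

@given(st.lists(st.sampled_from(["start", "pause", "stop", "bogus"])))
def test_no_invalid_transitions(events):
    state = "idle"
    for event in events:
        new_state = step(state, event)
        # Either nothing happened, or the transition was a legal one.
        assert new_state == state or (state, event) in ALLOWED
        state = new_state
```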
Completely agree, but obviously you relied on the state machine being sufficiently separate and having a formal specification for it.
State machines are quite common in aerospace software, where the requirements even specify the states and transitions. If you can formally verify them, I believe you absolutely should, as there can often be a lot of subtlety going on, especially if you have a distributed system of state machines sending messages to one another.
Property-based testing is also effective in finding bugs even in the absence of any formal model. Basically you just need to identify informal invariants ("properties"), encode them as asserts, and then run tests with enough coverage to likely find any invariant violations.
"Formal" means "written in a language that can be interpreted by a computer" and that is the very thing programmers do. Writing a code is writing a formal specification of the program's behaviour, and by definition, every piece of software must do that.
But you're right that to benefit from any formal methods, you need to compare the program's behaviour to something else that isn't the program itself, and that other thing also needs to be written in a formal language. To be able to write that other thing, you do need to have a precise understanding of what the wanted behaviour should be so that you could express it in a formal language, but it doesn't have to cover the full behaviour of the software.
As an example of what that means, take automated (unit) tests. Automated tests are a formal specification, and running them is a formal verification method. Tests are a relatively weak specification, and executing them is a relatively weak verification, compared to what we normally mean by "formal methods", but there is no clear qualitative difference -- conceptual or even practical -- between tests and more powerful formal specification and verification. You can think of these practices as more powerful tests or as tests that work differently, but tests are formal specifications, and if they're applicable to a piece of software, it's likely that richer formal specification methods are, too (learning the cost/benefit of other methods is similar to learning the cost/benefit of tests -- you don't get it right at first, but you learn along the way).
>Automated tests are a formal specification, and running them is a formal verification method.
This is a great way to describe it, and an important concept to grasp.
Another kind of formal verification that exists between "standard" unit tests and model checking/automated theorem proving is property-based testing a la quickcheck: You have a function you want to test (e.g., a sort), you describe the properties of inputs ("a list of integers") and state the properties you want to hold of the output ("every number should appear the same number of times in the output as in the input, and every number should be >= the number to its left") and ask the system to generate and test lots of random inputs for you. These properties can often be used more or less directly in model checkers, which makes the checking exhaustive (on some small, finite domain).
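Written down with Python's `hypothesis` library, the sort example above is only a few lines (a sketch, using the built-in `sorted` as the function under test):

```python
# The two sort properties: same multiset of elements, and ordered output.
from collections import Counter
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_properties(xs):
    out = sorted(xs)  # the function under test
    # Every number appears the same number of times in output as in input.
    assert Counter(out) == Counter(xs)
    # Every number is >= the number to its left.
    assert all(a <= b for a, b in zip(out, out[1:]))
```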
You are going to have requirements whether you like it or not. It doesn't matter if you discover them during requirements engineering and validate and deconflict them on a simple text document, or if you discover them as you go during coding (possibly after having coded the wrong thing), or if the client discovers them during the "sprint review" of some allegedly agile cult. The only difference is how much money and time you are willing to trade for being called "agile". Ironically, the traditional way of doing a requirements stage is the least expensive of all three options. It is also the most aligned with the original agile spirit, since it converges with the client as soon as possible, at the point where changes are cheapest (changing a line on a text document).
I think it's more about formalisability than requiring upfront design. For example, you may have an insurance claim automation system which you can't design upfront because most insurance providers have unspecified behaviour. But that doesn't mean you can't refine your automation system as you get more information from interactions. You would still get the benefit of ensuring that you don't leave out any cases or have any contradictions in your system.
But that still relies on having a prior notion of the formal requirements of your system. I know little about insurance systems, but deriving a formal specification seems like a nightmare task. Although, as you mentioned, if you had a partial one you certainly would get some benefits from it.
It's been a while since I worked on those systems, but you usually decide on some rules that you refine over time. So they are purely logical decisions that you can formalize. I don't see why it would be hard to specify. I don't mean specifying it all up front, but in my experience one can very easily specify the exact decision procedure that is implemented right now. Generally you have a state machine representing the process.
This state machine is usually embedded in the code, but code has a lot of noise around the state machine that makes it harder to see the underlying state machine.
Static type checking is a kind of formal verification of software - there are formal requirements (the program doesn't go "wrong" in a number of rigorously defined ways) that are automatically checked. And you can certainly do "design and development" together in type-safe languages.
You can also use proof assistants as "unit tests" for entire classes of behaviors rather than specific values. This lets you add proofs incrementally even when a formal spec is too difficult.
I worked on a project where a couple people formally proved our design sound while I was still fixing bugs in the spec. We don’t have spherical cows. Theoretical models aren’t reality. The map is not the territory.
People who argue for formal methods are well aware that theorems change over time [1]; the point is that with formal methods the theorems become simpler and in turn result in simpler proofs. With ad hocery or communal wisdom, unless you are a genius artist, the theorem only becomes more complex.
[1] You can see Lakatos in Dijkstra's library, for example: https://dijkstrascry.com/inventory#box4
V&V doesn't really mean formal verification, unless you're doing some extremely cutting-edge stuff. It's so much cheaper to do basic boundary testing, fuzzing, and unit & integration tests than to spend the time and find the talent who can not just formally specify your requirements but also write proofs that they are met.
Formal verification is just not as common as you might think in these highly regulated industries; the expertise isn't as prevalent as your average HN reader would think.
I'm not sure that "aerospace software relies heavily on verification" is very commonly the case at all, even if one would like it to be...
What makes you say that? I'm pretty sure it's one of the more heavily verified industries. If it wasn't, air travel would be much more dangerous.
Dijkstra, for example, didn't think so, I think. He had a dim view of testing as compared with proofs for ensuring software correctness.
They both have their place. If you get a requirement wrong, you can prove an incorrect system; in a complex system I doubt you can understand everything from the requirements, and so we should assume they are wrong. A test, by contrast, can (but need not!) be obviously correct, but it is correct only for that case and so doesn't say anything about the rest.
Everything you said about formal verification is also true about tests; do you think software is also uniquely ill suited for TDD?
I did not say that software is uniquely ill suited for formal verification. That also would be total nonsense.
I said that certain philosophies of software design are uniquely unsuited for formal verification.
Besides, tests and formal verification are different. A test is essentially a formal specification for a single point or, depending on how you test, multiple, potentially random, points. Writing or changing a test is less time intensive than writing or changing a full formal specification for an entire subsystem; therefore tests are more suited to volatile software development practices.
> therefore tests are more suited to volatile software development practices.
And having no tests is even more suited...
Maybe there are tradeoffs. If the code will never need the same thing again manual tests are good enough. Most code has things that don't change even as you add new features and so tests are valuable.
That’s true. Back when they were inventing security, the criteria for high-security systems (A1 under TCSEC) required formal verification. Those who did it said its cost ranged from not too burdensome to very high. What’s the reason?
While many requirements existed, the design assurance required specifying what problem was being solved, how it’s being solved, and the safety/security properties. Then, proving that the security properties were embodied in the artifacts in all states. That problem is difficult enough that most people who succeeded were also geniuses good at programming and math. Geniuses are in short supply.
The lessons learned papers also pointed out a recurring problem. Each requirement, security policy, algorithm, etc needed a mathematical specification. Then, proof techniques were necessary that could handle each. Instead of developing software features, you are now both developing software and doing R&D on applied mathematics.
Steve Lipner’s The Ethics of Perfection pointed out it took several quarters to make a major change on a product. The markets value development velocity over quality, even demand side. So, that’s already a no go for most businesses. Likewise, the motivations of OSS developers work against high assurance practices as well.
If you want formal verification, then you are forced to make certain decisions. For one, you need to build simple systems that are easy to verify, from design to language. Then, you use the easiest, most-automated tools for the job. Most drop full verification to do automated, partial checking: TLA+, Alloy, SPARK Ada, Meta’s Infer, etc.. If doing full verification, it’s better to make or build on reusable components like we see done with CompCert and seL4. GEMSOS and VAX VMM advocated that back in the day, too.
So, most software development isn’t fit for use of formal verification. I think we can default to a combo of contracts (specs), automated testing, static analysis, and safe languages. If distributed, stronger consistency when possible. Then, use CI to run the verification on each commit. Some tools can generate patches, too.
> To use formal verification you need to have formal requirements of the behavior of your software.
Not true!
Well, it is true of TLA+, but that is why TLA+ is not the future. The future is dependent types.
It is true we'll always be programming and specifying as we go to some extent. That's why formal methods have to be part of the software itself, not some extra burden off to the side.
Dependent types are basically unique in doing that. You can write some code, then reason about that code. Or you can write a program where the propositions/proofs and the executable program proper are so intertwined that the separation stops making sense in theory too.
This, and this alone, supports the dialectic of "doing" and "planning" that is hardly unique to software development, but rather is emblematic of most sufficiently complex human activities.
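A trivial Lean sketch of the "write some code, then reason about that code" mode (my own toy example, not from any library): the executable function and a machine-checked theorem about it live side by side in the same file.

```lean
-- Toy example: the program and a proof about it, side by side.
def double (n : Nat) : Nat := n + n

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```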
----
A really nice example to make this concrete is Lean's standard library's `SatisfiesM` and `MonadSatisfying`
(See https://leanprover-community.github.io/mathlib4_docs/Batteri...)
`SatisfiesM` is an attempt to support "external" reasoning: one already wrote a non-trivial monadic action, and now one wants to do Hoare-like reasoning about preconditions, postconditions, the returned value, etc.
But a different monadic action that uses the first one might need to then internalize this reasoning. For example, it needs to turn a pair of (program, proof about the program's return value) into a new program that returns the thing and the proof.
This is what `MonadSatisfying` is for. It is a constraint on the monad saying we know how to internalize such a proof.
The "external" proofs are the only ones TLA+ and similar tools support, and the honest truth is that you might want to make them, you might be forced for compliance/legal/insurance reasons to make them, but you are never intrinsically forced by the software itself to make them.
The "internal proofs", in contrast, are so required. The represent different abstractions which state their interface such that it is impossible to use them incorrectly --- providing a proof is a non-negotiable obligation on user of the abstraction just as providing a "regular" argument is a non-negotiable obligation to making a regular function call.
In addition to supporting modular reasoning (write a very fancy thing and then make sure your coworkers don't use it incorrectly), this fixes the economic problem: you've got to do the proofs or the code won't compile!
----
If you have no idea how to start programming more formally, here are some ways to start:
external proofs:
You probably have tests. Think: how did you know those test cases are correct? Can you rewrite individual unit tests as property tests, to "formalize" that reasoning? (Yes, it is still formal, even though property testing is old and simple. "Formal" means "explicit", not "fancy"!) If you can write a property test to be checked by probabilistic methods, then you implicitly have a property you could prove instead. Problem solved!
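For instance, a hypothetical before/after: a single example-based test, and the round-trip property that explains why its expected value was correct (again using `hypothesis`).

```python
# Before: one point of the state space. After: the property itself.
from hypothesis import given, strategies as st

def encode(s: str) -> bytes:
    return s.encode("utf-8")

def decode(b: bytes) -> str:
    return b.decode("utf-8")

def test_roundtrip_example():  # the unit test: a single chosen input
    assert decode(encode("héllo")) == "héllo"

@given(st.text())  # the property test: all generated inputs
def test_roundtrip_property(s):
    assert decode(encode(s)) == s
```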
internal proofs:
Does your program have any `panic("this should never happen")` or similar? Well, why shouldn't it happen? Types, even non-dependent ones, allow for https://en.wikipedia.org/wiki/Principle_of_explosion, which is nothing more than showing that some case-alternative has been given an impossible value, and so must be dead code. Every bit of code you think is unreachable, where your intuition is in fact correct, has a proof saying it is so, and that proof is needed internally to "discharge" the branch.
Every assert? Remember, `assert c` = `branch c -> ok; not c -> panic("this should never happen")`, so all your assertions are now things you might prove!
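In Python, the closest thing to statically discharging such a branch is `typing.assert_never` (3.11+). A hypothetical sketch, where mypy or pyright proves the "impossible" branch dead by exhaustiveness instead of trusting a comment:

```python
# Replace panic("this should never happen") with a check the type checker
# verifies: if Light gains a new variant, this function stops type-checking.
from typing import Literal, assert_never

Light = Literal["red", "amber", "green"]

def next_light(light: Light) -> Light:
    if light == "red":
        return "green"
    elif light == "green":
        return "amber"
    elif light == "amber":
        return "red"
    else:
        assert_never(light)  # statically proven unreachable
```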
Even people that never have written down specs in their life probably have some assertions or panics lying about. So they too have something to formalize.
This sounds fabulous on paper. Are there any examples of moving unit tests to property tests? Are there examples for proving unreachability? Has anybody successfully managed to explain how this is applicable if you don't have a deep background in formal methods?
If we can't provide a pathway from where we are to a more formal world that is understandable to garden variety developers, it will remain an ivory tower occupation.
(And, in fact, things will get worse. Because in a world of LLM churnage, provable correctness becomes even more important)