The Easiest Way to Build a Type Checker

jimmyhmiller.com

89 points

surprisetalk

4 days ago


26 comments

thomasikzelf a day ago

I recently implemented a Hindley-Milner type checker. The algorithm itself is not necessarily super difficult (once you get your head around it, of course), but the way it is explained is. It seems like HM is mostly explained by people with a mathematics background who already know the proper notation. I wonder how much overlap there is between people who know the notation and people who do not yet know how HM works; probably not much.

Anyway nice demo. Bi-directional is already quite powerful!

mrkeen a day ago

I grabbed the code from the article and annotated it with the different cases from the famous picture*

  switch (expr.kind) {
    case "number": case "string": case "var":
      ... // [Var]
    case "call":
      ... // [App]
    case "function":
      throw new Error("...") // [Abs]
    case "let":
      ... // [Let]
  }
Looks like most of the hard work's done, and it probably wouldn't be too tricky to get [Abs] in there too! (A rough sketch of what that might look like follows below.)

* https://wikimedia.org/api/rest_v1/media/math/render/svg/10d6...
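
Here is a minimal, hypothetical sketch of what an annotation-driven [Abs] case could look like. The Type/Expr/Env shapes and the infer helper are assumptions for illustration, not the article's actual definitions; with annotated parameters, the lambda case just extends the environment and infers the body, no unification needed:

    type Type =
      | { kind: "number" }
      | { kind: "string" }
      | { kind: "function"; arg: Type; returnType: Type };

    type Expr =
      | { kind: "number"; value: number }
      | { kind: "var"; name: string }
      | { kind: "function"; param: string; paramType: Type; body: Expr }
      | { kind: "call"; fn: Expr; arg: Expr };

    type Env = Record<string, Type>;

    function typesEqual(a: Type, b: Type): boolean {
      if (a.kind === "function" && b.kind === "function") {
        return typesEqual(a.arg, b.arg) && typesEqual(a.returnType, b.returnType);
      }
      return a.kind === b.kind;
    }

    function infer(expr: Expr, env: Env): Type {
      switch (expr.kind) {
        case "number":
          // Literals synthesize their base type.
          return { kind: "number" };
        case "var": {
          // [Var]: look the name up in the environment.
          const t = env[expr.name];
          if (!t) throw new Error(`unbound variable ${expr.name}`);
          return t;
        }
        case "function": {
          // [Abs]: bind the annotated parameter, then infer the body.
          const bodyType = infer(expr.body, { ...env, [expr.param]: expr.paramType });
          return { kind: "function", arg: expr.paramType, returnType: bodyType };
        }
        case "call": {
          // [App]: the callee must be a function whose argument type matches.
          const fnType = infer(expr.fn, env);
          const argType = infer(expr.arg, env);
          if (fnType.kind !== "function" || !typesEqual(fnType.arg, argType)) {
            throw new Error("type mismatch in call");
          }
          return fnType.returnType;
        }
      }
    }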

nkrisc a day ago

In the small example type checker given, would a function of type A -> B -> C be represented something like this?

    { kind: "function", arg: A, returnType: { kind: "function", arg: B, returnType: C}}
Or is that case simply not covered by this bare-bones example? I can't quite parse the code in my head well enough to tell, but I think it would work?

EDIT:

I just noticed the working demo at the bottom that includes an example of a multi-argument function, so that answers whether it works.

bbminner a day ago

I have not looked into the HM algorithm much, but is there an advantage (educational or performance-wise) over implementing a (dumb) SAT solver and expressing the problem as a SAT problem? It always seemed like the "natural representation" for this kind of problem to me. Does knowing that these are types _specifically_ help you somehow / give you some unique insights that wouldn't hold for other, similar SAT problems?

  • recursivecaveat a day ago

    Keep in mind one of the most important attributes of a good compiler is clearly explaining to the user what caused compilation failure and why. If you try to solve in a very abstract and general space it could be challenging to give an actionable error message.

    • saghm 20 hours ago

      I suspect you've intentionally phrased this to avoid referencing type checking in particular. From what I understand, this is also the main reason mainstream programming languages tend to use hand-written parsers rather than parser generators, and I imagine it applies to a lot of other features as well.

    • Quekid5 a day ago

      Yup, that's basically it. "SAT says no" isn't a very useful error message.

  • neel_k 8 hours ago

    If you have a bound on the size of the largest type in your program, then HM type inference is linear in the size of the program text.

    The intuition is that you never need to backtrack, so boolean formulae (i.e., SAT) offer no help in expressing the type inference problem. That is, if you think of HM as generating a set of constraints, it only ever produces a conjunction of equality constraints, which you then solve with the unification algorithm.
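
    Concretely, here's a rough, hypothetical sketch of that unification step in the article's TypeScript style, with a type-variable case added (assumed shapes, not real library code). Each equality constraint extends a substitution, and no decision ever has to be undone:

        type Type =
          | { kind: "var"; name: string }
          | { kind: "number" }
          | { kind: "string" }
          | { kind: "function"; arg: Type; returnType: Type };

        // A substitution maps type-variable names to the types they stand for.
        type Subst = Map<string, Type>;

        // Chase the substitution until we reach a concrete type or an unbound variable.
        function resolve(t: Type, subst: Subst): Type {
          while (t.kind === "var" && subst.has(t.name)) {
            t = subst.get(t.name)!;
          }
          return t;
        }

        // Solve a single equality constraint t1 = t2, extending `subst` in place.
        // (A full implementation also needs an occurs check to reject infinite types.)
        function unify(t1: Type, t2: Type, subst: Subst): void {
          t1 = resolve(t1, subst);
          t2 = resolve(t2, subst);
          if (t1.kind === "var" && t2.kind === "var" && t1.name === t2.name) {
            return; // already the same unknown
          }
          if (t1.kind === "var") {
            subst.set(t1.name, t2);
          } else if (t2.kind === "var") {
            subst.set(t2.name, t1);
          } else if (t1.kind === "function" && t2.kind === "function") {
            unify(t1.arg, t2.arg, subst);
            unify(t1.returnType, t2.returnType, subst);
          } else if (t1.kind !== t2.kind) {
            throw new Error(`cannot unify ${t1.kind} with ${t2.kind}`);
          }
        }

    For example, unifying t0 -> number with string -> t1 records t0 := string and t1 := number; a constraint like number = string is the only way to fail.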

  • remexre a day ago

    How would you encode a program like

        function f<T>(x: T) { return x; }
        function g(x: number) { return { a: f(x), b: f(x.toString()) }; }
    
    in SAT?

    If that's easy, how about length and f in:

        function append<T>(xs: list<T>, ys: list<T>) {
          return match xs {
            Nil() -> ys,
            Cons(hd, tl) -> Cons(hd, append(tl, ys)),
          };
        }
        function flatten<T>(xs: list<list<T>>) {
          return match xs {
            Nil() -> Nil(),
            Cons(hd, tl) -> append(hd, flatten(tl)),
          };
        }
        function map<T, U>(f: (T) => U, xs: list<T>) {
          return match xs {
            Nil() -> Nil(),
            Cons(hd, tl) -> Cons(f(hd), map(f, tl)),
          };
        }
        function sum(xs: list<number>) {
          return match xs {
            Nil() -> 0,
            Cons(hd, tl) -> hd + sum(tl),
          };
        }
        function length<T>(xs: list<T>) { return sum(map((_) -> 1, xs)); }
        function f<T>(xs: list<list<T>>) {
          return length(flatten(xs)) === sum(map(length, xs));
        }
    
    HM-style inference handles polymorphism and type application without a complicated SAT encoding.
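
    The piece doing the heavy lifting in the first example is let-generalization plus instantiation. A hypothetical sketch of the instantiation half (assumed shapes, not anyone's real code): a let-bound polymorphic function is stored as a scheme (forall T. T -> T), and every use site gets its own fresh type variables, so f(x) at number and f(x.toString()) at string never constrain each other.

        type Type =
          | { kind: "var"; name: string }
          | { kind: "number" }
          | { kind: "string" }
          | { kind: "function"; arg: Type; returnType: Type };

        // forall vars. body
        type Scheme = { vars: string[]; body: Type };

        let counter = 0;
        function freshVar(): Type {
          return { kind: "var", name: `t${counter++}` };
        }

        // Replace each quantified variable with a brand-new one for this use site.
        function instantiate(scheme: Scheme): Type {
          const mapping = new Map<string, Type>();
          for (const v of scheme.vars) mapping.set(v, freshVar());
          const walk = (t: Type): Type => {
            switch (t.kind) {
              case "var":
                return mapping.get(t.name) ?? t;
              case "function":
                return { kind: "function", arg: walk(t.arg), returnType: walk(t.returnType) };
              default:
                return t;
            }
          };
          return walk(scheme.body);
        }

        // f : forall T. T -> T
        const fScheme: Scheme = {
          vars: ["T"],
          body: {
            kind: "function",
            arg: { kind: "var", name: "T" },
            returnType: { kind: "var", name: "T" },
          },
        };

        // Two independent copies (t0 -> t0 and t1 -> t1), which unification can
        // then pin to number -> number and string -> string separately.
        const useA = instantiate(fScheme);
        const useB = instantiate(fScheme);
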
  • octachron a day ago

    For a bounded size of the types of sub-expressions, HM inference is quasi-linear in the size of the program, because the constraints appearing in the HM algorithm are only equalities between meta-variables. An NP-complete SAT solver is not really a good fit for this kind of simple constraint. Even more so when typechecking often represents a significant part of compilation time.

    (Of course, the tricky part of the definition above is that the size of types can theoretically be exponential in the size of the program, but that doesn't happen for programs with human-understandable types.)

IshKebab a day ago

Nice! I think it's pretty widely agreed that requiring type annotations at the function level is a good thing anyway. Apparently it's considered good practice in Haskell even though Haskell doesn't require it.

I've also worked with OCaml code that didn't do it and you lose a lot of the advantages of static typing. Definitely worse.

Rust got it right.

  • dragonwriter a day ago

    Type annotations on functions in Haskell (or similar languages) are useful for:

    1. leveraging the type checker to verify (aspects of) the correctness of your function, and

    2. communicating intent to humans

    I've found in my own explorations with Haskell that it's useful to write functions with annotations, verify that they work, and then remove the annotations to see what the inferred type would be (since the code already compiled with the annotation, the inferred type will either be identical to or more general than the previously declared type). Then I generally (because it is good practice to have a declared type) replace the old declared type with the inferred one (though sometimes at this point I also change the name).

  • toolslive a day ago

    What if your IDE can show the type of any expression as a tooltip? Would you still think the same?

    • ufo a day ago

      In Haskell, type error messages are typically of the form "types A and B should be equal, but they are not". The problem is that, without type annotations, the compiler cannot know whether it is A or B that is wrong, which can result in confusing error messages.

      For example, suppose that you have a bug in the body of a function, but did not provide a type annotation for it. The function might still compile but not with the type you want. The compiler will only notice something is amiss when you try to call the function and it turns out that the function's inferred type doesn't fit the call site.

      Basically, global type inference in the absence of type annotations means that changes in one part of the file can affect inferred types very far away. In practice it's best to use type annotations to limit inference to small sections, so that type errors are reported close to what caused them.
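
      A small TypeScript illustration of the locality point (TypeScript only infers return types rather than doing full HM inference, but the effect is the same); the snippet and its names are hypothetical:

          // The bug is in `describe`, which was meant to always return a string.
          // Without a return-type annotation, the inferred type silently becomes
          // string | number.
          function describe(n: number) {
            if (n > 0) return "positive";
            return n; // bug: should have been "non-positive"
          }

          // The error is reported here, far from the mistake:
          // "Property 'toUpperCase' does not exist on type 'string | number'."
          // Annotating describe(): string moves the error back into its body.
          const label = describe(5).toUpperCase();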

      • redman25 17 hours ago

        Since Haskell is statically compiled, wouldn't it not compile at all?

        • ufo 16 hours ago

          That's all happening at compile time. I only meant to say that the function's inferred type isn't what you'd expect.

    • IshKebab 21 hours ago

      Yes absolutely. OCaml's VSCode extension is very good at that so that's the only way I've experienced it.

      The problems are:

      1. Type errors become much less localised and much harder to understand. Instead of "the function you're editing is supposed to return a string but it doesn't" you get "this other function three stack frames away that indirectly calls the function you're editing can't pass some value into println".

      2. The inferred types are as generic as possible, which also means they are as hard to understand as possible.

    • Quekid5 a day ago

      I can't speak for the parent poster, but for global function declarations, yes, absolutely.

      It's infuriating when a type error can "jump" across global functions just because you weren't clear about what types those functions should have had, even if those types are very abstract. So early adopters learned to sprinkle in type annotations at certain points until they discovered that the top-level was a good place. In OCaml this pain is somewhat lessened when you use module interface files, but without that... it's pain.

  • Quekid5 a day ago

    > I think it's pretty widely agreed that requiring type annotations at the function level is a good thing anyway. Apparently it's considered good practice in Haskell even though Haskell doesn't require it.

    In Haskell-land: At the global scope, yes, that's considered good practice, especially if the function is exported from a module. When you just want a local helper function for some tail-recursive fun it's a bit of extra ceremony for little benefit.

    (... but for Rust specifically local functions are not really a big thing, so... In Scala it can be a bit annoying, but the ol' subtyping inference undecidability thing rears its ugly head there, so there's that...)

    • ufo a day ago

      Languages with local type inference can sometimes omit type annotations from lambdas, if the lambda is being returned or passed as an argument to another function. In those situations the expected type of the lambda is already known from context, so its parameter annotations can be left off.
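
      For instance, TypeScript's contextual typing does exactly this (a small illustration, not tied to any particular language discussed above):

          // The callback's parameter type comes from Array.prototype.map's
          // signature, so `n` needs no annotation and is inferred as number.
          const doubled = [1, 2, 3].map(n => n * 2);

          // The same lambda in isolation needs an annotation, because there is
          // no expected type to infer it from.
          const double = (n: number) => n * 2;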

      • Quekid5 a day ago

        Yeah, that's true and that's a good convenience even if it's not full inference. In the case of Scala, the parameter types may often be required, but at least the return type can be omitted, so there's that.

tayo42 a day ago

I thought the implementation here was how Hindley-Milner worked? I guess not?

  • tekknolagi a day ago

    No, HM is unification-based and requires no annotations at all.

    • skybrian a day ago

      Apparently it only gets away without annotations if the language doesn’t support subtyping? Here’s an explanation about why bidirectional type checking is better for that:

      https://www.haskellforall.com/2022/06/the-appeal-of-bidirect...

      It seems to me that type-checking that relies on global constraint-solving is usually a bad idea. Annotated function types result in less confusion about what a function does.

      • ufo a day ago

        Indeed. Unification-based type inference doesn't work great when the type constraints are inequalities.