For anyone else who was initially confused by this, useful context is that Snowboard Kids 2 is an N64 game.
I also wasn't familiar with this terminology:
> You hand it a function; it tries to match it, and you move on.
In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
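The check itself is conceptually simple. A minimal sketch of the comparison step (the real workflow extracts the function's bytes from the ROM and from the recompiled object file with era-appropriate tooling; the hex below is a hand-assembled MIPS example I picked for illustration):

```python
def bytes_match(candidate: bytes, target: bytes) -> tuple[bool, int]:
    """Compare recompiled function bytes against the original ROM bytes.

    Returns (matched, offset_of_first_difference); offset is -1 on a match.
    """
    if candidate == target:
        return True, -1
    # Report where the first divergence occurs, to guide the next attempt.
    limit = min(len(candidate), len(target))
    for i in range(limit):
        if candidate[i] != target[i]:
            return False, i
    return False, limit  # one sequence is a prefix of the other

# Example: two MIPS instruction streams differing in the frame size
original   = bytes.fromhex("27bdffe8afbf0014")  # addiu sp,sp,-24; sw ra,20(sp)
recompiled = bytes.fromhex("27bdffe0afbf0014")  # addiu sp,sp,-32 (wrong size)
print(bytes_match(recompiled, original))  # (False, 3)
```

A non-match plus the offset of the first diverging instruction is exactly the kind of feedback you can hand back to the model for another attempt.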
The author's previous post explains this all in a bunch more detail: https://blog.chrislewis.au/using-coding-agents-to-decompile-...
Snowboard Kids 2 was a great N64 game. It was one of a number of racing titles inspired by Mario Kart, but the snowboarding added a bit of a different feel. The battle items were clever, and the stages were really well made given the technical limitations they faced. As a kid with two brothers, we played a lot of competitive multiplayer.
I also remember a few things in the singleplayer being very difficult. The number of times I had to fight/race Dameian in his giant robot running down the mountainside... It's carved into my brain like that footrace against Wizpig in DKR or the Donkey Kong arcade game for the Rareware coin in DK64.
The battle items in Snowboard Kids were clever and memorable. The parachute missile that would launch racers up in the air and then deploy the parachute so they slowly float back down was such a frustrating item to be hit with. The pan that would hit all opponents was iconic, and it was hilarious that you could somehow dodge it with invisibility. Even the basic rock dropped on the course was somehow memorable.
Great game. It's heartwarming to know that others still remember it and care about it.
We've been using LLMs for security research (finding vulnerabilities in ML frameworks) and the pattern is similar - it's surprisingly good at the systematic parts (pattern recognition, code flow analysis) when you give it specific constraints and clear success criteria.
The interesting part: the model consistently underestimates its own speed. We built a complete bug bounty submission pipeline - target research, vulnerability scanning, POC development - in hours when it estimated days. The '10 attempts' heuristic resonates - there's definitely a point where iteration stops being productive.
For decompilation specifically, the 1M context window helps enormously. We can feed entire codebases and ask 'trace this user input to potential sinks' which would be tedious manually. Not perfect, but genuinely useful when combined with human validation.
The key seems to be: narrow scope + clear validation criteria + iterative refinement. Same as this decompilation work.
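That loop is easy to sketch. Here's what "clear validation criteria + iterative refinement" might look like with a per-word match score as the criterion and an attempt cap in the spirit of the '10 attempts' heuristic (`propose` stands in for whatever generates a candidate, e.g. a model call):

```python
def match_score(candidate: bytes, target: bytes) -> float:
    """Fraction of 4-byte MIPS words in the target that match exactly."""
    words = len(target) // 4
    if words == 0:
        return 0.0
    hits = sum(
        candidate[i:i + 4] == target[i:i + 4]
        for i in range(0, words * 4, 4)
    )
    return hits / words

def iterate(propose, target: bytes, max_attempts: int = 10):
    """Keep the best candidate; stop early on a perfect match or at the cap."""
    best, best_score = None, -1.0
    for attempt in range(max_attempts):
        candidate = propose(attempt)
        score = match_score(candidate, target)
        if score > best_score:
            best, best_score = candidate, score
        if best_score == 1.0:
            break
    return best, best_score
```

The point of the cap is exactly the observation above: past some number of attempts, iteration stops being productive and it's cheaper to hand the function to a human.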
I'd like to see this given a bit more structure, honestly. What occurs to me is constraining the grammar for LLM inference to ensure valid C89 (or close-to, as much can be checked without compilation), then perhaps experimentally switching to a permuter once/if a certain threshold is reached for accuracy of the decompiled function.
Eventually some or many of these attempts would, of course, fail, and require programmer intervention, but I suspect we might be surprised how far it could go.
I don't expect constraining the grammar to do all that much for modern LLMs - they're pretty good at constraining themselves. Having it absorb the 1% of failures that's caused by grammar issues is not worth the engineering effort.
The modern approach is: feed the errors back to the LLM and have it fix them.
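A sketch of that feedback loop, with `check` and `fix` as stand-ins (in practice `check` would invoke the period compiler and `fix` would be the model call):

```python
def repair_loop(source: str, check, fix, max_rounds: int = 5) -> str:
    """Iteratively repair `source`.

    `check(source)` returns an error string ('' when it compiles);
    `fix(source, errors)` is a stand-in for the model call that rewrites it.
    """
    for _ in range(max_rounds):
        errors = check(source)
        if not errors:
            break
        source = fix(source, errors)
    return source

# Toy demo: the "compiler" rejects a missing semicolon, the "model" adds it.
def check(src):
    return "" if src.endswith(";") else "error: expected ';'"

def fix(src, errors):
    return src + ";"

print(repair_loop("return 0", check, fix))  # "return 0;"
```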
> In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
They had access to the same C compiler used by Nintendo in 1999? And the register allocation on a MIPS CPU is repeatable enough to get an exact match? That's impressive.
Broadly, yes.
The groundwork for this kind of "matching" process is: sourcing odd versions of the obscure tooling that was used to build the target software 20 years ago, and playing with the flag combinations to find out which was used.
It helps that compilers back then were far less complex than those of today, and so was the code itself. But it's still not a perfect process.
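The flag search itself is usually just brute force over a small space. A hedged sketch (the flag sets here are plausible for a late-90s MIPS GCC but are illustrative; `compile_fn` stands in for invoking the period compiler and extracting the function's bytes):

```python
from itertools import product

# Candidate option sets; the era and similar games' build setups
# narrow these down considerably in practice.
CANDIDATE_FLAGS = {
    "opt":   ["-O1", "-O2"],
    "abi":   ["-mabi=32"],
    "isa":   ["", "-mips2", "-mips3"],
}

def find_matching_flags(compile_fn, target: bytes):
    """Try every flag combination until the compiled output matches."""
    for combo in product(*CANDIDATE_FLAGS.values()):
        flags = [f for f in combo if f]
        if compile_fn(flags) == target:
            return flags
    return None
```

Once one function matches under a given combination, the same flags usually apply to the whole build, so the search only has to be done once per compilation unit at worst.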
There are cases of "flaky" code - for example, code that depends on the code around it. So you change one function, and that causes 5 other functions to no longer match, and 2 functions to go from not matching to matching instead.
Figuring out and resolving those strange dependencies is not at all trivial, so a lot of decompilation efforts end up settling at some "100% functional, 99%+ matching" state.
There's a note about that:
> Snowboard Kids 2 was written in C and compiled to MIPS machine code. The compiler was likely GCC 2.7.2 based on the instruction patterns [3]
The footnote is interesting: https://blog.chrislewis.au/using-coding-agents-to-decompile-...
> This is mostly just guesswork and trying different variations of compiler versions and configuration options. But it isn’t as bad as it sounds since the time period limits which compilers were plausibly used. The compiler arguments used in other, similar, games also provide a useful reference.
Why not do direct decompilation like Ghidra does, rather than guess, compile, compare? It seems more sensible to actually decompile.
Because raw decompiler output has function and variable names that aren't human-parsable, e.g. func_1223337377 with variables a, b, c, d.
helpful