> For the longest time (and for good reasons), floating point operations were considered unsafe for deterministic purposes. That is still true to some extent, but the picture is more nuanced than that. I have since learned a lot about floating point determinism, and these days I know it is mostly safe if you know how to navigate around the pitfalls.
If you're only concerned about identical binaries on x86, it's not too bad because AMD and Intel tend to have intentionally identical implementations of most floating point operations, with the exception of a few of the approximate reciprocal SSE instructions (rcpps, rsqrtps, etc). Modern x86 instructions tend to have their exact results strictly defined to avoid this kind of inconsistency: https://software.intel.com/en-us/articles/reference-implemen...
If you want this to work across ARM and x86 (or even multiple ARM vendors), you are screwed, and need to restrict yourself to using only the basic arithmetic operations and reimplement everything else yourself.
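To make "reimplement everything else yourself" concrete, here is a minimal sketch (not from the thread) of a sine approximation built only from multiplies and adds in a fixed order. The name `det_sin` and the choice of a degree-15 Taylor polynomial are assumptions for illustration. It does no range reduction, so it is only accurate for small |x|.

```python
def det_sin(x: float) -> float:
    """Hypothetical sine approximation using only * and + in a fixed order.

    Degree-15 Taylor polynomial evaluated with Horner's scheme on x*x.
    No range reduction: intended for roughly |x| <= 1.
    """
    x2 = x * x
    p = -1.0 / 1307674368000.0        # -1/15!
    p = p * x2 + 1.0 / 6227020800.0   # +1/13!
    p = p * x2 - 1.0 / 39916800.0     # -1/11!
    p = p * x2 + 1.0 / 362880.0       # +1/9!
    p = p * x2 - 1.0 / 5040.0         # -1/7!
    p = p * x2 + 1.0 / 120.0          # +1/5!
    p = p * x2 - 1.0 / 6.0            # -1/3!
    p = p * x2 + 1.0
    return x * p


print(det_sin(0.5))
```

Every step is a single IEEE 754 multiply, add, subtract or divide, which is the set of operations the comment above says you can rely on. In a compiled language you would also need to keep the compiler from fusing or reordering these steps.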
At least in the early 2000s, Bloomberg had strict requirements about this. Their financial terminal has a ton of math calculations. The requirement was that they always had live servers running with two different hardware platforms with different operating systems and different CPU architectures and different build chains. The math had to agree to the same bitwise results. They had to turn off almost all compiler optimisations to achieve this, and you had to handle lots of corner cases in code: can't trust NaN or Infinity or underflow to be portable.
They could transparently load balance a user from one backend platform to the other with zero visible difference to the user.
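For readers wondering what "bitwise" agreement means in practice: comparing with `==` is not enough, because values such as 0.0 and -0.0 compare equal while having different bit patterns. A rough sketch of a bit-level comparison, where the helper names `bits` and `bitwise_equal` are made up for illustration and have nothing to do with Bloomberg's actual tooling:

```python
import struct


def bits(x: float) -> int:
    """Return the raw IEEE 754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bitwise_equal(a: float, b: float) -> bool:
    """True only if a and b have identical bit patterns."""
    return bits(a) == bits(b)


# 0.0 and -0.0 are equal under ==, but differ in the sign bit.
print(0.0 == -0.0, bitwise_equal(0.0, -0.0))
print(hex(bits(1.0)))
```

A cross-platform check would compare these bit patterns (or a hash of them) between the two backends rather than the float values themselves.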
Ah the old Enterprise Service Bus...
> If you want this to work across ARM and x86 (or even multiple ARM vendors), you are screwed, and need to restrict yourself to using only the basic arithmetic operations and reimplement everything else yourself.
Is this problematic for WASM implementations? The WASM spec requires IEEE 754-2019 compliance with the exception of NaN bits. I guess that could be problematic if you're branching on NaN bits, or serializing, but ideally your code is mostly correct and you don't end up serializing NaN anyway.
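If NaN bits are the only nondeterministic part, one common workaround is to canonicalize NaNs at the point where you serialize or hash. A minimal sketch, assuming binary64 values and picking the usual quiet NaN pattern as the canonical one (`serialize` and `CANONICAL_NAN_BITS` are names invented here, not part of any WASM API):

```python
import math
import struct

# Assumed canonical pattern: positive quiet NaN with an empty payload.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def serialize(x: float) -> bytes:
    """Serialize x as 8 little-endian bytes, mapping every NaN to one pattern."""
    if math.isnan(x):
        return struct.pack("<Q", CANONICAL_NAN_BITS)
    return struct.pack("<d", x)


# Two NaNs built from different bit patterns serialize identically.
nan_a = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000123))[0]
nan_b = struct.unpack("<d", struct.pack("<Q", 0xFFF8000000000000))[0]
print(serialize(nan_a) == serialize(nan_b))
```

This only helps with serialization and hashing. Code that branches on the NaN payload itself would still see platform differences.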
I'm sure you know, but for others reading: even on the same architecture, there is more to floating point determinism than just running the same "x = a + b" code on each system. There's also the state of the FPU (eg: rounding modes) that can affect results.
On older versions of DirectX (maybe even in some modern Windows APIs?) there were cases where it would internally change the FPU mode, causing chaos for callers trying to use floats deterministically[1].
[1] https://gafferongames.com/post/floating_point_determinism/ (see the Elijah quote, especially)
We use floating point operations in deterministic lockstep, with a server compiled with GCC on Linux, a Windows client compiled with MSVC, and an iOS client running on ARM, which I believe is compiled with clang.
Works fine.
This is not a small code base, and no particular care has been taken with the floating point operations used.
As far as I know, the ARM (at least aarch64) situation should be about the same as x86-64. Anything specific that's bad about it? (there's aarch32 NEON with no subnormal support or whatever, but you can just not use it if determinism is the goal)
That RECIP14 link is AVX-512, i.e. not available on a bunch of hardware (incl. the newest Intel client CPUs), so you wouldn't ever use it in a deterministic-simulation multiplayer game anyway, even if you restrict yourself to x86-64 only; so you're still stuck with the basic IEEE-754 ops even on x86-64.
x86-64 is worse than aarch64 in one very important aspect: baseline x86-64 doesn't have fused multiply-add, whereas aarch64 does (granted, the x86-64 FMA extension came out not far from aarch64/armv8, but it's still a concern, such is life). Of course you can choose not to use FMA, but that's throwing perf away. (Regardless, you'll want -ffp-contract=off or equivalent to make sure compiler optimizations don't screw things up, so any such use will need to be manual fma calls anyway.)
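To show why contraction matters for determinism, here is a small sketch of how a fused multiply-add can give a different answer from a separate multiply and add. It emulates the fused version with exact rational arithmetic and a single final rounding instead of calling a hardware FMA. The function names and the input values are chosen purely for illustration, and the emulation ignores infinities and NaNs.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """a*b is rounded to a double first, then the sum is rounded again."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)

# Exact a*a is 1 + 2**-26 + 2**-54; the 2**-54 term is lost when the
# product is rounded on its own, but survives the single fused rounding.
print(unfused_multiply_add(a, a, c))
print(fused_multiply_add(a, a, c))
```

If one build contracts `a * b + c` into an FMA and another does not, the two disagree on inputs like these, which is why the contraction setting has to be pinned down on every platform.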
The Steam hardware survey currently has FMA support at 97%, which is the same level as F16C, BMI1/2, and AVX2. Personally, I would consider all of these extensions to be baseline now; the amount of hardware not supporting them is too small to be worth worrying about anymore.
I'm pretty sure he is talking about deterministic output.