The unlikely story of Teardown Multiplayer

blog.voxagon.se ・ 206 points ・ lairv ・ 4 days ago ・ 58 comments

jmgao ・ 10 hours ago

> For the longest time (and for good reasons), floating point operations were considered unsafe for deterministic purposes. That is still true to some extent, but the picture is more nuanced than that. I have since learned a lot about floating point determinism, and these days I know it is mostly safe if you know how to navigate around the pitfalls.

If you're only concerned about identical binaries on x86, it's not too bad because AMD and Intel tend to have intentionally identical implementations of most floating point operations, with the exception of a few of the approximate reciprocal SSE instructions (rcpps, rsqrtps, etc). Modern x86 instructions tend to have their exact results strictly defined to avoid this kind of inconsistency: https://software.intel.com/en-us/articles/reference-implemen...

If you want this to work across ARM and x86 (or even multiple ARM vendors), you are screwed, and need to restrict yourself to using only the basic arithmetic operations and reimplement everything else yourself.

dzdt ・ 8 hours ago

At least in the early 2000s, Bloomberg had strict requirements about this. Their financial terminal has a ton of math calculations. The requirement was that they always had live servers running with two different hardware platforms with different operating systems and different CPU architectures and different build chains. The math had to agree to the same bitwise results. They had to turn off almost all compiler optimisations to achieve this, and you had to handle lots of corner cases in code: can't trust NaN or Infinity or underflow to be portable.
They could transparently load balance a user from one backend platform to the other with zero visible difference to the user.
- section_me ・ 4 hours ago
  
  Ah the old Enterprise Service Bus...
ChadNauseam ・ 3 hours ago

> If you want this to work across ARM and x86 (or even multiple ARM vendors), you are screwed, and need to restrict yourself to using only the basic arithmetic operations and reimplement everything else yourself.
Is this problematic for WASM implementations? The WASM spec requires IEEE 754-2019 compliance with the exception of NaN bits. I guess that could be problematic if you're branching on NaN bits, or serializing, but ideally your code is mostly correct and you don't end up serializing NaN anyway.
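For the serialization case, one common workaround is to collapse every NaN to a single bit pattern before it is written out. A minimal sketch in Python, where `serialize_f64` and the chosen canonical pattern are illustrative assumptions and not taken from the WASM spec or from Teardown:

```python
import math
import struct

# Assumed canonical pattern: a positive quiet NaN with an empty payload.
CANONICAL_NAN = struct.pack("<Q", 0x7FF8000000000000)

def serialize_f64(x: float) -> bytes:
    """Pack a float as little-endian binary64, collapsing every NaN to one pattern."""
    if math.isnan(x):
        return CANONICAL_NAN
    return struct.pack("<d", x)

# A NaN carrying a non-zero payload, built directly from its bits.
odd_nan = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000123))[0]
```

With this in place, two peers whose NaNs differ only in payload bits still produce identical bytes; branching on NaN payloads would remain a problem.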
turtledragonfly ・ 8 hours ago

I'm sure you know, but for others reading: even on the same architecture, there is more to floating point determinism than just running the same "x = a + b" code on each system. There's also the state of the FPU (eg: rounding modes) that can affect results.
On older versions of DirectX (maybe even in some modern Windows APIs?) there were cases where it would internally change the FPU mode, causing chaos for callers trying to use floats deterministically[1].
[1] https://gafferongames.com/post/floating_point_determinism/ (see the Elijah quote, especially)
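To illustrate why the rounding mode matters: Python has no portable way to switch the FPU rounding mode, so this sketch emulates two modes with exact rational arithmetic instead. The function names are made up for the example.

```python
import math
from fractions import Fraction

def add_round_nearest(a: float, b: float) -> float:
    # Fraction -> float conversion rounds to nearest, ties to even,
    # which matches the default IEEE 754 rounding mode.
    return float(Fraction(a) + Fraction(b))

def add_round_up(a: float, b: float) -> float:
    # Emulates "round toward +infinity": if the nearest float is below
    # the exact sum, step to the next representable float above it.
    exact = Fraction(a) + Fraction(b)
    result = float(exact)
    if Fraction(result) < exact:
        result = math.nextafter(result, math.inf)
    return result

a, b = 1.0, 2.0 ** -60
```

Here `add_round_nearest(a, b)` gives `1.0` while `add_round_up(a, b)` gives the next float above `1.0`, so the same `a + b` differs by one ulp depending on the mode. In a lockstep simulation that is enough to diverge.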
Negitivefrags ・ 3 hours ago

We use floating point operations with deterministic lockstep with a server compiled with GCC on Linux, a Windows client compiled with MSVC, and an iOS client running on ARM which I believe is compiled with clang.
Works fine.
This is not a small code base, and no particular care has been taken with the floating point operations used.
dzaima ・ 6 hours ago

As far as I know, the ARM (at least aarch64) situation should be about the same as x86-64. Anything specific that's bad about it? (there's aarch32 NEON with no subnormal support or whatever, but you can just not use it if determinism is the goal)
that RECIP14 link is AVX-512, i.e. not available on a bunch of hardware (incl. the newest Intel client CPUs), so you wouldn't ever use it in a deterministic-simulation multiplayer game anyway, even if you restrict yourself to x86-64-only; so you're still stuck to the basic IEEE-754 ops even on x86-64.
x86-64 is worse than aarch64 in one very important aspect: baseline x86-64 doesn't have fused multiply-add, whereas aarch64 does (granted, the x86-64 FMA extension came out not far from aarch64/armv8, but it's still a concern, such is life). Of course you can choose to not use fma, but that's throwing perf away. (regardless you'll want -ffp-contract=off or equivalent to make sure compiler optimizations don't screw things up, so any such will need to be manual fma calls anyway)
- kbolino ・ 5 hours ago
  
  The Steam hardware survey currently has FMA support at 97%, which is the same level as F16C, BMI1/2, and AVX2. Personally, I would consider all of these extensions to be baseline now; the amount of hardware not supporting them is too small to be worth worrying about anymore.
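The reason FMA matters for determinism is that a fused multiply-add rounds once, while a separate multiply and add round twice. A small sketch in Python that emulates the fused result with exact rationals (the helper names are hypothetical; this is not how a compiler or CPU implements it):

```python
from fractions import Fraction

def mul_add_unfused(a: float, b: float, c: float) -> float:
    # Two roundings: a * b is rounded to a float, then the sum is rounded.
    return a * b + c

def mul_add_fused(a: float, b: float, c: float) -> float:
    # One rounding: the exact value of a*b + c is rounded a single time,
    # which is what a hardware FMA instruction does.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)
```

For these inputs `mul_add_unfused(a, a, c)` is `0.0` and `mul_add_fused(a, a, c)` is `2**-54`. If a compiler contracts the expression on one platform and not on another, the two builds disagree, which is the case `-ffp-contract=off` guards against.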
UltraSane ・ 6 hours ago

I'm pretty sure he is talking about deterministic output.

kajkojednojajko ・ 10 hours ago

I enjoyed playing Teardown when it first came out. It was already a technological marvel back then, so it's even more impressive that they managed to make the simulation deterministic to add multiplayer several years after release. Clearly top tier engineers.

Cool to see that the game is owned by Coffee Stain now, too. Satisfactory has been handled well by them, so I'm optimistic about the future of Teardown as well.

VoidWhisperer ・ 9 hours ago

It seems like Coffee Stain generally handles the games they publish pretty well. They also have Valheim and Deep Rock Galactic published under them, both of which have been reasonably successful and long running games
Hikikomori ・ an hour ago

Not about multiplayer but still interesting.
Dennis Gustafsson – Parallelizing the physics solver – BSC 2025
https://youtu.be/Kvsvd67XUKw

bjord ・ 12 hours ago

I highly respect the implicit decision to forgo repeat purchases by merging into the original game, considering how much work was clearly involved. I haven't played it, but I hope for sustainability's sake, there are sufficient (purely cosmetic) microtransactions to cover their development costs.

prox ・ 12 hours ago

He is already talking about the new engine at the end, so maybe that means a new game/version.
Anyway I recently bought it because of multiplayer. Can’t wait to try it out.

HexDecOctBin ・ 9 hours ago

If someone from the Teardown dev team is here, did you guys ever try to do the physics in voxel space? If I understand correctly, Teardown converts each physics chunk into a mesh and feeds all the meshes into a traditional physics engine. But this means that the voxel models remain constrained to the voxel grid only locally but not globally.

I have been trying to figure out a way to do physics completely in voxel space to ensure a global grid. But I have not been able to find any theory of Newtonian Mechanics that would work in discretised space (Movable Cellular Automata was the closest). I wonder if anyone in the Teardown dev team tried to solve this problem?

DecoPerson ・ 9 hours ago

(I’m not a Teardown dev!)
I tried this on a local project. It looks very jank and the math falls apart quickly. Unfortunately, using a fixed axis-aligned grid for rotating reference frames is not practical.
One thing I wanted to try but didn’t was to use dynamic axes. So once an entity is created (that is, a group of voxels not attached to the world grid), it gets its own grid that can rotate relative to the world grid. The challenge would be collision detection between two unaligned grids of voxels. Converting the group to a mesh, like Teardown does, would probably be the easiest and most effective way, unless you want to invent some new game-physics math!
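A rough 2D sketch of the "each entity gets its own rotatable grid" idea, in Python. It tests overlap by sampling the voxel centers of one body inside the other body's local grid; this is an assumption-laden toy (the `VoxelBody` class is invented here, and center sampling can miss corner-only overlaps), not what Teardown does.

```python
import math

class VoxelBody:
    """A set of occupied unit cells in a local grid, placed in the world by x, y, angle."""
    def __init__(self, cells, x, y, angle):
        self.cells = set(cells)
        self.x, self.y, self.angle = x, y, angle

    def to_world(self, lx, ly):
        c, s = math.cos(self.angle), math.sin(self.angle)
        return (self.x + c * lx - s * ly, self.y + s * lx + c * ly)

    def to_local(self, wx, wy):
        # Inverse of to_world: undo the translation, then rotate by -angle.
        c, s = math.cos(self.angle), math.sin(self.angle)
        dx, dy = wx - self.x, wy - self.y
        return (c * dx + s * dy, -s * dx + c * dy)

def overlaps(a, b):
    """True if any voxel center of `a` lands inside an occupied cell of `b`."""
    for i, j in a.cells:
        wx, wy = a.to_world(i + 0.5, j + 0.5)
        lx, ly = b.to_local(wx, wy)
        if (math.floor(lx), math.floor(ly)) in b.cells:
            return True
    return False

world_aligned = VoxelBody({(0, 0), (1, 0)}, 0.0, 0.0, 0.0)
rotated = VoxelBody({(0, 0)}, 1.2, 0.1, math.pi / 4)
far_away = VoxelBody({(0, 0)}, 10.0, 10.0, 0.0)
```

Detecting the overlap is the easy part. Resolving it with contact points and normals is where the mesh conversion mentioned above starts to look attractive.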
SiempreViernes ・ 7 hours ago

This sounds like a fun thing to do simply for the pleasing global consistency, but the price you will pay is that the physics will inevitably look weird since all our intuition is for smooth space. In this sense it's like those games that try to put you into a 4D space, where the weirdness is sort of the point.
Not sure what you mean with the claim that Newtonian Mechanics doesn't work in discretised space? I know there are plenty of codes that discretise space and solve fluid mechanical problems, and that's all Newtonian physics.
Of course you need a quite high resolution (compared to the voxel grid in teardown) when you discretise for it to come out like it does in reality, but if you truly want discretised physics on the same coarse scale as the voxels in teardown you can just run these methods and accept it looks weird.
aruametello ・ 5 hours ago

(not a teardown dev)
I had brainstormed a bit on a similar problem (non-world-aligned voxel "dynamic debris" in a destructible environment). One of the ideas that came up was to have a physics solver like the PhysX Flex SDK.
https://developer.nvidia.com/flex
* 12 years old, but still runs on modern GPUs and is quite interesting in itself as a demo
* If you run it, consider turning on the "debug view", it will show the collision primitives instead of the shapes.
General purpose physics engine solvers aren't all that GPU friendly, but if the only primitive shape being simulated is the sphere (cubes are made of a few small spheres, everything is a bunch of spheres) the efficiency of the simulation improves quite a bit. (no need for conditional treatment of collisions like sphere+cube, cube+cylinder, cylinder+sphere and so on)
I wondered if it could be solved by having a single sphere per voxel, considering only the voxels at the surface of the physically simulated object.
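The "one sphere per surface voxel" idea can be sketched in a few lines of Python. This is only an illustration of the data reduction and the single collision test involved; the function names are invented here and nothing below uses the Flex SDK.

```python
def surface_voxels(voxels):
    """Keep only voxels with at least one empty face neighbour."""
    voxels = set(voxels)
    faces = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return {
        (x, y, z)
        for (x, y, z) in voxels
        if any((x + dx, y + dy, z + dz) not in voxels for dx, dy, dz in faces)
    }

def spheres_for(voxels, size=1.0):
    """One sphere (cx, cy, cz, radius) centred in each surface voxel."""
    r = size / 2
    return [
        ((x + 0.5) * size, (y + 0.5) * size, (z + 0.5) * size, r)
        for (x, y, z) in sorted(surface_voxels(voxels))
    ]

def spheres_overlap(s1, s2):
    # The only narrow-phase test needed when every primitive is a sphere.
    dx, dy, dz = s1[0] - s2[0], s1[1] - s2[1], s1[2] - s2[2]
    return dx * dx + dy * dy + dz * dz <= (s1[3] + s2[3]) ** 2

cube = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
```

A solid 3x3x3 chunk has 27 voxels but only 26 on the surface; the saving grows quickly with size, since the interior scales with the cube of the edge length and the surface with the square.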
Aeolun ・ 8 hours ago

You could project your dynamic objects to world coordinates, but it would look pretty wonky for small objects. A grid is just fundamentally not going to look very physical.
Maybe you could simulate physics but completely constrain any rotation? Then you’d have falling stuff, and it could move linearly (still moving in 3d space but snapping to the world grid for display purposes)?

tietjens ・ 11 hours ago

In my opinion one of the most impressive independent games published on Steam in the last few years.

Aeolun ・ 8 hours ago

I really love Teardown, and I’m absolutely baffled at the bizarre performance that game manages to push on even very large worlds.

I’ve tried to load teardown levels in a homegrown engine and I always end up stuttering like hell as soon as GI becomes involved (or even before that).

I’m going to finally manage to replicate it, and then the new engine will be released and raise the bar again xD

vanderZwan ・ 6 hours ago

Did you check out the "about" section and the timeline of his career on the main page of the linked blog?
https://www.voxagon.se/
Because it looks like your opponent is a Swedish former demoscener who started programming at age 12 on the C64 and Amiga computers in 1990, quickly moving on to writing games and demos in assembly, then professionally developing physics engines since 2001, specializing in game performance profiling and squeezing performance out of optimized mobile games.
As far as game dev stereotypes go you basically picked a Final Boss fight. Good luck, you'll need it :p

esperent ・ 12 hours ago

Looks like someone needs a better web host.

> Due to protection of web servers from repeated attacks, we were forced to restrict access to administrative interface of web pages to selected countries. If you are currently in a foreign country, please sign in to WebAdmin, proceed to your domain management and disable this GeoIP filter in OneClick Installer section.

Sweepi ・ 12 hours ago

"1.Serialize the entire scene, compress the data, and pass it to the joining client. We already do full scene serialization for quicksave and quickload, so this is possible, but the files are large: 30-50 MB is common, often more, so transfer would take a while.

[...]

3. Record the deterministic command stream, pass it to the joining client, and have that client apply all changes to the loaded scene before joining the game. The amount of data is much smaller than in option 2 since we’re not sending any voxel data, but applying the changes can take a while since it involves a lot computation.

Once we started investigating option 3 we realized it was actually less data than we anticipated, but we still limit the buffer size and disable join-in-progress when it fills up. This allows late joins up to a certain amount of scene changes, beyond which applying the commands would simply take an unreasonably long time. "

So [1] is not an option for players who want to do it that way?
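For readers unfamiliar with option 3, here is a minimal Python sketch of a bounded command log with replay for late joiners. Everything in it (`Sim`, `CommandLog`, the 4-byte tick header, the byte limit) is a hypothetical stand-in; the blog does not describe Teardown's actual data structures.

```python
import hashlib

class Sim:
    """Stand-in for a deterministic simulation: state is a running hash of commands."""
    def __init__(self):
        self.state = hashlib.sha256(b"initial scene")

    def apply(self, tick: int, command: bytes) -> None:
        self.state.update(tick.to_bytes(4, "little") + command)

    def digest(self) -> str:
        return self.state.hexdigest()

class CommandLog:
    """Records commands until a size limit is hit, then disables join-in-progress."""
    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.entries = []
        self.size = 0
        self.join_allowed = True

    def record(self, tick: int, command: bytes) -> None:
        if not self.join_allowed:
            return
        self.size += 4 + len(command)
        if self.size > self.max_bytes:
            self.entries.clear()
            self.join_allowed = False
        else:
            self.entries.append((tick, command))

def host_step(sim: Sim, log: CommandLog, tick: int, command: bytes) -> None:
    sim.apply(tick, command)
    log.record(tick, command)

def late_join(log: CommandLog):
    """Build a fresh scene and replay the log; None if joining is disabled."""
    if not log.join_allowed:
        return None
    sim = Sim()
    for tick, command in log.entries:
        sim.apply(tick, command)
    return sim
```

The joiner only reaches the host's state because the simulation is deterministic; the log carries inputs, not voxel data.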

jerf ・ 6 hours ago

Part of the problem with [1] is that you still end up needing [3] anyhow, because even if you've got a fiber-to-fiber connection, while the transfer was occurring the game world has moved on and you'll need to replay that anyhow.
But if you've got a solution for [3] that works completely correctly anyhow, then writing lots of code for [1] becomes redundant to that anyhow, even with save/load code sitting right there. Might as well start from the beginning and replay it anyhow.
One of the things I will often do that I sometimes have to explain to my fellow engineers is bound the rate at which we'll have to handle certain changes based on what is making them. If you know that you've got a human on the other end of some system, they can only click and type and enter text so quickly. Yes, you still need to account for things like copy & paste if that's an issue for your system, where they may suddenly roll up with a novel's worth of text, but you know they can't be sending you War and Peace sixty times a second. You can make a lot of good scaling decisions around how many resources a user needs when you remember that. The bitrate coming out of a human being is generally fairly small; we do our human magic with the concatenation of lots of bits over lots of time but the bitrate itself is generally small. For all that Teardown is amazingly more technically complicated than Doom, the "list of instructions the humans gave to the game" is not necessarily all that much larger than a Doom demo (which is itself a recording of inputs that gets played back, not a video file), because even Doom was already brushing up on the limits of how many bits-per-second we humans can emit.
- amlib ・ 2 hours ago
  
  There is always the option of force pausing the game for all clients until the joining client is fully in sync. Age of Empires 2 does something like this when a player that was dropped later rejoins the game. You can even have a screen showing how synced each player is and an ETA based on their download speed, with the ability to chat and even kick a player...
  Obviously that won't scale if you intend to have dozens of players constantly joining a server rather than a "friends only" (or whatever more constrained scenario) where players only occasionally join mid game.
florbo ・ 12 hours ago

fme, it's only kind of inconvenient. By the time the scene gets to the point where join-in-progress is disabled it's complete chaos anyway. Might as well restart the scene.
That said I haven't played any of the more intricate mods out there, but I can see how it would become more of an issue.

hannasanarion ・ 7 hours ago

I feel like there's detail missing in the blog.

The "Reliable vs Unreliable" section implies that different parts of the scene are sent using a strict-ordering protocol so that the transforms happen in the same order on every client, but other parts happen in a state update stream with per client queueing.

But which is which? Which events are sent TCP and which are UDP (and is that literally what they're doing, or only a metaphor?)

Really, the economy of the text in the blog seems backwards: this section has one short paragraph explaining the concept of deterministic event ordering as important for keeping things straight, and then 3 paragraphs about how player position and velocity are synced in the same way as any other game. I want to read more about the part that makes Teardown unique!

sapphyrus ・ 6 hours ago

They didn't explain those in detail because they didn't implement these concepts themselves. The GameNetworkingSockets library (and most likely also the internal network backend that was mentioned) provides them: > A reliability layer significantly more sophisticated than a basic TCP-style sliding window. It is based on the "ack vector" model from DCCP (RFC 4340, section 11.4) and Google QUIC and discussed in the context of games by Glenn Fiedler. The basic idea is for the receiver to efficiently communicate to the sender the status of every packet number (whether or not a packet was received with that number). By remembering which segments were sent in each packet, the sender can deduce which segments need to be retransmitted. https://github.com/ValveSoftware/GameNetworkingSockets
- dwroberts ・ 6 hours ago
  
  How did you find out their game used this library? Maybe I missed it but didn’t seem to be mentioned anywhere
  
  yakcyll ・ 3 hours ago
  
  They mentioned switching to the Steam networking backend, which for games is essentially GNS.
  
  dwroberts ・ 2 hours ago
  
  Ahh thank you, didn’t make the link between the names
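The ack-vector idea quoted above from the GameNetworkingSockets README can be sketched as a toy sender in Python. This is not the library's API or implementation; `Sender` and its methods are made up to show the bookkeeping: remember which segments went into which packet, then derive retransmits from per-packet receipt status.

```python
class Sender:
    def __init__(self):
        self.next_packet = 0
        self.in_flight = {}     # packet number -> segment ids carried by it
        self.delivered = set()  # segment ids known to have arrived

    def send(self, segment_ids):
        """Assign a packet number and remember which segments it carries."""
        number = self.next_packet
        self.next_packet += 1
        self.in_flight[number] = list(segment_ids)
        return number

    def on_ack_vector(self, highest, received):
        """`received` is the set of packet numbers <= `highest` that arrived.

        Returns the segment ids that need to be retransmitted.
        """
        lost = []
        for number in sorted(self.in_flight):
            if number > highest:
                continue  # receiver has not reported on this packet yet
            segments = self.in_flight.pop(number)
            if number in received:
                self.delivered.update(segments)
            else:
                lost.extend(segments)
        # A segment that also arrived in another packet needs no resend.
        return [s for s in lost if s not in self.delivered]
```

Unlike a TCP-style cumulative ack, the sender learns the fate of every packet number, so a single lost packet does not stall knowledge about the packets after it.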
yakcyll ・ 3 hours ago

There really doesn't seem to be anything new or unique to their solution. I'm personally not surprised, because it is what has worked for thirty years.
I presume you know this, but maybe for others: judging by what was written in that paragraph, I'd indeed assume he means the same paradigm that has been driving replicated real-time simulations since at least QuakeWorld - some world-state updates (in this case _"object transforms, velocities, and player positions"_, among others) don't have to be reliable, because in case you miss one and need a retransmit, by the time you receive it, it's already invalid - the next tick of the simulation has already been processed and you will get updated info anyway (IIRC this is essentially John Carmack's insight from his .plan files).
The command messages (player operations and events) _need_ to be reliable, because they essentially serve as the source of truth when it comes to the state of the world. Each game client depends on them to maintain a state that is the same for everyone (assuming the determinism the Teardown team has been working to ensure).
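A minimal receiver-side sketch of the two channel types in Python, under the assumption that every message carries a sequence number. The class names are invented, and retransmission itself is not modelled: the reliable channel here only shows the in-order delivery part.

```python
class UnreliableState:
    """Latest-wins: a snapshot older than the one already applied is dropped."""
    def __init__(self):
        self.latest_seq = -1
        self.snapshot = None

    def receive(self, seq, snapshot):
        if seq <= self.latest_seq:
            return False  # stale or duplicate; newer data is already applied
        self.latest_seq, self.snapshot = seq, snapshot
        return True

class ReliableOrdered:
    """Holds out-of-order commands back until every earlier one has arrived."""
    def __init__(self):
        self.next_seq = 0
        self.pending = {}

    def receive(self, seq, command):
        """Returns the commands that can now be applied, in order."""
        if seq >= self.next_seq:
            self.pending[seq] = command
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A late transform update is simply discarded, while a late command blocks everything behind it until it arrives, because every client must apply commands in the same order.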
dwroberts ・ 7 hours ago

I would imagine it’s all UDP and the reliable+ordered is just a different mode which does the re-sending etc.
I would be surprised if they actually had TCP at all

tantalor ・ 5 hours ago

For a bit more context, here's the announcement video:

https://www.youtube.com/watch?v=XfcCyMQ13XM

(Edit: fixed link)

daveidol ・ 5 hours ago

Seems like the wrong link to me. Just a dude working on a car.

DimitriBouriez ・ 11 hours ago

Wouldn't it have been simpler (even if technically heavy) to host the game on a single machine and just stream each player's camera? That way all the physics would be computed in real time on one computer, and each player would just receive a different video stream.

xboxnolifes ・ 7 hours ago

This seems to get brought up in every HN post on video game multiplayer, and it makes me wonder: do you play video games? I don't know of any video games that do multiplayer that way, and I would think that alone suggests it's not a good idea.
Who wants to play a game with 50ms+ keypress to screen update delay? Sounds miserable.
- as1mov ・ 6 hours ago
  
  It makes me laugh whenever there's a post about anticheat on the frontpage and without missing a beat there's always a comment there - "why don't they just run the entire game logic server side and stream the updates to the client??? are they stupid?"
- DimitriBouriez ・ 7 hours ago
  
  I don't play it much anymore, but I used to be a heavy player: the latency isn't 50 ms on GeForce Now (with a French connection, which is pretty good).
  
  xboxnolifes ・ 3 hours ago
  
  GeForce Now has the advantage of being able to pick a server closest to specifically you. If you're near a major datacenter, you're probably getting sub 20ms delay. Annoying, but playable for many games. That is not the case as soon as you try multiplayer with people in different metro areas.
  
  sitzkrieg ・ 6 hours ago
  
  Yeah, that's barely playable for single player games
dwattttt ・ 11 hours ago

Video streams are not known for their low bandwidth needs, let alone adding in RTT latency for inputs.
- DimitriBouriez ・ 10 hours ago
  
  That's true, I'm not saying it comes without trade-offs. But in return you get a perfectly consistent and physically accurate simulation. It would mostly be expensive, I think, but it's technically feasible (services like Shadow or GeForce Now already demonstrate that).
  
  filcuk ・ 10 hours ago
  
  Which one of your friends can host an mp physics heavy game with a number of low-latency high-resolution video streams? I would estimate the average answer to be zero.
  
  gcr ・ 8 hours ago
  
  Perhaps the solution could be to have all players stream the game from a centralized instance, rather than all clients streaming from the host’s instance.
  That would have a number of advantages, come to think of it. For starters, install size could be much lower, piracy would be a non-issue, and there would be no need to worry about cross-platform development concerns.
  
  jerf ・ 6 hours ago
  
  We don't have to theorize about this. We've had cloud gaming for years, and the companies have immense motivations to turn us all into renters in the cloud so they've poured a lot of effort into it and we can see half-a-dozen highly-resourced results now. We can just look at it and we can see that it... almost... works. If you don't care much about latency it definitely works.
  However, Teardown is in the set of games where it just barely works and only if all the stars and the moon align. I'd characterize it as something like, cloud gaming spends 100% of the margin, so if anything, anything goes wrong, it doesn't work very well.
  (Plus, as excited as the companies are about locking us into subscriptions rather than purchases that we own, when it comes time to actually pay for the service they are delivering they sure do like to skimp, because it turns out it's kind of expensive to dedicate the equivalent of a high-end gaming console per person. Most stuff that lives in the cloud, a single user averages using a vanishing fraction of a machine over time, not consuming entire servers at a time. Which doesn't pair well with "you spent 100% of the margin just getting cloud gaming to work at all".)
  
  Aeolun ・ 7 hours ago
  
  Running several raytracers on a single video card isn’t free either. Syncing the world changes as they do is the least intensive for the server, and uses the least bandwidth. It’s probably optimal in all ways.
  
  kg ・ 9 hours ago
  
  Most consumer GPUs have a limit on the number of video streams their hardware encoder can handle at once, and in some cases the limit is as low as 2.
  
  DimitriBouriez ・ 7 hours ago
  
  Okay, I didn't know that
thunderfork ・ 15 minutes ago

In addition to what others have said: even where remote play works, some games are worse candidates than others, and I expect Teardown would be towards the "worse" side of the set.
Teardown, visually speaking, is a pretty noisy game at times, and doesn't give great visual clarity when streamed at real-time-encoding-type bitrates during these noisy moments.
FPS mouse+keyboard is also one of the worst-case scenarios for Moonlight/GFNow/etc. remote play, because first person aiming with a mouse relies very heavily on a tight input-vision feedback loop, and first person camera movement is much harder to encode while preserving detail relative to, say, a static-camera overhead view, or even third person games where full-scene panning is slower and more infrequent.
sand500 ・ 2 hours ago

Might have been a key differentiator for Stadia.
Other comments are worrying about the streaming and latency but local split screen could also be another use case here.

xecaz ・ 11 hours ago

Looking forward to playing the MP version with my son, who also loves the game. I worry that the game will get heavier and maybe no longer work on the Steam Deck, will it?

asadm ・ 3 hours ago

do they have split screen multiplayer? I wish they did.

dopesoap ・ 2 hours ago

If only they hadn't stolen the idea at revision and then not paid the person they stole it from.

Douche bags.

stainlu ・ 9 hours ago

[dead]