This reminds me of one of the most interesting bugs I've faced: I was responsible for developing the component that provided away market data to the core trading system of a major US exchange (that data is what lets the trading system determine whether an order should be matched in-house or routed to another exchange with a better price).
Throughputs were in the multiple tens of thousands of transactions per second and latencies were in single-digit milliseconds (in later years these would drop to double-digit microseconds, but that's a different story). Components were written in C++, running on Linux. The machines running my component and the trading engine were neighbors on the same LAN.
We put my component through a full battery of performance tests, and for a while we seemed to be meeting the numbers. Then one day, with absolutely zero code changes on my end or the trading engine's end, the latency numbers collapsed. We checked the hardware configs and the rate at which the latest test was run: both were identical to the previous runs.
It took, I think, several days to solve the mystery: in the latest test run, we had added one extra away market to a list of 7 or 8 markets for which my component provided market data to the trading system. We had added markets before without an issue, and it's a negligible change to the market data message size, because each market only adds a few bytes: market ID, best bid price & quantity, best offer price & quantity. In no way should such a small change result in a disproportionate collapse in the latency numbers. It took a while for us to realize that before the addition of those few bytes, our market data message (a binary packed format) fit neatly into a single ethernet frame. The extra bytes pushed it over the standard 1500-byte Ethernet MTU and caused every market data packet (market data being the bulk of the traffic on the system, next to orders) to be fragmented at the IP layer. The fragmentation and reassembly overhead was enough to clog up the pipes at the rates we were pumping data.
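For illustration, the back-of-envelope frame-budget arithmetic goes roughly like this; the real message layout isn't something I can reproduce from memory, so the 26-byte per-market record below is just an assumption for the sake of the example:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    constexpr std::size_t kEthMtu         = 1500; // standard Ethernet MTU
    constexpr std::size_t kIpHeader       = 20;   // IPv4 header, no options
    constexpr std::size_t kUdpHeader      = 8;
    constexpr std::size_t kAppBudget      = kEthMtu - kIpHeader - kUdpHeader; // 1472
    constexpr std::size_t kPerMarketBytes = 26;   // assumed: id + bid px/qty + offer px/qty

    // 1480 bytes is the per-fragment IP payload (a multiple of 8, as fragment
    // offsets require), so this ceiling division gives the packet count.
    const std::size_t payloads[] = {kAppBudget, kAppBudget + kPerMarketBytes};
    for (std::size_t payload : payloads) {
        std::size_t ipPayload = payload + kUdpHeader;
        std::size_t packets   = (ipPayload + (kEthMtu - kIpHeader) - 1)
                                / (kEthMtu - kIpHeader);
        std::printf("app payload %zu bytes -> %zu packet(s) on the wire\n",
                    payload, packets);
    }
}
```

At one packet per update that's fine; at two packets per update, every single market data message now costs an extra packet, an extra set of headers, and a reassembly step on the receive side, which is exactly the kind of per-message overhead that hurts at tens of thousands of messages per second.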
In the short run, I think we managed to do some tweaks and get the message back under the 1500-byte limit (by omitting markets that did not have a current bid/offer, rather than sending NULLs). I can't recall what we did in the long run.
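Something along these lines, sketched from memory with made-up type and field names rather than the actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#pragma pack(push, 1)
struct AwayQuote {                       // hypothetical wire record
    std::uint16_t market_id;
    std::int64_t  bid_px;   std::uint32_t bid_qty;   // 0/0 when no current bid
    std::int64_t  offer_px; std::uint32_t offer_qty; // 0/0 when no current offer
};
#pragma pack(pop)

// Append only the markets that actually have something quoted, instead of
// emitting a null-filled slot for every configured market. Keeps the packed
// message as small as possible so it stays under the single-frame budget.
std::size_t appendLiveQuotes(const std::vector<AwayQuote>& all,
                             std::vector<std::uint8_t>& out) {
    std::size_t written = 0;
    for (const AwayQuote& q : all) {
        if (q.bid_qty == 0 && q.offer_qty == 0)
            continue;                                // omit empty markets
        const auto* p = reinterpret_cast<const std::uint8_t*>(&q);
        out.insert(out.end(), p, p + sizeof(q));
        ++written;
    }
    return written;  // the receiver reads this count, then that many records
}
```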
Another thing to look into in this kind of situation is enabling jumbo frames.
“You had an MTU problem. You enable jumbo frames. Now you have two MTU problems”
Unless you control the entire set of possible paths (there can be many!) and set all the MTUs to match, this can set you up with a nasty footgun, even if on the surface it seems to help with the problem: a black hole that shows up at the worst possible moment, under heavy traffic. See my PMTUD/PLPMTUD rant elsewhere in this thread.
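For the curious, here's a rough sketch of one way to make that failure loud on Linux instead of silent (address, port, and probe size are placeholders): force the DF bit with IP_PMTUDISC_DO so an oversized datagram fails with EMSGSIZE rather than quietly getting lost along the path, then read back the kernel's current idea of the path MTU with IP_MTU:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <vector>

int main() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int pmtud = IP_PMTUDISC_DO;           // always set DF, never fragment locally
    setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtud, sizeof(pmtud));

    sockaddr_in peer{};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(9000);                    // placeholder port
    inet_pton(AF_INET, "192.0.2.10", &peer.sin_addr); // placeholder address
    if (connect(fd, reinterpret_cast<sockaddr*>(&peer), sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    std::vector<char> probe(8900, 0);                 // jumbo-sized payload
    if (send(fd, probe.data(), probe.size(), 0) < 0 && errno == EMSGSIZE) {
        int mtu = 0;
        socklen_t len = sizeof(mtu);
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len);
        std::printf("path MTU is only %d bytes; this jumbo frame won't fit\n", mtu);
    }
    close(fd);
    return 0;
}
```

This doesn't save you from boxes in the path that silently drop ICMP "fragmentation needed", which is the real black-hole case; for that you need PLPMTUD-style probing or, better, matching MTUs end to end.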
Given that this is a trading system where application latencies are measured in microseconds, the default assumption would be that jumbo frames are a perfectly valid approach.