I am somewhat surprised to see the bucket memory layout, which is: [k1,v1],[k2,v2],[k3,v3], etc., where k1,k2,k3 are keys and v1,v2,v3 are values. A CPU cache line will not hold more than one [k,v] pair, because a cache line is typically 64 bytes and the size of a [k,v] pair in my case is about 56 bytes.
So iterating through the bucket looking for a key will require each iteration to fetch the next [k,v] pair from RAM.
Compare this with the following layout: k1,k2,k3,… followed by v1,v2,v3. Looking up the first key in the bucket will load at least one more key into the CPU cache line, which should make iteration faster.
The downside of this approach shows up if lookups almost always hit the first key in the bucket. In that case the [k1,v1],[k2,v2],[k3,v3] packing is better, because the value is loaded into the cache line along with the key.
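To make the trade-off concrete, here is a small sketch of the two layouts. The sizes are assumptions taken from the numbers above (16-byte keys plus 40-byte values, 56 bytes per pair, 8 slots per bucket); the type names are illustrative, not runtime internals:

```go
package main

import "fmt"

const slots = 8 // hypothetical number of slots per bucket

// Interleaved layout: key and value are adjacent, so a 64-byte
// cache line holds at most one pair.
type interleaved struct {
	pairs [slots]struct {
		k [16]byte // hypothetical 16-byte key
		v [40]byte // hypothetical 40-byte value; k+v = 56 bytes
	}
}

// Split layout: keys are contiguous, so one 64-byte cache line
// holds several keys (the values follow in a separate array).
type split struct {
	keys [slots][16]byte
	vals [slots][40]byte
}

func main() {
	const cacheLine = 64
	fmt.Println("pairs per cache line (interleaved):", cacheLine/(16+40)) // 1
	fmt.Println("keys per cache line (split):       ", cacheLine/16)      // 4
}
```

So with these sizes the split layout lets one cache-line fill cover four key comparisons, while the interleaved layout covers one comparison but brings the value along for free.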
I am wondering if people on this forum know more about this trade-off. Thanks!
The trade-off is discussed here: https://github.com/golang/go/issues/70835
We're not "iterating through the bucket" in the sense you mean. There's a control word which tells us which slots might have our key, so we never need to look at keys that don't match the byte of our hash stored in the control word.
In most cases there are zero or one matches in the control word, so the interleaving could not help us, but it would still hurt us when there is exactly one match and it's the right key, which is the common happy path when looked-up keys always or almost always exist by design.
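For anyone following along, here is a hedged sketch of what control-word matching looks like. It uses a plain loop over 8 control bytes rather than the word-at-a-time bit tricks the runtime actually uses, and the names are illustrative, not Go runtime identifiers:

```go
package main

import "fmt"

const groupSize = 8 // control bytes per group in this sketch

const empty = 0x80 // assumed convention: high bit set marks an empty slot

// matchH2 returns the slot indices whose control byte equals h2
// (the low bits of the key's hash). Only these slots' keys ever
// need to be loaded and compared.
func matchH2(ctrl [groupSize]byte, h2 byte) []int {
	var hits []int
	for i, c := range ctrl {
		if c == h2 {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	ctrl := [groupSize]byte{empty, 0x1a, empty, 0x33, empty, empty, 0x1a, empty}
	// Typically zero or one slots match, so at most one key (and its
	// adjacent value) is fetched, which favors the interleaved layout.
	fmt.Println(matchH2(ctrl, 0x1a)) // [1 6]
}
```

The point of the sketch: the control word filters candidates before any key is touched, so the "scan many keys per cache line" benefit of the split layout rarely applies.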