In a way this is an argument for languages where it's normal to have result-types [0] or exceptions, instead of -1 etc. It just feels wrong to have any potential collision or confusion between normal results and error results.
In other words, this is a case of in-band signaling (https://en.wikipedia.org/wiki/In-band_signaling) versus out-of-band signaling (https://en.wikipedia.org/wiki/Signaling_(telecommunications)).
*Clears throat* The precise wiki page you're looking for is the Semipredicate problem.
-1 is in-band signaling; but it steals a value.
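You can see the stolen value directly from the REPL (in CPython; other implementations may differ):

```python
# CPython reserves -1 as the in-band error sentinel for tp_hash,
# so a computed hash of -1 is remapped to -2. One value is stolen:
print(hash(-1))  # -2: the "real" hash of -1 is unusable
print(hash(-2))  # -2: collides with the remapped value
print(hash(0))   # 0: small ints otherwise hash to themselves
```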
1. But, Python has exceptions! The underlying C language doesn't, but Python's run-time has them and can use them in the C code.
2. It may be an argument for ensuring that absolutely everything that is an object can hash: the object hasher must not have error states. Nobody wants the overhead of a simple hash code being wrapped in a result type.
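Point 1 is visible from pure Python: a `__hash__` that raises propagates an ordinary exception through `hash()`, rather than leaking a sentinel value to the caller. (Internally, the C slot signals this by returning -1 with an exception set, which is exactly the in-band convention under discussion.)

```python
# A __hash__ that raises propagates straight through hash():
# the caller sees an exception, never a magic return value.
class Unhashable:
    def __hash__(self):
        raise TypeError("this object refuses to be hashed")

try:
    hash(Unhashable())
except TypeError as e:
    print(e)  # this object refuses to be hashed
```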
We all know that types are not Pythonic.
(I’m only mostly joking)
Ackchyually, Python has pretty strong typing, as far as dynamic languages go.
If you use the actual type system and something like mypy, Python is a joy to work with. (For my definition of joy, which includes static typing.)
Actually writing Python is reasonably nice in that case.
But dealing with Python infrastructure is so awful as to make the whole experience just bad.
uv fixes a lot of that, but I think it will be some time before it's used everywhere, and I have zero hope that the Python devs will ever do the right thing and officially replace Pip with uv.
You also have to trawl through the flags to make sure it actually checks types, e.g. `check_untyped_defs`.
Sure. Everything is a PyObject.
Not really, I don’t think. Python code doesn’t ever really use hash() for anything where specific outputs are expected. Even if hash(a)==hash(b), it’s not implied that a==b or a is b.
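This is easy to demonstrate (in CPython, where -1 and -2 happen to share a hash):

```python
# Equal hashes never imply equal objects; dicts always re-check
# candidate keys with == after a hash match.
a, b = -1, -2
assert hash(a) == hash(b)   # both hash to -2 in CPython
assert a != b               # yet the objects are distinct
d = {a: "minus one", b: "minus two"}
assert d[-1] == "minus one" and d[-2] == "minus two"
```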
But -2 seems like just a bad choice here. -1 and -2 are very "common" integers. Seems they could have just picked a quasi-random large integer to reduce the likelihood of collision. I expect hash functions to have collisions, but I expect it to be "unlikely" in a way that scales with the size of the hash.
Are they? At least not as keys in most dictionaries. Zero and positive integers, sure, but negative ones would be somewhat rare. I could maybe see something like error handling, but then do you need performance there?
I don't think that matters here, though. This specific hash function is literally used just for putting values in dict buckets. The docs say:
> Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup.
They're not to be used as a crypto digest. The downside here, then, is that if -1 and -2 are used as dict keys, they'll end up bucketed together. Apparently that hasn't been enough of an issue over the years to bother changing it. One advantage is that those values are common enough that it might break code that incorrectly expects hash() to have predictable outputs. If it used 0xffffffff as the value, lots of busted tests may never stumble across it.
> The downside here, then, is that if -1 and -2 are used as dict keys, they'll end up bucketed together.
Note that they'll only end up bucketed together in the underlying hashmap. The dictionary will still disambiguate the keys.
    >>> a = -1
    >>> b = -2
    >>> {a: a, b: b}
    {-1: -1, -2: -2}
Yep, absolutely. End programmers will never see that unless they go digging into the CPython code. It's otherwise invisible.
Not only bucketed together, which means they could still be disambiguated in C by the full hash, but they'll actually share the full hash, which means they'd have to jump back into Python code to check `__eq__` to disambiguate IIUC. That seems like a huge perf issue, is it not?
Without looking it up, I think that's right, but... that's just kind of inevitable with hash tables. Unless you have a small number of keys and get very lucky, it's exceedingly likely that you'll have multiple keys in the same bucket, and you'll have to use whatever language's `__eq__` equivalent mechanism.
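You can watch the fallback happen with a hypothetical key class whose hash is deliberately constant, so every lookup is forced through `__eq__` (CPython does short-circuit on identity first, which is why a fresh probe object is used):

```python
# Hypothetical key class with a constant hash: every key lands in
# the same bucket, so dict operations must fall back to __eq__.
class Colliding:
    eq_calls = 0
    def __init__(self, v):
        self.v = v
    def __hash__(self):
        return 42  # force every key into the same bucket
    def __eq__(self, other):
        Colliding.eq_calls += 1
        return isinstance(other, Colliding) and self.v == other.v

d = {Colliding(i): i for i in range(5)}
before = Colliding.eq_calls
value = d[Colliding(4)]              # fresh object: no identity shortcut
print(value)                         # 4
print(Colliding.eq_calls > before)   # True: __eq__ was invoked
```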
A mitigating factor is that dict keys have to be hashable, which implies immutability, which generally implies a small number of native, fast types like str, numbers, or simple data structures like tuples. You could have some frozendict monstrosity as keys, but 1) that's not something you see often, and 2) don't do that. It you must, define a fast __eq__. For example, an object mapping to a database row might look like:
    def __eq__(self, other):
        if self.pk != other.pk:
            return False  # Fail quickly
        # If that's true, only then do a deep comparison
        return self.tuple_of_values == other.tuple_of_values

But again, that's just not really something that's done.
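For completeness, a runnable sketch of that fast-`__eq__` idea; `Row`, `pk`, and `tuple_of_values` are hypothetical names for a database-row-like object, not anything from a real ORM:

```python
# Sketch of a DB-row-like key: hash and fast-path equality on the
# primary key, deep comparison only when the keys already match.
class Row:
    def __init__(self, pk, *values):
        self.pk = pk
        self.tuple_of_values = tuple(values)
    def __hash__(self):
        return hash(self.pk)  # cheap hash on the primary key
    def __eq__(self, other):
        if self.pk != other.pk:
            return False  # fail quickly on mismatched keys
        # Only matching keys pay for the deep comparison
        return self.tuple_of_values == other.tuple_of_values

a = Row(1, "alice", 30)
b = Row(1, "alice", 30)
assert a == b and hash(a) == hash(b)
assert {a: "hit"}[b] == "hit"  # usable as a dict key
```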
I bet -2 is way less common than -1 in a typical codebase, particularly a C one.
Yes, but having result types would avoid the need for this special casing.