Just a heads up, this fails to track usage of get and setdefault. The ability to iterate over dicts makes the whole question rather murky.
I didn't know about the setdefault method, and wouldn't have guessed it lets you read a value. Interesting, thanks.
Another way to get data out would be to use the new | operator (i.e. x = {} | y essentially copies dictionary x to y) or the update method or ** unpacking operator (e.g. x = {**y}). But maybe those come under the umbrella of iterating as you mentioned.
setdefault was a go to method before defaultdict was added to the collections module in Python 2.5, which replaced the biggest use case.
It's been some time since I last benchmarked defaultdict but last time I did (circa 3.6 and less?), it was considerably slower than judicious use of setdefault.
One time that defaultdict may come out ahead is if the default value is expensive to construct and rarely needed:
defaultdict takes a factory function, so it's only called if the key is not already present:d.setdefault(k, computevalue())
This applies to some extent even if the default value is just an empty dictionary (as it often is in my experience). You can use dict() as the factory function in that case.d = defaultdict(computevalue)
But I have never benchmarked!
> if the default value is expensive to construct and rarely needed:
I'd say "or" rather than "and": defaultdict has higher overhead to initialise the default (especially if you don't need a function call in the setdefault call) but because it uses a fallback of dict lookup it's essentially free if you get a hit. As a result, either a very high redundancy with a cheap default or a low amount of redundancy with a costly default will have the defaultdict edge out.
For the most extreme case of the former,
versusd = {} for i in range(N): d.setdefault(0, [])
has the defaultdict edge out at N=11 on my machine (561ns for setdefault versus 545 for defaultdict). And that's with a literal list being quite a bit cheaper than a list() call.d = defaultdict(list) for i in range(N): d[0]
- [deleted]
Indeed. Inheriting from 'collections.UserDict' instead of 'dict' will make TFA's code work as intended for most of those edge cases.
UserDict will route '.get', '.setdefault', and even iteration via '.items()' through the '__getitem__' method.
edited to remove "(maybe all?) edge cases". As soon as I posted, I thought of several less common/obvious edge cases.
Along with those and iteration, it also would need to handle del/pop/popitem/update/copy/or/ror/... some of which might necessitate a decision on whether comparisons/repr also count as access.