Python: Tprof, a Targeting Profiler

adamj.eu

81 points

jonatron

a day ago


8 comments

stabbles a day ago

`sys.monitoring` is nice. I used it to find inefficiently ordered chains of branches in one of my projects. For example, a chain of `if isinstance(foo, ...): ... elif isinstance(foo, ...): ... elif isinstance(foo, ...): ...` can be reordered from most to least popular based on a representative run, to avoid evaluating branch conditions more than necessary. You collect BRANCH_LEFT/BRANCH_RIGHT events, divide the code objects into "basic blocks", build a graph from them with edges weighted by frequency, and identify chains using a simple algorithm [1]. Then report chains where the long jump is taken more often than the short jump. It's like semi-automatic PGO for Python.

[1]: https://dl.acm.org/doi/abs/10.1145/93542.93550

benrutter a day ago

This looks incredibly helpful! Not sure if I just haven't come across the right tools yet, but I always find performance monitoring in Python very tricky.

There aren't any tools I know of that meet the standard of, say, coverage, where they're easy for beginners to use and cover 90% of use cases.

jesse__ 21 hours ago

I remember looking at Python profiling tools a couple of years ago and being pretty disappointed. Most added a huge amount of runtime noise; the lowest was something like +50% runtime, which to me is completely unacceptable. The profiler I wrote and use every day for a side project adds well under 1% runtime overhead.

I wonder what the overhead of sys.monitoring is. It is possible to instrument functions like this without using that API: you can just walk the AST at startup and do it yourself. It would be interesting to see a very minimal instrumenting profiler that did that and compare the two.
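A toy version of that AST approach, under my own assumptions (the `_PROF`/`_t0` names and `TimerInjector` class are made up for illustration): rewrite each function at load time so it times its own body, with no monitoring hooks involved.

```python
import ast
import time

class TimerInjector(ast.NodeTransformer):
    """Wrap every function body in a timing try/finally."""
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        start = ast.parse("_t0 = time.perf_counter()").body[0]
        record = ast.parse(
            f"_PROF.setdefault({node.name!r}, []).append(time.perf_counter() - _t0)"
        ).body[0]
        # try/finally so the timing is recorded even on early return.
        node.body = [start, ast.Try(body=node.body, handlers=[],
                                    orelse=[], finalbody=[record])]
        return node

src = """
def work(n):
    return sum(range(n))
"""
tree = ast.fix_missing_locations(TimerInjector().visit(ast.parse(src)))
ns = {"_PROF": {}, "time": time}
exec(compile(tree, "<instrumented>", "exec"), ns)

ns["work"](100_000)
timings = ns["_PROF"]["work"]  # one duration per call
```

The cost per call is one `perf_counter()` pair and a list append, which is how an instrumenting profiler can stay in the low-single-digit-percent overhead range.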

I also love the 'targeted'/instrumenting profiling API. I think sampling profilers are good at scratching the surface, but quickly run out of useful information when getting down to the goods. Happy to see people doing instrumenting profilers.

Thanks for sharing :)

  • cl3misch 18 hours ago

    I'm wondering, is the overhead a problem for you because it skews profiling results, or does it lead to the overall runtime becoming too long?

    So far I thought profiling might add overhead but the results themselves should be unaffected (modulo the usual pitfalls).

    • jesse__ 16 hours ago

      Mostly the latter, but a lot of tools are so slow the former actually becomes a problem, too. Valgrind is a great example. For realtime applications, Valgrind and friends are pretty much a non-starter.

To your point about profile results, any profiler that adds more than a couple of percentage points of runtime basically destroys the profile results (less than one percent is the acceptable margin, for me). Adding 50% is just laughable, and at the time I looked, that was the best available option.

      At the time, I was trying to profile ML models and some tooling surrounding them. There are several reasons you want your profiler to be low overhead:

1. If the profiler does a ton of useless shit (read: is slow), it has many deleterious effects on the program being profiled. It can evict entries from the CPU's I-cache, D-cache, and TLB, to name a few, all of which can cause huge stalls and make something (such as a scan over tensor memory, for example) many times slower than it would otherwise be. You become unable to reason about whether something is taking a long time because it's doing something stupid, or because the profiler is just screwing up your day. Introducing this kind of noise into your profile makes it nearly impossible to do a good job at analysis, let alone optimize anything.

2. Somewhat unrelated to performance, but you really want to know more than "this function takes up a lot of time", which is basically all sampling profilers tell you. If you look at a flame graph and it says "fooFunc()" takes up 80% of the time, you have no idea whether that's because one call to "fooFunc()" took 79% and the rest were negligible, or if they're all slow, or just a handful. That is key information and, in my mind, basically makes sampling profilers unsuitable for anything but approximating. Which can be useful, and is often good enough, but if you need to optimize something for real, a sampling profiler exhausts its usefulness pretty quickly.
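      To make the one-slow-call-vs-all-slow distinction concrete, here is a hedged sketch (the `instrument` decorator and `call_times` store are illustrative, not any particular tool's API) of why instrumenting keeps the information a sample-based flame graph collapses:

```python
import functools
import time
from collections import defaultdict

call_times = defaultdict(list)  # function name -> list of per-call durations

def instrument(fn):
    """Record the wall-clock duration of every individual call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            call_times[fn.__qualname__].append(time.perf_counter() - t0)
    return wrapper

@instrument
def foo_func(n):
    return sum(i * i for i in range(n))

# Two cheap calls and one expensive one:
for n in (10, 10, 100_000):
    foo_func(n)

times = call_times["foo_func"]
# A sampling profiler would only show foo_func's aggregate share of runtime;
# the per-call list reveals whether one outlier or every call is slow.
total, slowest = sum(times), max(times)
```

With the per-call list you can compare `slowest` against the rest of the distribution, which is exactly the question a flame graph cannot answer.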

      Anyways .. there are some random thoughts for you :)

  • stabbles 21 hours ago

Python 3.15 features a very good sampling profiler with excellent reporting: https://docs.python.org/3.15/library/profiling.sampling.html

    • benrutter 19 hours ago

This looks great! If I didn't have dependencies blocking it, I'd genuinely try out the alpha build just for this profiler alone!

    • jesse__ 16 hours ago

      Took a look at it. I generally don't like sampling profilers, but the live mode is a neat idea.