This page attempts to give an overview of the performance history of Nuitka by listing benchmark results. It's probably also quite bad at it.
Note
This does not compare Nuitka to CPython yet. That comparison is going to be added, stay tuned.
The measurement runs the "pystone" benchmark for 50000 passes and counts the ticks with Valgrind. The "pystones" figure that the benchmark prints doesn't matter; instead, the whole benchmark run is measured.
The idea of using Valgrind is to get reliable results, without any system perturbation. It gives good measurements and makes even tiny effects visible. On purpose, the ticks for initialization are not counted, i.e. the tick counter starts only when the main module is entered, not when the compiled types and constant values are prepared.
Of course, the "pystone" code doesn't exercise a whole lot of Python features, and especially no modern ones at all. Still, it's a nice indicator. It benefits most from quick module variable access, quick instance attribute access, function call performance, etc.
| Version | Ticks Python 2.6 based | Ticks Python 2.7 based |
|---|---|---|
| 0.3.12 | 953665985 | 963058164 |
| 0.3.13a | 1005929591 | 1015122356 |
| 0.3.14 | 1004624134 | 1012719220 |
| 0.3.15 | 982225502 | 989869570 |
| 0.3.16 | 988196792 | 995211870 |
| 0.3.17 | 987105591 | 994320977 |
| 0.3.18 | 987156343 | 994371676 |
| 0.3.19 | 985157541 | 991972990 |
| 0.3.20 | 984957577 | 992122891 |
| 0.3.21 | 985556104 | 991516464 |
| 0.3.22 | 961690555 | 969300802 |
| 0.3.23 | 961690773 | 969301059 |
| 0.3.24 | 965938294 | 973149931 |
| 0.3.25 | 945144994 | 951448122 |
| 0.4.0 | 949640875 | 955249521 |
| 0.4.1 | 949340754 | 954698822 |
| develop | 951917064 | 957713787 |
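To make the table easier to read, the ticks can be expressed as a relative change against one reference release. The following is only a sketch: picking 0.3.12 as the baseline and the helper name `percent_change` are assumptions made for illustration, and the numbers are copied from the "Python 2.6 based" column above.

```python
# Ticks from the "Python 2.6 based" column of the table above.
ticks_py26 = {
    "0.3.12": 953665985,
    "0.3.13a": 1005929591,
    "0.3.14": 1004624134,
    "0.3.15": 982225502,
    "0.3.16": 988196792,
    "0.3.17": 987105591,
    "0.3.18": 987156343,
    "0.3.19": 985157541,
    "0.3.20": 984957577,
    "0.3.21": 985556104,
    "0.3.22": 961690555,
    "0.3.23": 961690773,
    "0.3.24": 965938294,
    "0.3.25": 945144994,
    "0.4.0": 949640875,
    "0.4.1": 949340754,
    "develop": 951917064,
}

# Assumption: the oldest listed release serves as the reference point.
BASELINE = "0.3.12"


def percent_change(value, reference):
    """Relative change of value against reference, in percent."""
    return (value - reference) / reference * 100.0


# Negative values mean fewer ticks than the baseline, i.e. faster.
changes = {
    version: percent_change(ticks, ticks_py26[BASELINE])
    for version, ticks in ticks_py26.items()
}

# Version with the lowest tick count in this column.
fastest = min(ticks_py26, key=ticks_py26.get)

for version, change in changes.items():
    print(f"{version:8s} {change:+.2f}%")
print("fewest ticks:", fastest)
```

Read this way, the differences between releases are mostly in the range of a few percent, which is the kind of small effect the Valgrind tick count is meant to make visible.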
The size of the created binary is also an interesting figure, of course. While we are very willing to trade executable size for performance, a larger binary should buy some gain. More code may also mean worse cache performance.
This is a by-product of the Valgrind-based "pystone" test above. It's a pretty automatic result, and an interesting indicator of generated code complexity.
| Version | Size Python 2.6 based | Size Python 2.7 based |
|---|---|---|
| 0.3.12 | 145385 | 144863 |
| 0.3.13a | 148871 | 148305 |
| 0.3.14 | 150247 | 149825 |
| 0.3.15 | 151658 | 151172 |
| 0.3.16 | 154078 | 153480 |
| 0.3.17 | 154788 | 154310 |
| 0.3.18 | 154484 | 153986 |
| 0.3.19 | 154288 | 153706 |
| 0.3.20 | 155489 | 155047 |
| 0.3.21 | 155141 | 154591 |
| 0.3.22 | 162059 | 161465 |
| 0.3.23 | 162187 | 161593 |
| 0.3.24 | 161558 | 161324 |
| 0.3.25 | 171662 | 171280 |
| 0.4.0 | 172630 | 172080 |
| 0.4.1 | 172438 | 171788 |
| develop | 178165 | 177803 |
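Whether a larger binary bought "some gain" can be checked by putting the size change next to the tick change for the same pair of releases. This is a minimal sketch using the Python 2.6 based figures for 0.3.12 and 0.4.1 from the two tables above; the choice of these two releases and the helper name `relative_change` are assumptions for illustration.

```python
# Figures for the "Python 2.6 based" builds, copied from the tables above.
size_py26 = {"0.3.12": 145385, "0.4.1": 172438}
ticks_py26 = {"0.3.12": 953665985, "0.4.1": 949340754}


def relative_change(old, new):
    """Change from old to new, in percent of old."""
    return (new - old) / old * 100.0


# Assumption: compare the oldest listed release with the latest numbered one.
size_change = relative_change(size_py26["0.3.12"], size_py26["0.4.1"])
tick_change = relative_change(ticks_py26["0.3.12"], ticks_py26["0.4.1"])

print(f"size:  {size_change:+.2f}%")
print(f"ticks: {tick_change:+.2f}%")
```

For this pair, the binary grew noticeably while the tick count dropped only slightly, so the comparison is worth repeating whenever a release adds a lot of generated code.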
This is the peak malloc memory of the resulting binary. While we are willing to trade memory usage for performance, higher memory usage often leads to lower performance, e.g. because two objects that could be shared are now duplicated. Reducing memory usage means sharing more objects, and consequently better performance.
This is a by-product of the Valgrind-based "pystone" test above. It's a pretty automatic result, and an interesting indicator of memory leaks or avoided objects.
| Version | Peak memory Python 2.6 based | Peak memory Python 2.7 based |
|---|---|---|
| 0.3.12 | 3240026 | 3753757 |
| 0.3.13a | 3248762 | 3762493 |
| 0.3.14 | 4093552 | 4563145 |
| 0.3.15 | 4093528 | 4563121 |
| 0.3.16 | 4093248 | 4562841 |
| 0.3.17 | 4093288 | 4562881 |
| 0.3.18 | 4093288 | 4562881 |
| 0.3.19 | 4093288 | 4562881 |
| 0.3.20 | 4093288 | 4562881 |
| 0.3.21 | 4092304 | 4561897 |
| 0.3.22 | 4092304 | 4561897 |
| 0.3.23 | 4092304 | 4561897 |
| 0.3.24 | 4093072 | 4562665 |
| 0.3.25 | 4093164 | 4562760 |
| 0.4.0 | 4092748 | 4562344 |
| 0.4.1 | 4092748 | 4562344 |
| develop | 4076398 | 4546061 |
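Since peak memory is meant as an indicator for leaks or avoided objects, the interesting entries are the releases where the value jumps. The sketch below finds the largest change between consecutive releases in the "Python 2.6 based" column above; the variable names are made up for illustration.

```python
# Peak memory from the "Python 2.6 based" column above, in release order.
peak_py26 = [
    ("0.3.12", 3240026),
    ("0.3.13a", 3248762),
    ("0.3.14", 4093552),
    ("0.3.15", 4093528),
    ("0.3.16", 4093248),
    ("0.3.17", 4093288),
    ("0.3.18", 4093288),
    ("0.3.19", 4093288),
    ("0.3.20", 4093288),
    ("0.3.21", 4092304),
    ("0.3.22", 4092304),
    ("0.3.23", 4092304),
    ("0.3.24", 4093072),
    ("0.3.25", 4093164),
    ("0.4.0", 4092748),
    ("0.4.1", 4092748),
    ("develop", 4076398),
]

# Difference between each release and the one listed directly before it.
steps = [
    (prev_version, version, peak - prev_peak)
    for (prev_version, prev_peak), (version, peak) in zip(peak_py26, peak_py26[1:])
]

# The step with the largest absolute change, whether growth or reduction.
largest_step = max(steps, key=lambda step: abs(step[2]))

print("largest change: %s -> %s (%+d)" % largest_step)
```

Apart from that single step, the column is nearly flat, which is what one would hope to see if no release introduced a leak.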