This is about the new release with focus on performance and cleanups. It indicates significant progress with the milestone this release series really is about as it adds a compiled_method type.
So far functions, generator function, generator expressions were compiled objects, but in the context of classes, functions were wrapped in CPython instancemethod objects. The new compiled_method is specifically designed for wrapping compiled_function and therefore more efficient at it.
Global variable assignments now also use assign0 where no reference exists.
The assignment code for module variables is actually faster if it needs not drop the reference, but clearly the code shouldn't bother to take it on the outside just for that. This variant existed, but wasn't used as much so far.
The instance method objects are now Nuitka's own compiled type too. This should make things slightly faster by itself.
Our new compiled method objects support dedicated method parsing code, where self is passed directly, allowing to make calls taking a fast path in parameter parsing.
This avoids allocating/freeing a tuple object per method call, while reduced 3% ticks in "PyStone" benchmark, so that's significant.
Solved a TODO of BUILTIN_RANGE to change it to pre-allocating the list in the final size as we normally do everywhere else. This was a tick reduction of 0.4% in "PyStone" benchmark, but the measurement method normalizes on loop speed, so it's not visible in the numbers output.
Parameter variables cannot possibly be uninitialized at creation and most often they are never subject to a del statement. Adding dedicated C++ variable classes gave a big speedup, around 3% of "PyStone" benchmark ticks.
Some abstract object operations were re-implemented, which allows to avoid function calls e.g. in the ITERATOR_NEXT case, this gave a few percent on "PyStone" as well.
A new script "benchmark.sh" (now called "run-valgrind.py") script now starts "kcachegrind" to display the valgrind result directly.
One can now use it to execute a test and inspect valgrind information right away, then improve it. Very useful to discover methods for improvements, test them, then refine some more.
The "check-release.sh" script needs to unset NUITKA_EXTRA_OPTIONS or else the reflection test will trip over the changed output paths.
Pystone(1.1) time for 50000 passes = 0.65 This machine benchmarks at 76923.1 pystones/second
Nuitka 0.3.7 (driven by python 2.6):
Pystone(1.1) time for 50000 passes = 0.28 This machine benchmarks at 178571 pystones/second
This is a 132% speed of 0.3.7 compared to CPython, up from 109% compare to the previous release. This is a another small increase, that can be fully attributed to milestone 2 measures, i.e. not analysis, but purely more efficient C++ code generation and the new compiled method type.
One can now safely assume that it is at least twice as fast, but I will try and get the PyPy or Shedskin test suite to run as benchmarks to prove it.
No milestone 3 work in this release. I believe it's best to finish with milestone 2 first, because these are quite universal gains that we should have covered.