Nuitka Release 0.3.7
This is about the new release with focus on performance and cleanups. It indicates significant progress with the milestone this release series really is about, as it adds a compiled method type.
So far functions, generator functions and generator expressions were compiled objects, but in the context of classes, functions were wrapped in CPython instancemethod objects. The new compiled method type is specifically designed for wrapping compiled_function objects and is therefore more efficient at it.
When using Nuitka.py to execute some script, the exit code in case of “file not found” was not the same as CPython’s. It should be 2, not 1.
The exit code of the created programs (--deep mode) in case of an uncaught exception was 0; now it is an error exit with value 1, like CPython does it.
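For reference, the CPython behaviour that both fixes now match can be checked with a small Python sketch. It only runs the current interpreter (sys.executable) and does not invoke Nuitka; the helper name cpython_exit_code is hypothetical. The expected values are 2 for the missing script and 1 for the uncaught exception.

```python
import subprocess
import sys


def cpython_exit_code(args):
    """Run the current CPython interpreter with the given arguments and return its exit code."""
    completed = subprocess.run(
        [sys.executable, *args],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


# A script file that cannot be opened makes CPython exit with status 2.
missing_file_code = cpython_exit_code(["this-file-does-not-exist.py"])

# An uncaught exception makes CPython print a traceback and exit with status 1.
uncaught_exception_code = cpython_exit_code(["-c", "raise ValueError('boom')"])

print(missing_file_code, uncaught_exception_code)
```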
Exception tracebacks created inside with statements could contain duplicate lines; this was corrected.
Global variable assignments now also use assign0 where no reference exists.
The assignment code for module variables is actually faster if it need not drop the reference, but clearly the code shouldn’t bother to take one on the outside just for that. This variant existed, but wasn’t used as much so far.
The instance method objects are now Nuitka’s own compiled type too. This should make things slightly faster by itself.
Our new compiled method objects support dedicated method parsing code, where self is passed directly, allowing calls to take a fast path in parameter parsing.
This avoids allocating and freeing a tuple object per method call, which reduced ticks in the “PyStone” benchmark by 3%, so that’s significant.
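The difference can be illustrated in pure Python as an analogy. NaiveBoundMethod and DirectBoundMethod are hypothetical classes, not Nuitka’s compiled method type, and the interpreter still packs arguments internally; the sketch only shows where the per-call (self,) + args tuple that a generic wrapper builds is avoided by handing the instance over separately.

```python
class NaiveBoundMethod:
    """Hypothetical wrapper that builds a new (instance, *args) tuple on every call."""

    def __init__(self, function, instance):
        self.function = function
        self.instance = instance

    def __call__(self, *args):
        # A fresh tuple is allocated just to prepend the instance.
        full_args = (self.instance,) + args
        return self.function(*full_args)


class DirectBoundMethod:
    """Hypothetical wrapper that passes the instance as a separate argument."""

    def __init__(self, function, instance):
        self.function = function
        self.instance = instance

    def __call__(self, *args):
        # No (instance,) + args tuple is built by this code; a C-level parameter
        # parser could receive the instance directly in its first slot.
        return self.function(self.instance, *args)


class Point:
    def __init__(self, x):
        self.x = x


def shifted(point, delta):
    """Plain function playing the role of the wrapped compiled function."""
    return point.x + delta


point = Point(3)
naive = NaiveBoundMethod(shifted, point)
direct = DirectBoundMethod(shifted, point)
```

Both wrappers return the same results; only the intermediate allocation differs.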
Changed BUILTIN_RANGE to pre-allocate the list in its final size, as we normally do everywhere else. This was a tick reduction of 0.4% in the “PyStone” benchmark, but the measurement method normalizes on loop speed, so it’s not visible in the numbers output.
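Pre-allocating requires knowing the final length before filling the list. A minimal Python sketch, assuming integer arguments as for Python 2’s range(); range_length and range_preallocated are hypothetical names, not Nuitka’s BUILTIN_RANGE.

```python
def range_length(start, stop, step):
    """Number of elements range(start, stop, step) produces, for integer arguments."""
    if step == 0:
        raise ValueError("range() step argument must not be zero")
    if step > 0 and start < stop:
        return (stop - start - 1) // step + 1
    if step < 0 and start > stop:
        return (start - stop - 1) // (-step) + 1
    return 0


def range_preallocated(start, stop, step=1):
    """Build the range list with one allocation of its final size, then fill it."""
    length = range_length(start, stop, step)
    result = [None] * length  # single allocation, no growing while appending
    index = 0
    value = start
    while index < length:
        result[index] = value
        value += step
        index += 1
    return result
```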
Parameter variables cannot possibly be uninitialized at creation and most often they are never subject to a del statement. Adding dedicated C++ variable classes gave a big speedup, around 3% of “PyStone” benchmark ticks.
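The reason a dedicated class helps: a general local variable must check on every read whether it is still bound, while a parameter that is never deleted is bound from creation, so that check can be dropped. A hypothetical Python model of the two kinds of variable (the class names are invented for illustration and are not Nuitka’s C++ classes):

```python
class LocalVariable:
    """Hypothetical general local: may be unbound, so every read is checked."""

    _UNBOUND = object()

    def __init__(self, name):
        self.name = name
        self.value = self._UNBOUND

    def assign(self, value):
        self.value = value

    def delete(self):
        self.value = self._UNBOUND

    def read(self):
        # This check runs on every access.
        if self.value is self._UNBOUND:
            raise UnboundLocalError(f"local variable '{self.name}' is unbound")
        return self.value


class ParameterVariable:
    """Hypothetical parameter never targeted by del: bound at creation, unchecked reads."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def read(self):
        return self.value
```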
Some abstract object operations were re-implemented, which allows avoiding function calls, e.g. in the ITERATOR_NEXT case. This gave a few percent on “PyStone” as well.
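In Python terms, the idea resembles fetching an iterator type’s __next__ once and calling it directly, instead of going through the generic next() builtin for each element. This is only an analogy with hypothetical helper names; the actual change concerns Nuitka’s C++ helpers, not Python code.

```python
def collect_with_next(iterable):
    """Generic path: the builtin next() is called for every element."""
    iterator = iter(iterable)
    items = []
    sentinel = object()
    while True:
        item = next(iterator, sentinel)
        if item is sentinel:
            return items
        items.append(item)


def collect_with_slot(iterable):
    """Direct path: look up the type's __next__ once, then call it per element."""
    iterator = iter(iterable)
    iternext = type(iterator).__next__
    items = []
    while True:
        try:
            items.append(iternext(iterator))
        except StopIteration:
            return items
```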
Created the nuitka.codegen package to contain all code generation related stuff, and moved the templates to nuitka.codegen.templates as part of that.
The MainControl module no longer reaches into Generator for simple things, but goes through CodeGeneration for everything now.
The Generator module uses almost no tree nodes anymore, but instead gets information passed in function calls. This allows for a cleanup of the interface towards CodeGeneration. It gives a cleaner view on the C++ code generation, and generally furthers the goal of supporting backends for languages other than C++.
More “PyLint” work: many of the reported warnings have been addressed, but it’s not yet happy.
None and these values are now already added (as constants) during tree building so that no such special cases need to be dealt with in CodeGeneration and future analysis steps.
Parameter parsing code has been unified even further; now the whole entry point is generated by one of the functions in the new
Split variable, exception, and built-in helper classes into separate header files.
The exit codes of CPython execution and Nuitka compiled programs are now compared as well.
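A test of this kind can be sketched as follows, assuming both programs are started as plain commands; exit_code and compare_exit_codes are hypothetical helpers. In the real test, one command would run the script with CPython and the other the Nuitka-compiled program; here the usage example simply runs two CPython commands.

```python
import subprocess
import sys


def exit_code(command):
    """Run a command and return its exit code, discarding its output."""
    return subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def compare_exit_codes(reference_command, candidate_command):
    """Return (reference_code, candidate_code, matches) for two commands."""
    reference_code = exit_code(reference_command)
    candidate_code = exit_code(candidate_command)
    return reference_code, candidate_code, reference_code == candidate_code


if __name__ == "__main__":
    # Two different ways of exiting with status 3 should compare as equal.
    print(compare_exit_codes(
        [sys.executable, "-c", "raise SystemExit(3)"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ))
```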
Error messages of methods are now covered by the ParameterErrors test as well.
A new script, “benchmark.sh” (now called “run-valgrind.py”), starts “kcachegrind” to display the valgrind result directly.
One can now use it to execute a test and inspect valgrind information right away, then improve it. This is very useful to discover possible improvements, test them, then refine some more.
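The workflow can be sketched as two commands: run the program under Valgrind’s callgrind tool with a known output file, then open that file in kcachegrind. This is a hypothetical reconstruction, not the actual run-valgrind.py; the function names are invented, while --tool=callgrind and --callgrind-out-file are standard Valgrind options.

```python
import subprocess


def profiling_commands(program_args, output_file):
    """Build the callgrind run and the kcachegrind viewer command lines."""
    valgrind_command = [
        "valgrind",
        "--tool=callgrind",
        f"--callgrind-out-file={output_file}",
        *program_args,
    ]
    viewer_command = ["kcachegrind", output_file]
    return valgrind_command, viewer_command


def profile_and_show(program_args, output_file="callgrind.out"):
    """Run the program under callgrind, then display the result in kcachegrind."""
    valgrind_command, viewer_command = profiling_commands(program_args, output_file)
    subprocess.run(valgrind_command, check=True)
    subprocess.run(viewer_command, check=True)
```

Keeping the command construction separate from execution makes it easy to check the exact command lines without Valgrind being installed.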
The “check-release.sh” script needs to unset NUITKA_EXTRA_OPTIONS, or else the reflection test will trip over the changed output paths.
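Removing the variable only for the child process keeps the caller’s own environment intact. A minimal Python sketch, assuming the release check is started as a subprocess; environment_without and run_without_extra_options are hypothetical names, and the original script is a shell script rather than Python.

```python
import os
import subprocess


def environment_without(name, base=None):
    """Return a copy of the environment with the given variable removed."""
    env = dict(os.environ if base is None else base)
    env.pop(name, None)
    return env


def run_without_extra_options(command):
    """Run a command with NUITKA_EXTRA_OPTIONS unset for that child process only."""
    return subprocess.run(
        command, env=environment_without("NUITKA_EXTRA_OPTIONS"), check=False
    )
```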
CPython 2.6:
Pystone(1.1) time for 50000 passes = 0.65
This machine benchmarks at 76923.1 pystones/second
Nuitka 0.3.7 (driven by python 2.6):
Pystone(1.1) time for 50000 passes = 0.28
This machine benchmarks at 178571 pystones/second
This is a 132% speedup of 0.3.7 compared to CPython (178571 vs. 76923.1 pystones/second), up from 109% for the previous release. This is another small increase that can be fully attributed to milestone 2 measures, i.e. not analysis, but purely more efficient C++ code generation and the new compiled method type.
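The percentage follows directly from the two pystone figures, or equivalently from the two timings; a short Python check of the arithmetic:

```python
cpython_pystones = 76923.1  # pystones/second, CPython
nuitka_pystones = 178571    # pystones/second, Nuitka 0.3.7

ratio = nuitka_pystones / cpython_pystones  # about 2.32 times as fast
speedup_percent = (ratio - 1) * 100         # about 132% faster

time_ratio = 0.65 / 0.28                    # same ratio from the 50000-pass timings
print(f"{ratio:.2f}x, {speedup_percent:.0f}% faster, timing ratio {time_ratio:.2f}")
```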
One can now safely assume that it is at least twice as fast, but I will try and get the PyPy or Shedskin test suite to run as benchmarks to prove it.
No milestone 3 work in this release. I believe it’s best to finish with milestone 2 first, because these are quite universal gains that we should have covered.