Nuitka Release 0.3.10

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This new release is major milestone 2 work, enhancing practically all areas of Nuitka. The focus was on rounding things up and breaking new ground with structural optimization enhancements.

Bug fixes

  • Exceptions now correctly stack.

    When you catch an exception, the exception is set, as it always was. But if you then called a function that itself raised and caught an exception, the values of sys.exc_info() were not reset to the outer exception after that function returned.

    This was a small difference (of which there are nearly none left now), but one that might affect existing code that calls functions during exception handling to check something about the exception.

    So it's good this is resolved now too. The difference was also difficult to understand, and now Nuitka behaves just like CPython, which means that we don't have to document anything at all about it.

  • Using exec in generator functions got fixed up. I realized that this wouldn't work while working on other things. It's obscure yes, but it ought to work.

  • Lambda generator functions can now be nested and used inside generator functions. There were some problems here with the allocation of closure variables that got resolved.

  • List contractions could not be returned by lambda functions. Also a closure issue.

  • When using a mapping for globals to exec or eval that had a side effect on lookup, it was evident that the lookup was made twice. Correcting this also improves the performance for the normal case.
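
    The exception stacking fix from the first item above can be illustrated with a small sketch. It is written for current CPython and is not taken from the Nuitka test suite; the function names are made up for illustration. The point is that a handler still sees its own exception after calling a function that raised and caught something else.

```python
import sys


def probe():
    """Raise and handle an unrelated exception internally (hypothetical helper)."""
    try:
        raise KeyError("inner")
    except KeyError:
        return sys.exc_info()[0]


def handler_view():
    """Report what sys.exc_info() shows before, during and after the call."""
    try:
        raise ValueError("outer")
    except ValueError:
        before = sys.exc_info()[0]
        inside = probe()
        # After probe() returned, the outer exception must be visible again.
        after = sys.exc_info()[0]
    return before, inside, after


views = handler_view()
```

    A compiled program is expected to produce the same three values as the interpreter does.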

New Optimization

  • Statically raised as well as predicted exceptions are propagated upwards, leading to code and block removal where possible, while maintaining the side effects.

    This is brand new and doesn't do everything possible yet. Most notably, the matching of raised exceptions to handlers is not yet performed.

  • Built-in exception name references and creation of instances of them are now optimized as well, which leads to faster exception raising/catching for these cases.

  • More kinds of calls to built-ins are handled, positional parameters are checked and more built-ins are covered.

    Notably, checks are now performed to make sure that you didn't overload e.g. len with your own version in the module. Locally, this was always detected already, so the optimization is now safe at module level as well.

  • All operations and comparisons are now simulated if possible and replaced with their result.

  • In the case of predictable true or false conditions, not taken branches are removed.

  • Empty branches are now removed from most constructs, sometimes leading to cleaner generated code.
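
    To make the idea of simulated operations and removed branches more concrete, here is a minimal sketch built on the standard ast module. It is not how Nuitka implements this; the names SIMULATORS, predict and ConstantFolder are assumptions for illustration, and the sketch only handles constants, so no side effects need to be preserved.

```python
import ast
import operator

# Simulator functions, one per supported operator or comparison.
SIMULATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Lt: operator.lt,
    ast.Gt: operator.gt,
    ast.Eq: operator.eq,
}


def predict(node_type, left, right):
    """Return a constant node for the simulated result, or None."""
    simulator = SIMULATORS.get(node_type)
    if simulator is None:
        return None
    if not (isinstance(left, ast.Constant) and isinstance(right, ast.Constant)):
        return None
    try:
        return ast.Constant(value=simulator(left.value, right.value))
    except Exception:
        # The operation would raise at run time, so it is left alone here.
        return None


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        result = predict(type(node.op), node.left, node.right)
        return node if result is None else ast.copy_location(result, node)

    def visit_Compare(self, node):
        self.generic_visit(node)
        if len(node.ops) != 1:
            return node
        result = predict(type(node.ops[0]), node.left, node.comparators[0])
        return node if result is None else ast.copy_location(result, node)

    def visit_If(self, node):
        self.generic_visit(node)
        if isinstance(node.test, ast.Constant):
            # Keep only the taken branch; an empty list removes the statement.
            # A real pass would also have to handle bodies that become empty.
            return node.body if node.test.value else node.orelse
        return node


def fold(source):
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.fix_missing_locations(tree)


tree = fold("if 1 + 2 > 2:\n    x = 'taken'\nelse:\n    x = 'removed'\n")
namespace = {}
exec(compile(tree, "<folded>", "exec"), namespace)
```

    After folding, the module consists of the single assignment from the taken branch. The condition and the other branch are gone.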

Cleanups

  • Removed the lambda body node and replaced it with function body. This is a great win for the split into body and builder. Regular functions and lambda functions now only differ in how the created body is used.
  • Large cleanup of the operation/comparison code. There is now only use of a simulator function, which exists for every operator and comparison. This one is then used in a prediction call, shared with the built-in predictions.
  • Added a Tracing module to avoid future imports of print_function, which annoyed me many times by causing syntax failures when I quickly added a print statement, not noticing that it needs the parentheses.
  • PyLint is happier than ever.

New Tests

  • Enhanced OverflowFunctions test to cover even deeper nesting of overflow functions taking closure from each level. While it's not yet working, this makes clearer what will be needed. Even if this code is obscure, I would like to be correct here as well.

  • Made Operators test cover the backtick (`) operator as well.

  • Added to ListContractions the case where a contraction is returned by a lambda function, but still needs to leak its loop variable.

  • Enhanced GeneratorExpressions test to cover lambda generators, which is really crazy code:

    def y():
        yield((yield 1),(yield 2))
    
  • Added to ExecEval a case where the exec is inside a generator, to cover that too.

  • Activated the testing of sys.exc_info() in ExceptionRaising test. This was previously commented out, and now I added stuff to illustrate all of the behavior of CPython there.

  • Enhanced ComparisonChains test to demonstrate that the order of evaluations is done right and that side effects are maintained.

  • Added BuiltinOverload test to show that overloaded built-ins are actually called and not the optimized version. So code like this has to print 2 lines:

    from __builtin__ import len as _len
    
    def len(x):
        print x

        return _len(x)
    
    print len(range(9))
    

Organizational

  • Changed "README.txt" to no longer say that "Scons" is a requirement. Now that it's included (patched up to work with ctypes on Windows), we don't have to say that anymore.
  • Documented the status of optimization and added some more ideas.
  • There is now an option to dump the node tree after optimization as XML. It is not currently used, but is intended for regression testing, to identify where new optimization and changes have an impact. This makes it more feasible to be sure that Nuitka is only becoming better.
  • Executable with Python3 again. It won't do anything yet, but the necessary code changes were done.

Summary

It's nice to see that some long standing issues were resolved, and that structural optimization has become almost a reality.

The difficult parts of exception propagation are all in place, now it's only details. With that we can eliminate and predict even more of the stupid code of "pybench" at compile time, achieving more infinite speedups.

The new cat

This is the latest addition to the family, our beautiful, young and lovely cat:

Muska on her first day with us.

Her name is Muska, and she has been with us for a week now. This is an image from her first day in our house.

Nuitka Release 0.3.9

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This is about the new release of Nuitka, which has some bug fixes and offers a good speed improvement.

This new release is major milestone 2 work, enhancing practically all areas of Nuitka. The main focus was on faster function calls, faster class attributes (not instance), faster unpacking, and on detecting more built-ins and optimizing them more thoroughly.

Bug fixes

  • Exceptions raised inside with statements had references to the exception and traceback leaked.
  • On Windows the binaries' sys.executable pointed to the binary itself instead of the Python interpreter. Changed, because some code uses sys.executable to know how to start Python scripts.
  • There is a g++ bug (fixed in their repository) related to C++ raw strings and C++ "trigraphs" that affects Nuitka. Added a workaround that makes Nuitka not emit "trigraphs" at all.
  • The check for mutable constants was erroneous for tuples, which could lead to assuming that a tuple with only mutable elements is not mutable, which is of course wrong.
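
    The rule behind the tuple fix can be sketched as follows. This is only an illustration of the intended behavior, not the actual Nuitka code, and the function name is made up: a tuple has to be treated as mutable as soon as any element, at any nesting depth, is mutable.

```python
def is_mutable_constant(value):
    """Decide if a constant value needs a fresh copy per use (illustrative)."""
    if isinstance(value, (list, dict, set, bytearray)):
        return True
    if isinstance(value, tuple):
        # A tuple is only as immutable as its elements are.
        return any(is_mutable_constant(element) for element in value)
    return False


checks = [
    is_mutable_constant((1, "a")),    # only immutable elements
    is_mutable_constant(([], {})),    # only mutable elements
    is_mutable_constant(((1, []),)),  # mutable element nested deeper
]
```

    Getting this wrong in either direction hurts: too optimistic gives shared state between uses of the constant, too pessimistic gives needless copies.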

New Optimization

This time there are so many new optimizations that it makes sense to group them by subject.

Exceptions

  • The code to add a traceback is now our own, which made it possible to use frames that do not contain line numbers and a code object capable of lookups.
  • Raising exceptions or adding to tracebacks has been made way faster by reusing cached frame objects for the task.
  • The class used for saving exceptions temporarily (e.g. used in try/finally code, or with statement) has been improved so it doesn't make a copy of the exception with a C++ new call, but it simply stores the exception properties itself and creates the exception object only on demand, which is more efficient.
  • When catching exceptions, the addition of tracebacks is now done without exporting and re-importing the exception to Python, but directly on the exception object's traceback. This avoids a useless round trip.

Function Calls

  • Uses of PyObject_Call provide NULL as the dictionary, instead of an empty dictionary, which is slightly faster for function calls.
  • There are now dedicated variants for complex function calls with * and ** arguments in all forms. These can take advantage of easier cases. For example, a merge with star arguments is only needed if there actually were any of these.
  • The check for non-string values in the ** arguments can now be completely short-cut for the case of a dictionary that has never had a string added. There is now code that detects this case and skips the check, eliminating it as a performance concern.

Parameter Parsing

  • Reversed the order in which parameters are checked.

    Now the keyword dictionary is iterated first, and only after that is done are the positional arguments handled. This iteration is not only much faster (avoiding repeated lookups for each possible parameter), it can also be more correct, in case the keyword argument is derived from a dictionary and its keys mutate it when being compared.

  • Comparing parameter names is now done with a fast path, in which the pointer values are compared first. This can avoid a call to the comparison at all, which has become very likely due to the interning of parameter name strings, see below.

  • Added a dedicated call to check for parameter equality with rich equality comparison, which doesn't raise an exception.

  • Unpacking of tuples is now using dedicated variants of the normal unpacking code instead of rolling out everything themselves.
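
    The fast path for parameter names can be pictured in Python like this. The real code is C++ working on object pointers; this sketch uses the is operator as a stand-in for the pointer comparison, and find_parameter is a made-up name.

```python
import sys


def find_parameter(name, parameter_names):
    """Return the index of name in parameter_names, or -1 if absent."""
    # Fast path: identity comparison, which succeeds for interned strings.
    for index, candidate in enumerate(parameter_names):
        if candidate is name:
            return index
    # Slow path: full equality comparison for strings that were not interned.
    for index, candidate in enumerate(parameter_names):
        if candidate == name:
            return index
    return -1


names = tuple(sys.intern(name) for name in ("alpha", "beta"))

interned_hit = find_parameter(sys.intern("beta"), names)
equal_hit = find_parameter("".join(["al", "pha"]), names)
miss = find_parameter("gamma", names)
```

    The more keyword names arrive already interned, the more often the first loop answers the question without any string comparison.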

Attribute Access

  • The class type (in executables, not yet for extension modules) is changed to a faster variant of our own making that doesn't consider the restricted mode a possibility. This avoids very expensive calls, and makes accessing class attributes in compiled code and in non-compiled code faster.
  • Access to attributes (but not of instances) got in-lined and therefore much faster. Due to other optimization, a specific step to intern the string used for attribute access is not necessary with Nuitka at all anymore. This made access to attributes about 50% faster which is big of course.

Constants

  • The bug for mutable tuples also caused non-mutable tuples to be considered as mutable, which led to less efficient code.
  • The constant creation with the g++ bug worked around, can now use raw strings to create string constants, without resorting to un-pickling them as a work around. This allows us to use PyString_FromStringAndSize to create strings again, which is obviously faster, and had not been done, because of the confusion caused by the g++ bug.
  • For string constants that are usable as attributes (i.e. match the identifier regular expression), these are now interned, directly after creation. With this, the check for identical value of pointers for parameters has a bigger chance to succeed, and this saves some memory too.
  • For empty containers (set, dict, list, tuple) the constants created are no longer unstreamed, but created with the dedicated API calls, saving a bit of code and being less ugly.
  • For mutable empty constant access (set, dict, list) the values are no longer made by copying the constant, but instead with the API functions to create new ones. This makes code like a = [] a tiny bit faster.
  • For slice indices the code generation now takes advantage of creating a C++ Py_ssize_t from constant value if possible. Before it was converting the integer constant at run time, which was of course wasteful even if not (very) slow.
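
    As a rough sketch of the interning rule for string constants mentioned above: strings that look like identifiers get interned right after creation, everything else is left alone. The exact regular expression Nuitka uses is not given here, so the pattern and the function name below are assumptions.

```python
import re
import sys

# Assumed identifier pattern for illustration.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def make_string_constant(value):
    """Intern strings usable as attribute names, return others unchanged."""
    if IDENTIFIER.fullmatch(value):
        return sys.intern(value)
    return value


first = make_string_constant("".join(["attr", "_name"]))
second = make_string_constant("_".join(["attr", "name"]))
other = make_string_constant("not an identifier")
```

    Both identifier-like strings end up as the very same object, which is what lets an identity check succeed later.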

Iteration

  • The creation of iterators got our own code. This avoids a function call and is otherwise only a small gain for anything but sequence iterators. These may be much faster to create now, as it avoids another call and repeated checks.
  • The next on iterator got our own code too, which has simpler code flow, because it avoids the double check in case of NULL returned.
  • The unpack check got similar code to the next iterator. It also has simpler code flow now and avoids double checks.

Built-ins

  • Added support for the list, tuple, dict, str, float and bool built-ins along with optimizing their use with constant parameter.

  • Added support for the int and long built-ins, based on a new "call spec" object, that detects parameter errors at compile time and raises appropriate exceptions as required, plus it deals with keyword arguments just as well.

    So, to Nuitka it doesn't matter anymore if you write int(value) or int(x = value). The base parameter of these built-ins is also supported.

    The use of this call spec mechanism will be expanded. Currently it is not applied to the built-ins that take only one parameter. This is a work in progress, as is the whole built-ins business, since not all the built-ins are covered yet.
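
    What such a "call spec" does can be sketched in a few lines of Python. This is a hypothetical model, not the class from the Nuitka sources: the class name, its methods and the error message wording are all invented for illustration. It maps positional and keyword arguments to parameter names and reports bad calls as a TypeError before anything is executed.

```python
class BuiltinCallSpec(object):
    """Hypothetical description of the parameters a built-in accepts."""

    def __init__(self, name, arg_names, default_count=0):
        self.name = name
        self.arg_names = tuple(arg_names)
        self.default_count = default_count

    def match(self, positional, keywords):
        """Map call arguments to parameter names or raise TypeError."""
        if len(positional) > len(self.arg_names):
            raise TypeError(
                "%s() takes at most %d arguments (%d given)"
                % (self.name, len(self.arg_names), len(positional) + len(keywords))
            )

        result = dict(zip(self.arg_names, positional))

        for key, value in keywords.items():
            if key not in self.arg_names:
                raise TypeError(
                    "'%s' is an invalid keyword argument for %s()" % (key, self.name)
                )
            if key in result:
                raise TypeError(
                    "%s() got multiple values for argument '%s'" % (self.name, key)
                )
            result[key] = value

        # Parameters without a default must have been provided.
        required_count = len(self.arg_names) - self.default_count
        for arg_name in self.arg_names[:required_count]:
            if arg_name not in result:
                raise TypeError(
                    "%s() is missing argument '%s'" % (self.name, arg_name)
                )

        return result


# Parameter names as in Python 2, where int(x = value) is accepted.
int_spec = BuiltinCallSpec("int", ("x", "base"), default_count=2)

by_position = int_spec.match((5,), {})
by_keyword = int_spec.match((), {"x": 5})
with_base = int_spec.match(("ff",), {"base": 16})
```

    Both spellings of the call resolve to the same mapping, so later optimization steps only have to deal with one form.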

Cleanups

  • In 0.3.8 per module global classes were introduced, but the IMPORT_MODULE kept using the old universal class, this got resolved and the old class is now fully gone.
  • Using assertObject in more cases, and in more places at all, catches errors earlier on.
  • Moved the addition to tracebacks into the _PythonException class, where it works directly on the contained traceback. This is cleaner as it no longer requires to export exceptions to Python, just to add a traceback entry.
  • Some PyLint cleanups were done, reducing the number of reports a bit, but there is still a lot to do.
  • Added a DefaultValueIdentifier class that encapsulates the access to default values in the parameter parsing more cleanly.
  • The module CodeTemplatesListContractions was renamed to CodeTemplatesContractions to reflect the fact that it deals with all kinds of contractions (also set and dict contractions), not just list contractions.
  • Moved the with related template to its own module CodeTemplatesWith, so it's easier to find.
  • The options handling for g++ based compilers was cleaned up, so that g++ 4.6 and MinGW are better supported now.
  • Documented more aspects of the Scons build file.
  • Some more generated code white space fixes.
  • Moved some helpers to dedicated files. There is now calling.hpp for function calls and importing.cpp for import related stuff.
  • Moved the manifest generation to the scons file, which now produces ready to use executables.

New Tests

  • Added an improved version of "pybench" that can cope with the "0 ms" execution time that Nuitka has for some of its sub-tests.
  • Reference counting test for with statement was added.
  • Micro benchmarks to demonstrate try finally performance when an exception travels through it.
  • Micro benchmark for with statement that eats up exceptions raised inside the block.
  • Micro benchmarks for the read and write access to class attributes.
  • Enhanced Printing test to cover the trigraphs constant bug case. Output is required to make the error detectable.
  • Enhanced Constants test to cover repeated mutation of mutable tuple constants, this covers the bug mentioned.

Organizational

  • Added a credits section to the "README.txt" where I give credit to the people who contributed to Nuitka, and the projects it is using. I will make it a separate posting to cite these.
  • Documented the requirements on the compiler more clearly, document the fact that we require scons and which version of Python (2.6 or 2.7).
  • There is now a codespeed implementation up and running with historical data for up to Nuitka 0.3.8 runs of "PyStone" and of "pybench". It will be updated for 0.3.9 once I have the infrastructure in place to do that automatically.
  • The cleanup script now also removes .so files.
  • The handling of options for g++ got improved, so it's the same for g++ and MinGW compilers, plus adequate error messages are given if the compiler version is too low.
  • There is now a --unstriped option that just keeps the debug information in the file, but doesn't keep the assertions. This will be helpful when looking at generated assembler code from Nuitka to not have the distortions that --debug causes (reduced optimization level, assertions, etc.) and instead a clear view.

Nuitka on PyBench - Good and Bad

In case you wonder, [what Nuitka is](/pages/overview.html), look here. Over the 0.3.x release cycle, I have mostly looked at its performance with "pystone". I merely wanted to have a target to look at and enjoy the progress we have made there.

In the context of the Windows port, Khalid Abu Bakr then used pybench on Windows, and that got me interested. It's a nice collection of micro benchmarks, which is quite obviously aimed at CPython implementations only. As such, it's quite good for checking where Nuitka is good, and where it can still take improvements for the milestone 2 stuff.

Enhancements to PyBench

  • The pybench refused to accept that Nuitka could use so little time on some tests, so I needed to hack it to allow that.
  • Then it had "ZeroDivisionError" exceptions, because Nuitka does not need to run fully predictable code at all. That results in a time of 0ms, which gives interesting factors.
  • Also, there are many results and we are going to care about regressions only, so there is now an option to output only tests with negative values.

The Interesting Parts

  • Nuitka currently has some fields where optimizations are already so effective as to render the whole benchmark pointless. Long term, most of PyBench will not be looked at anymore. Where the factor becomes "infinity", there is little point in looking at it. We will likely just use it as a test that optimizations didn't suddenly regress. Publishing the numbers will not be as interesting.
  • Then there are slow downs. These I take seriously, because of course I expect that Nuitka shall only be faster than CPython. Sometimes the implementation of Nuitka for some rarely used features is sub par though. I color coded these in red in the table below.
  • ComplexPythonFunctionCalls: These are twice as slow, which is a tribute to the fact that the code in this domain is only as good as it needs to be. Of course function calls are very important, and this needs to be addressed.
  • TryRaiseExcept: This is much slower because of the cost of the raise statement, which is extremely high currently. For every raise, a frame object with a specific code object is created, so the traceback will point to the correct location. This is very inefficient, and wasteful. We need to be able to create code objects that can be used for all lines needed, and then we can re-use it and only have one frame object per function, which then can be re-used itself. There is already some work for that in [current git](/pages/download.html) (0.3.9 pre 2), but it's not yet complete at all.
  • WithRaiseExcept: Same problem as TryRaiseExcept, the exception raising is too expensive.
  • Note also that -90% is in fact much worse than +90%. The "diff" numbers from pybench make improvements look much better than regressions do. You can also check out the comparison on the new [benchmark pages](http://speedcenter.nuitka.net) that I am just creating. They are based on codespeed, which I will blog about separately.
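
    Why -90% is so much worse than +90% becomes clear when the "diff" value is converted back into a time factor. The formula below is inferred from the rows of the results table, not taken from the pybench sources, and since the table shows rounded millisecond values it will not reproduce every row to the last digit.

```python
def pybench_diff(min_cpython_ms, min_nuitka_ms):
    """The "diff" column as it appears to be computed, in percent."""
    return (min_cpython_ms / min_nuitka_ms - 1.0) * 100.0


def time_factor(diff_percent):
    """Run time of Nuitka relative to CPython for a given "diff" value."""
    return 1.0 / (1.0 + diff_percent / 100.0)


try_raise_except = pybench_diff(64.0, 610.0)  # the TryRaiseExcept row
regression = time_factor(-90.0)               # about ten times the run time
improvement = time_factor(90.0)               # only a bit more than half
```

    So a diff of -90% means taking roughly ten times as long, while +90% means taking a little more than half the time.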

Look at this table of results as produced by pybench:

Benchmark Results

| Test Name | min CPython | min Nuitka | diff |
|---|---|---|---|
| BuiltinFunctionCalls | 76ms | 54ms | +41.0% |
| BuiltinMethodLookup | 57ms | 47ms | +22.1% |
| CompareFloats | 79ms | 0ms | +inf% |
| CompareFloatsIntegers | 75ms | 0ms | +inf% |
| CompareIntegers | 76ms | 0ms | +inf% |
| CompareInternedStrings | 68ms | 32ms | +113.0% |
| CompareLongs | 60ms | 0ms | +inf% |
| CompareStrings | 86ms | 62ms | +38.2% |
| CompareUnicode | 61ms | 50ms | +21.9% |
| ComplexPythonFunctionCalls | 86ms | 179ms | -52.3% |
| ConcatStrings | 98ms | 99ms | -0.6% |
| ConcatUnicode | 127ms | 124ms | +2.3% |
| CreateInstances | 76ms | 52ms | +46.8% |
| CreateNewInstances | 58ms | 47ms | +22.1% |
| CreateStringsWithConcat | 85ms | 90ms | -6.5% |
| CreateUnicodeWithConcat | 74ms | 68ms | +9.5% |
| DictCreation | 58ms | 36ms | +60.9% |
| DictWithFloatKeys | 67ms | 44ms | +51.7% |
| DictWithIntegerKeys | 64ms | 30ms | +113.8% |
| DictWithStringKeys | 60ms | 26ms | +130.6% |
| ForLoops | 47ms | 15ms | +216.2% |
| IfThenElse | 67ms | 16ms | +322.5% |
| ListSlicing | 69ms | 70ms | -0.9% |
| NestedForLoops | 72ms | 25ms | +187.4% |
| NestedListComprehensions | 87ms | 42ms | +105.9% |
| NormalClassAttribute | 62ms | 77ms | -18.9% |
| NormalInstanceAttribute | 56ms | 24ms | +129.7% |
| PythonFunctionCalls | 72ms | 34ms | +116.1% |
| PythonMethodCalls | 84ms | 38ms | +120.0% |
| Recursion | 97ms | 56ms | +73.1% |
| SecondImport | 61ms | 47ms | +31.6% |
| SecondPackageImport | 66ms | 29ms | +125.4% |
| SecondSubmoduleImport | 86ms | 32ms | +172.0% |
| SimpleComplexArithmetic | 74ms | 62ms | +18.3% |
| SimpleDictManipulation | 65ms | 35ms | +89.7% |
| SimpleFloatArithmetic | 77ms | 56ms | +39.3% |
| SimpleIntFloatArithmetic | 58ms | 39ms | +48.3% |
| SimpleIntegerArithmetic | 59ms | 37ms | +57.7% |
| SimpleListComprehensions | 75ms | 33ms | +128.7% |
| SimpleListManipulation | 57ms | 27ms | +109.4% |
| SimpleLongArithmetic | 68ms | 57ms | +19.9% |
| SmallLists | 69ms | 41ms | +66.6% |
| SmallTuples | 66ms | 98ms | -32.2% |
| SpecialClassAttribute | 63ms | 49ms | +29.1% |
| SpecialInstanceAttribute | 130ms | 24ms | +434.5% |
| StringMappings | 67ms | 62ms | +8.5% |
| StringPredicates | 69ms | 59ms | +16.6% |
| StringSlicing | 73ms | 47ms | +54.8% |
| TryExcept | 57ms | 0ms | +3821207.1% |
| TryFinally | 65ms | 26ms | +153.4% |
| TryRaiseExcept | 64ms | 610ms | -89.5% |
| TupleSlicing | 76ms | 67ms | +12.7% |
| UnicodeMappings | 88ms | 91ms | -2.9% |
| UnicodePredicates | 64ms | 59ms | +8.8% |
| UnicodeProperties | 69ms | 63ms | +8.8% |
| UnicodeSlicing | 80ms | 68ms | +17.6% |
| WithFinally | 84ms | 26ms | +221.2% |
| WithRaiseExcept | 67ms | 1178ms | -94.3% |

She's a doctor now

My wife has finally passed the final exams here in Germany. I am very proud of her for managing that. She did it with 2 kids born, a new house built, and lots of difficulties, like not living in a city with a university that offers medicine as a study course.

To celebrate, here is a picture of her from happy days (no photoshop, unlike the last time):

Anna in front of a bush in Dithmarsia (my home state).

Nuitka Release 0.3.8 - Windows Support

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This is to inform you about the new release of Nuitka with some real news and a slight performance increase. The significant news is the added "Windows Support". You can now hope to run Nuitka on Windows too and have it produce working executables against either the standard Python distribution or a MinGW compiled Python.

There are still some small things to iron out, and clearly documentation needs to be created, and esp. the DLL hell problem of msvcr90.dll vs. msvcrt.dll, is not yet fully resolved, but appears to be not as harmful, at least not on native Windows.

I thank Khalid Abu Bakr for making this possible. I was surprised to see this happen. I clearly didn't make it easy. He found a good way around ucontext, identifier clashes, and a very tricky symbol problem where the CPython library under Windows exports less than under Linux. Thanks a whole lot.

Currently the Windows support is considered experimental and works with MinGW 4.5 or higher only.

Otherwise there have been the usual round of performance improvements and more cleanups. This release is otherwise milestone 2 work only, which will have to continue for some time more.

Bug fixes

  • Lambda generators were not fully compatible, their simple form could yield an extra value. The behavior for Python 2.6 and 2.7 is also different and Nuitka now mimics both correctly, depending on the used Python version.
  • The given parameter count cited in the error message in case of too many parameters didn't include the given keyword parameters.
  • There was an assert False right after warning about not found modules in the --deep mode, which was of course unnecessary.

New Optimization

  • When unpacking variables in assignments, the temporary variables are now held in a new temporary class that is designed for the task specifically.

    This avoids the taking of a reference just because the PyObjectTemporary destructor insisted on releasing one. The new class PyObjectTempHolder hands the existing reference over and releases only in case of exceptions.

  • When unpacking variables in for loops, the value from the iterator may be directly assigned, if it's to a variable.

    In general this would be possible for every assignment target that cannot raise, but the infrastructure cannot tell yet, which these would be. This will improve with more milestone 3 work.

  • Branches with only pass inside are removed, pass statements are removed before the code generation stage. This makes it easier to achieve and decide empty branches.

  • There is now a global variable class per module. It appears that it is indeed faster to roll out a class per module accessing the module * rather than having one class and use a module **, which is quite disappointing from the C++ compiler.

  • Also MAKE_LIST and MAKE_TUPLE have gained special cases for the 0 arguments case. Even when the size of the variadic template parameters should be known to the compiler, it seems, it wasn't eliminating the branch, so this was a speedup measured with valgrind.

  • Empty tried branches are now replaced when possible with try/except statements, try/finally is simplified in this case. This gives a cleaner tree structure and less verbose C++ code which the compiler threw away, but was strange to have in the first place.

  • In conditions the or and and were evaluated with Python objects instead of with C++ bool, which was unnecessary overhead.

  • List contractions got more clever in how they assign from the iterator value.

    It now uses a PyObjectTemporary if it's assigned to multiple values, a PyObjectTempHolder if it's only assigned once, to something that could raise, or a PyObject * if an exception cannot be raised. This avoids temporary references completely for the common case.

Cleanups

  • The if, for, and while statements had always empty else nodes which were then also in the generated C++ code as empty branches. No harm to performance, but this got cleaned up.
  • Some more generated code white space fixes.

New Tests

  • The CPython 2.7 test suite now also has the doctests extracted to static tests, which improves test coverage for Nuitka again.

    This was previously only done for CPython 2.6 test suite, but the test suites are different enough to make this useful, e.g. to discover newly changed behavior like with the lambda generators.

  • Added Shed Skin 0.7.1 examples as benchmarks, so we can start to compare Nuitka performance in these tests. These will be the focus of numbers for the 0.4.x release series.

  • Added a micro benchmark to check unpacking behavior. Some of these are needed to prove that a change is an actual improvement, when its effect can get lost in the noise of in-line vs. no in-line behavior of the C++ compiler.

  • Added "pybench" benchmark which reveals that Nuitka is much faster for some things, but there are still fields to work on. This version needed changes to cope with the speed of Nuitka. These will be the subject of a later posting.

Organizational

  • There is now a "tests/benchmarks/micro" directory to contain tiny benchmarks that just look at a single aspect, but have no other meaning, e.g. the "PyStone" extracts fall into this category.
  • There is now a --windows-target option that attempts a cross-platform build of a Windows executable on Linux. This is using the "MinGW-cross-env" cross compilation tool chain. It's not yet working fully correctly due to the DLL hell problem with the C runtime. I hope to get this right in subsequent releases.
  • The --execute option uses wine to execute the binary if it's a cross-compile for Windows.
  • Native Windows build is recognized and handled with MinGW 4.5. VC++ is not supported yet due to missing C++0x support.
  • The basic test suite had so far only been run on Linux, and some adaptations were necessary. Windows new lines are now ignored in difference check, and addresses under Windows are upper case, small things.

Numbers

python 2.6:

Pystone(1.1) time for 50000 passes = 0.65
This machine benchmarks at 76923.1 pystones/second

Nuitka 0.3.8 (driven by python 2.6):

Pystone(1.1) time for 50000 passes = 0.27
This machine benchmarks at 185185 pystones/second

This is a 140% speed increase of 0.3.8 compared to CPython, up from 132% compared to the previous release.
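
The percentage is simply the ratio of the two pystones/second figures minus one. As a small check of the arithmetic, using only the numbers quoted above (the function name is mine):

```python
def speed_increase_percent(baseline_rate, compiled_rate):
    """Speed increase of the compiled program over the baseline, in percent."""
    return (compiled_rate / baseline_rate - 1.0) * 100.0


# pystones/second of CPython 2.6 and of Nuitka 0.3.8 as quoted above.
increase = speed_increase_percent(76923.1, 185185.0)
```

The result is a bit above 140, which is the figure given for this release.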

Nuitka Release 0.3.7

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This is about the new release with focus on performance and cleanups. It indicates significant progress with the milestone this release series really is about as it adds a compiled_method type.

So far functions, generator functions, and generator expressions were compiled objects, but in the context of classes, functions were wrapped in CPython instancemethod objects. The new compiled_method is specifically designed for wrapping compiled_function and therefore more efficient at it.

Bug fixes

  • When using Python or Nuitka.py to execute some script, the exit code in case of "file not found" was not the same as CPython. It should be 2, not 1.
  • The exit code of the created programs (--deep mode) in case of an uncaught exception was 0, now it is an error exit with value 1, like CPython does it.
  • Exception tracebacks created inside with statements could contain duplicate lines, this was corrected.
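
    The CPython exit codes that serve as the reference here can be observed with a short script. This is just a sketch for checking the interpreter's behavior, not part of the Nuitka test suite; the helper name and the script name are made up.

```python
import os
import subprocess
import sys
import tempfile


def exit_code_of(arguments):
    """Run the current Python interpreter with arguments, return its exit code."""
    return subprocess.call(
        [sys.executable] + list(arguments),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# A path inside a fresh, empty directory, so the script cannot exist.
missing_script = os.path.join(tempfile.mkdtemp(), "no_such_script.py")

uncaught_exception_code = exit_code_of(["-c", "raise ValueError('boom')"])
file_not_found_code = exit_code_of([missing_script])
```

    CPython reports 1 for the uncaught exception and 2 for the missing file, and these are the values a compiled program and the Nuitka runner have to match.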

New Optimization

  • Global variable assignments now also use assign0 where no reference exists.

    The assignment code for module variables is actually faster if it needs not drop the reference, but clearly the code shouldn't bother to take it on the outside just for that. This variant existed, but wasn't used as much so far.

  • The instance method objects are now Nuitka's own compiled type too. This should make things slightly faster by itself.

  • Our new compiled method objects support dedicated method parsing code, where self is passed directly, allowing to make calls taking a fast path in parameter parsing.

    This avoids allocating/freeing a tuple object per method call, which reduced ticks by 3% in the "PyStone" benchmark, so that's significant.

  • Solved a TODO of BUILTIN_RANGE to change it to pre-allocating the list in the final size as we normally do everywhere else. This was a tick reduction of 0.4% in "PyStone" benchmark, but the measurement method normalizes on loop speed, so it's not visible in the numbers output.

  • Parameter variables cannot possibly be uninitialized at creation and most often they are never subject to a del statement. Adding dedicated C++ variable classes gave a big speedup, around 3% of "PyStone" benchmark ticks.

  • Some abstract object operations were re-implemented, which allows to avoid function calls e.g. in the ITERATOR_NEXT case, this gave a few percent on "PyStone" as well.

Cleanups

  • New package nuitka.codegen to contain all code generation related stuff, moved nuitka.templates to nuitka.codegen.templates as part of that.
  • Inside the nuitka.codegen package the MainControl module no longer reaches into Generator for simple things, but goes through CodeGeneration for everything now.
  • The Generator module uses almost no tree nodes anymore, but instead gets information passed in function calls. This allows for a cleanup of the interface towards CodeGeneration. Gives a cleaner view on the C++ code generation, and generally furthers the goal of other than C++ language backends.
  • More "PyLint" work, many of the reported warnings have been addressed, but it's not yet happy.
  • Defaults for yield and return are None and these values are now already added (as constants) during tree building so that no such special cases need to be dealt with in CodeGeneration and future analysis steps.
  • Parameter parsing code has been unified even further, now the whole entry point is generated by one of the functions in the new nuitka.codegen.ParameterParsing module.
  • Split variable, exception, built-in helper classes into separate header files.
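
The point about defaults for yield and return can be seen from plain Python. This is a small sketch in Python 3 syntax; the function names are made up for illustration and it shows the language behaviour only, not how the tree building adds the constants:

```python
def bare_return():
    return  # behaves exactly like "return None"


def bare_yield():
    received = yield  # a bare "yield" hands None to the caller
    yield received


result = bare_return()
gen = bare_yield()
first = next(gen)      # value produced by the bare "yield"
second = gen.send(42)  # the sent value comes back out of the second yield

print(result, first, second)
```

Since both forms always mean None, putting the constant into the tree early means later stages see only one shape of yield and return.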

New Tests

  • The exit codes of CPython execution and Nuitka compiled programs are now compared as well.
  • Error messages of methods are now covered by the ParameterErrors test as well.
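
Comparing exit codes could look roughly like the sketch below. It is not the actual test runner: both commands are stand-ins started through the running interpreter, where a real comparison would start CPython on the source file and the compiled binary.

```python
import subprocess
import sys


def exit_code(argv):
    """Run a command and return its exit code."""
    return subprocess.call(argv)


# Hypothetical stand-ins for "python program.py" and the compiled program.
reference_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
candidate_cmd = [sys.executable, "-c", "raise SystemExit(3)"]

reference_code = exit_code(reference_cmd)
candidate_code = exit_code(candidate_cmd)
codes_match = reference_code == candidate_code

print(reference_code, candidate_code, codes_match)
```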

Organizational

  • A new script "benchmark.sh" (now called "run-valgrind.py") starts "kcachegrind" to display the valgrind result directly.

    One can now use it to execute a test and inspect valgrind information right away, then improve it. Very useful to discover methods for improvements, test them, then refine some more.

  • The "check-release.sh" script needs to unset NUITKA_EXTRA_OPTIONS or else the reflection test will trip over the changed output paths.

Numbers

python 2.6:

Pystone(1.1) time for 50000 passes = 0.65
This machine benchmarks at 76923.1 pystones/second

Nuitka 0.3.7 (driven by python 2.6):

Pystone(1.1) time for 50000 passes = 0.28
This machine benchmarks at 178571 pystones/second

This is a 132% speedup of 0.3.7 compared to CPython, up from 109% for the previous release. This is another small increase that can be fully attributed to milestone 2 measures, i.e. not analysis, but purely more efficient C++ code generation and the new compiled method type.
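
For reference, the percentages can be reproduced from the pystones/second figures. The sketch assumes they mean the throughput gain over CPython, truncated to whole percent; the helper name is made up, and the 0.3.6 figure is the one from that release's posting.

```python
def speedup_percent(baseline, measured):
    """How much faster `measured` is than `baseline`, in percent."""
    return (measured / baseline - 1.0) * 100.0


cpython = 76923.1        # pystones/second, python 2.6
nuitka_0_3_6 = 161290.0  # pystones/second, previous release
nuitka_0_3_7 = 178571.0  # pystones/second, this release

gain_0_3_6 = speedup_percent(cpython, nuitka_0_3_6)
gain_0_3_7 = speedup_percent(cpython, nuitka_0_3_7)

print("0.3.6: %d%%, 0.3.7: %d%%" % (int(gain_0_3_6), int(gain_0_3_7)))
```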

One can now safely assume that it is at least twice as fast, but I will try and get the PyPy or Shedskin test suite to run as benchmarks to prove it.

No milestone 3 work in this release. I believe it's best to finish with milestone 2 first, because these are quite universal gains that we should have covered.

Nuitka Release 0.3.6

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

The major point of this release is cleanup work and bug fixes in general, esp. in the field of importing. This release cleans up many small open ends of Nuitka, closing quite a bunch of consistency TODO items, and then aims at cleaner structures internally, so optimization analysis shall become "easy". It is a correctness and framework release, not a performance improvement at all.

Bug fixes

  • Imports were not respecting the level yet. Code like this was not working, now it is:

    from .. import something
    
  • Absolute and relative imports were both tried all the time. Now if you specify an absolute or a relative import, it is attempted the same way as CPython does it. This can make a difference for compatibility.

  • Functions with a "locals dict" (using the locals built-in or the exec statement) were not 100% compatible in the way the locals dictionary was updated; this got fixed. It seems that directly updating a dict is not what CPython does at all. Instead it only pushes things to the dictionary when it believes it has to. Nuitka now does the same thing, making it faster and more compatible at the same time with these kinds of corner cases.

  • Nested packages didn't work, they do now. Nuitka itself is now successfully using nested packages (e.g. nuitka.transform.optimizations)
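
To see what the import level means in practice, the following sketch builds a throwaway nested package and imports from the upper level. It is written for Python 3, and the names demo_pkg, sub and user are invented for the example:

```python
import importlib
import os
import shutil
import sys
import tempfile

root = tempfile.mkdtemp()
sub_dir = os.path.join(root, "demo_pkg", "sub")
os.makedirs(sub_dir)

# demo_pkg/something.py is what "from .. import something" should find
# when it is executed inside demo_pkg/sub/user.py (level 2 import).
files = {
    os.path.join(root, "demo_pkg", "__init__.py"): "",
    os.path.join(root, "demo_pkg", "something.py"): "VALUE = 'upper level'\n",
    os.path.join(sub_dir, "__init__.py"): "",
    os.path.join(sub_dir, "user.py"):
        "from .. import something\nfound = something.VALUE\n",
}
for path, content in files.items():
    with open(path, "w") as handle:
        handle.write(content)

sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    user = importlib.import_module("demo_pkg.sub.user")
    found = user.found
    imported_name = user.something.__name__
finally:
    sys.path.remove(root)
    shutil.rmtree(root)

print(found, imported_name)
```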

New Features

  • The --lto option becomes usable. It's not measurably faster immediately, and it requires g++ 4.6 to be available, but then it at least creates smaller binaries and may provide more optimization in the future.

New Optimization

  • Exceptions raised by pre-computed built-ins, unpacking, etc. are now transformed to raising the exception statically.
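
The idea can be sketched as follows. This is not Nuitka's code, only an illustration with a made-up helper: a built-in is tried on constant arguments ahead of time, and a failure is kept as "this will raise" instead of being discarded.

```python
def precompute(func, *args):
    """Try a built-in call on constant arguments ahead of time.

    Returns ("constant", value) when the call succeeds, and
    ("raise", exception) when it is known to fail, so that the failure
    can be emitted as a plain raise instead of a call.
    """
    try:
        return "constant", func(*args)
    except Exception as error:
        return "raise", error


ok_kind, ok_value = precompute(len, "abc")
bad_kind, bad_value = precompute(int, "not a number")

print(ok_kind, ok_value)
print(bad_kind, type(bad_value).__name__)
```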

Cleanups

  • There is now a getVariableForClosure that a variable provider can use. Before that it guessed from getVariableForReference or getVariableForAssignment what might be the intention. This makes some corner cases easier.
  • Classes, functions and lambdas now also have separate builder and body nodes, which made it possible to make getSameScopeNodes() really simple. Either something has children which are all in a new scope or it has them in the same scope.
  • Twisted workarounds like TransitiveProvider are no longer needed, because class builder and class body were separated.
  • New packages nuitka.transform.optimizations and nuitka.transform.finalizations, where the first was nuitka.optimizations before. There is also code in nuitka.transform that was previously in a dedicated module. This allowed moving a lot of displaced code.
  • TreeBuilding now has fast paths for all 3 forms, things that need a "provider", "node", and "source_ref"; things that need "node" and "source_ref"; things that need nothing at all, e.g. pass.
  • Variables now avoid building duplicated instances, but instead share one. Better for analysis of them.

New Tests

  • The Python 2.7 test suite is no longer run with Python 2.6, as it will just crash with the same exception all the time: there is no importlib in 2.6, but every test is using it through test_support.
  • Nested packages are now covered with tests too.
  • Imports of upper level packages are covered now too.

Organizational

  • Updated the "README.txt" with the current plan on optimization.

Numbers

python 2.6:

Pystone(1.1) time for 50000 passes = 0.65
This machine benchmarks at 76923.1 pystones/second

Nuitka 0.3.6 (driven by python 2.6):

Pystone(1.1) time for 50000 passes = 0.31
This machine benchmarks at 161290 pystones/second

This is 109% for 0.3.6, but no change from the previous release. No surprise, because no effective new optimization means have been implemented. Stay tuned for future releases for actual progress.

Nuitka Release 0.3.5

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This new release of Nuitka is an overall improvement on many fronts, there is no real focus this time, likely due to the long time it was in the making.

The major points are more optimization work, largely enhanced import handling and another improvement on the performance side. But there are also many bug fixes, more test coverage, usability and compatibility.

Something esp. noteworthy to me and valued is that many important changes were performed or at least triggered by Nicolas Dumazet, who contributed a lot of high quality commits as you can see from the gitweb history. He appears to try and compile Mercurial and Nuitka, and this resulted in important contributions.

Bug fixes

  • Nicolas found a reference counting bug with nested parameter calls: where a function had parameters of the form a, (b,c) it could crash. This got fixed and covered with a reference count test.
  • Another reference count problem when accessing the locals dictionary was corrected.
  • Values 0.0 and -0.0 were treated as the same. They are not though, they have a different sign that should not get lost.
  • Nested contractions didn't work correctly when the contraction was to iterate over another contraction which needs a closure. The problem was addressed by splitting the building of a contraction from the body of the contraction, so that these are now 2 nodes, making it easy for the closure handling to get things right.
  • Global statements in functions with a local exec() would still use the value from the locals dictionary. Nuitka is now compatible with CPython in this too.
  • Nicolas fixed problems with modules of the same name inside different packages. We now use the full name including parent package names for code generation and look-ups.
  • The __module__ attribute of classes was only set after the class was created. Now it is already available in the class body.
  • The __doc__ attribute of classes was not set at all. Now it is.
  • The relative import inside nested packages now works correctly. With Nicolas moving all of Nuitka to a package, the compile itself exposed many weaknesses.
  • A local re-raise of an exception didn't have the original line attached but the re-raise statement line.
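
Two of the fixes above concern behaviour that is easy to check in CPython itself. The sketch below (Python 3 syntax, class name invented) shows that the two zeros compare equal but keep their sign, and that __module__ can already be read inside a class body while __doc__ is set on the finished class:

```python
import math

positive, negative = 0.0, -0.0

compare_equal = positive == negative  # equality cannot tell them apart
signs = (math.copysign(1.0, positive), math.copysign(1.0, negative))
texts = (repr(positive), repr(negative))


class Example(object):
    """Docstring of Example."""

    # __module__ is already defined while the class body executes.
    module_seen_in_body = __module__


doc = Example.__doc__

print(compare_equal, signs, texts)
print(Example.module_seen_in_body == Example.__module__, doc)
```

Because equality does not distinguish the zeros, code that merges equal constants has to look at the sign separately to keep -0.0 intact.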

New Features

  • Modules and packages have been unified. Packages can now also have code in "__init__.py" and then it will be executed when the package is imported.
  • Nicolas added the ability to create deep output directory structures without having to create them beforehand. This makes --output-dir=some/deep/path usable.
  • Parallel build by Scons was added as an option and enabled by default, which enhances scalability for --deep compilations a lot.
  • Nicolas enhanced the CPU count detection used for the parallel build. Turned out that multiprocessing.cpu_count() doesn't give us the number of available cores, so he contributed code to determine that.
  • Support for upcoming g++ 4.6 has been added. The use of the new option --lto has been prepared, but right now it appears that the C++ compiler will need more fixes before we can use this feature with Nuitka.
  • The --display-tree feature got an overhaul and now displays the node tree along with the source code. It puts the cursor on the line of the node you selected. Unfortunately I cannot get it to work two-way yet. I will ask for help with this in a separate posting as we can really use a "python-qt" expert it seems.
  • Added meaningful error messages in the "file not found" case. Previously I just didn't care, but we sort of approach end user usability with this.
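
The difference between installed and available CPUs from the parallel build item can be sketched like this. It is not the code that was contributed, just one way to do it on a current Python 3, assuming that os.sched_getaffinity is only present on some platforms (e.g. Linux):

```python
import multiprocessing
import os


def available_cpu_count():
    """Number of CPUs this process may actually use (best effort)."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU affinity masks, e.g. those set with taskset.
        return len(os.sched_getaffinity(0))
    # Fallback: the number of CPUs in the system.
    return multiprocessing.cpu_count()


installed = multiprocessing.cpu_count()
usable = available_cpu_count()

print("installed: %d, usable: %d" % (installed, usable))
```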

New Optimization

  • Added optimization for the built-in range() which otherwise requires a module and builtin module lookup, then parameter parsing. Now this is much faster with Nuitka and small ranges (less than 256 values) are converted to constants directly, avoiding run time overhead entirely.
  • Code for re-raise statements now uses a simple re-throw of the exception where possible, and only does the hard work where the re-throw is not inside an exception handler.
  • Constant folding of operations and comparisons is now performed if the operands are constants.
  • Values of some built-ins are pre-computed if the operands are constants.
  • The value of module attribute __name__ is replaced by a constant unless it is assigned to. This is the first sign of upcoming constant propagation, even if only a weak one.
  • Conditional statements and/or their branches are eliminated where constant conditions allow it.
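
Constant folding and branch elimination as listed above can be illustrated with the standard ast module. This is only a sketch of the general technique, not Nuitka's own tree or optimizer; it assumes Python 3.9 or later (for ast.unparse) and handles just three operators:

```python
import ast
import operator

# Only a handful of operators, to keep the sketch short.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        handler = _OPERATORS.get(type(node.op))
        if (handler is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            try:
                value = handler(node.left.value, node.right.value)
            except Exception:
                return node  # leave failing operations to run time
            return ast.copy_location(ast.Constant(value=value), node)
        return node

    def visit_If(self, node):
        self.generic_visit(node)
        if isinstance(node.test, ast.Constant):
            # Keep only the branch that can execute.
            branch = node.body if node.test.value else node.orelse
            return branch or ast.Pass()
        return node


def fold(source):
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


folded = fold("if 2 * 3 - 6:\n    x = 1\nelse:\n    x = 10 + 20\n")
print(folded)
```

Here the condition folds to 0, so only the else branch survives, and its right-hand side is folded as well.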

Cleanups

  • Nicolas moved the Nuitka source code to its own nuitka package. That is going to make packaging it a lot easier and allows cleaner code.

  • Nicolas introduced a fast path in the tree building which often delegates (or should do that) to a function. This reduced a lot of the dispatching code and highlights more clearly where such is missing right now.

  • Together we worked on the line length issues of Nuitka. We agreed on a style and very long lines will vanish from Nuitka with time. Thanks for pushing me there.

  • Nicolas also did provide many style fixes and general improvements, e.g. using PyObjectTemporary in more places in the C++ code, or not using str.find where x in y is a better choice.

  • The node structure got cleaned up towards the direction that assignments always have an assignment as a child. A function definition, or a class definition, are effectively assignments, and in order to not have to treat this as special cases everywhere, they need to have assignment targets as child nodes.

    Without such changes, optimization will have to take too many things into account. This is not yet completed.

  • Nicolas merged some node tree building functions that previously handled deletion and assigning differently, giving us better code reuse.

  • The constants code generation was moved to a __constants.cpp where it doesn't make __main__.cpp so much harder to read anymore.

  • The module declarations have been moved to their own header files.

  • Nicolas cleaned up the scripts used to test Nuitka big time, removing repetitive code and improving the logic. Very much appreciated.

  • Nicolas also documented a few things in the Nuitka source code or got me to document things that looked strange, but have reasons behind them.

  • Nicolas solved the TODO related to built-in module accesses. These will now be way faster than before.

  • Nicolas also solved the TODO related to the performance of "locals dict" variable accesses.

  • Generator.py no longer contains classes. The Contexts objects are supposed to contain the state, and as such the generator objects never made much sense.

  • Also with the help of Scons community, I figured out how to avoid having object files inside the src directory of Nuitka. That should also help packaging, now all build products go to the .build directory as they should.

  • The vertical white space of the generated C++ got a few cleanups, trailing/leading new line is more consistent now, and there were some assertions added that it doesn't happen.
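
One of the style fixes mentioned above, preferring x in y over str.find, is more than cosmetic. A short sketch with a made-up string shows why a truth test on the result of find() is a trap:

```python
haystack = "nuitka.codegen"

# find() returns an index, with -1 meaning "not found" ...
starts_at_zero = haystack.find("nuitka")  # 0, which is falsy
missing = haystack.find("missing")        # -1, which is truthy

# ... so a bare truth test gives the wrong answer in both cases.
wrong_for_match = bool(starts_at_zero)
wrong_for_miss = bool(missing)

# The membership operator states the intent directly.
right_for_match = "nuitka" in haystack
right_for_miss = "missing" in haystack

print(wrong_for_match, wrong_for_miss, right_for_match, right_for_miss)
```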

New Tests

  • The CPython 2.6 tests are now also run by CPython 2.7 and the other way around and need to report the same test failure reports, which found a couple of issues.
  • Now the test suite is run with and without --debug mode.
  • Basic tests got extended to cover more topics and catch more issues.
  • Program tests got extended to cover code in packages.
  • Added more exec scope tests. Currently inlining of exec statements is disabled though, because it requires entirely different rules to be done right, it has been pushed back to the next release.

Organizational

  • The g++-nuitka script is no more. With the help of the Scons community, this is now performed inside Scons and only once instead of each time for every C++ file.
  • When using --debug, the generated C++ is compiled with -Wall and -Werror so that some form of bugs in the generated C++ code will be detected immediately. This found a few issues already.
  • There is a new git merge policy in place. Basically it says, that if you submit me a pull request, that I will deal with it before publishing anything new, so you can rely on the current git to provide you a good base to work on. I am doing more frequent pre-releases already and I would like to merge from your git.
  • The "README.txt" was updated to reflect current optimization status and plans. There is still a lot to do before constant propagation can work, but this explains things a bit better now. I hope to expand this more and more with time.
  • There is now a "misc/clean-up.sh" script that prints the commands to erase all the temporary files sticking around in the source tree. That is for you if you, like me, have other directories inside, ignored, that you don't want to delete.
  • Then there is now a script that prints all source filenames, so you can more easily open them all in your editor.
  • And very important, there is now a "check-release.sh" script that performs all the tests I think should be done before making a release.
  • Pylint got more happy with the current Nuitka source. In some places, I added comments where rules should be granted exceptions.

Numbers

python 2.6:

Pystone(1.1) time for 50000 passes = 0.65
This machine benchmarks at 76923.1 pystones/second

Nuitka 0.3.5 (driven by python 2.6):

Pystone(1.1) time for 50000 passes = 0.31
This machine benchmarks at 161290 pystones/second

This is 109% for 0.3.5, up from 91% before.

Overall this release is primarily an improvement in the domain of compatibility and contains important bug fixes and features for the users. The optimization framework only makes a first showing, with the framework to organize optimizations now in place. There is still work to do to migrate optimization previously present

It will take more time before we will see effect from these. I believe that even more cleanups of TreeBuilding, Nodes and CodeGeneration will be required, before everything is in place for the big jump in performance numbers. But still, passing 100% feels good. Time to rejoice.

Python Float Quiz

Quiz Question

Say you have the following code:

assert type(s) is str
x = float(s)
if x != x:
   print "Bad bad float!"

What value of "s" and then "x" can make the code complain? Do you see the really bad side of it?

The answer is in the next paragraph, so stop reading if you want to find out yourself.

Solution

The correct answer is that there is one float that is not equal to itself and that is float("nan"). Which I find terrible. It is so bad, it spoils set, dict, and everything there is. Two containers built from separately created NaN values never compare equal, and a NaN key can only be found again through the very same object.

Surprised? I was too! I only learned it while doing my Python compiler Nuitka and I made it a separate posting, because it really surprised me how this could possibly happen. A builtin type that breaks fundamental assumptions like "x == x".
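
For those who want to try it, here is a small sketch in Python 3 syntax. It relies on CPython containers checking identity before equality, which is why the very same NaN object is still found while a second NaN is not:

```python
import math

nan = float("nan")
other_nan = float("nan")

self_unequal = nan != nan   # the condition from the quiz
detected = math.isnan(nan)  # the reliable way to test for NaN

# Containers compare by identity first, then by equality.
same_object_found = nan in [nan]
other_object_found = other_nan in [nan]

# Two NaN keys are distinct entries, because no NaN equals another.
table = {nan: "first", other_nan: "second"}
entries = len(table)

print(self_unequal, detected, same_object_found, other_object_found, entries)
```

So if NaN can show up in your data, test for it with math.isnan() rather than with a comparison.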