Nuitka Release 0.5.27

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release comes with a lot of bug fixes and improvements.

Bug Fixes

  • Fix, recursed modules need to be added to the working set immediately, or else they might first be processed in the second pass, where global names that are locally assigned are optimized to the built-in names, although that should not happen. Fixed in 0.5.26.1 already.
  • Fix, the accelerated call of methods could crash for some special types. This had been a regression of 0.5.25, but only happens with custom extension types. Fixed in 0.5.26.1 already.
  • Python3.5: For async def functions parameter variables could fail to properly work with in-place assignments to them. Fixed in 0.5.26.4 already.
  • Compatibility: Decorators that overload type checks didn't pass the checks for compiled types. Now isinstance and, as a result, the inspect module work fine for them.
  • Compatibility: Fix, imports from __init__ were crashing the compiler. You are not supposed to do them, because they duplicate the package code, but they work.
  • Compatibility: Fix, the super built-in on module level was crashing the compiler.
  • Standalone: For Linux, BSD, and MacOS, extension modules and shared libraries using their own $ORIGIN to find loaded DLLs resulted in those not being included in the distribution.
  • Standalone: Added more missing implicit dependencies.
  • Standalone: Fix, implicit imports now also can be optional, as e.g. _tkinter if not installed. Only include those if available.
  • The --recompile-c-only option only worked with a C compiler as the backend, but not in the C++ compatibility fallback, where files get renamed. This prevented the edit and test debugging approach with at least MSVC.
  • Plugins: The PyLint plug-in didn't consider the symbolic name import-error but only the code F0401.
  • Implicit exception raises in conditional expressions would crash the compiler.

New Features

  • Added support for Visual Studio 2017. Issue#368.
  • Added option --python2-for-scons to specify the Python2 executable to use for calling Scons. This should allow using Anaconda Python for that task.

Optimization

  • References to known unassigned variables are now statically optimized to exception raises and warned about if the corresponding option is enabled.
  • Unhashable keys in dictionaries are now statically optimized to exception raises and warned about if the corresponding option is enabled.
  • Enable forward propagation for classes too, resulting in some classes creating only static dictionaries. Currently this never happens for Python3, but it will, once we can statically optimize __prepare__ too.
  • Enable inlining of class dictionary creations if they are mere return statements of the created dictionary. Currently this never happens for Python3, see above for why.
  • Python2: Selecting the metaclass is now visible in the tree and can be statically optimized.
  • For executables, we now also use a freelist for traceback objects, which also makes exception cases slightly faster.
  • Generator expressions no longer require the use of a function call with a .0 argument value to carry the iterator value, instead their creation is directly inlined.
  • Remove "pass through" frames for Python2 list contractions, they are no longer needed. Minimal gain for generated code, but more lightweight at compile time.
  • When compiling Windows x64 with MinGW64 a link library needs to be created for linking against the Python DLL. This one is now cached and re-used if already done.
  • Use common code for NameError and UnboundLocalError exception code raises. In some cases it was creating the full string at compile time, in others at run time. Since the latter is more efficient in terms of code size, we now use that everywhere, saving a bit of binary size.
  • Make sure to release unused functions from a module. This saves memory and can be decided after a full pass.
  • Avoid using OrderedDict in a couple of places, where they are not needed, but can be replaced with a later sorting, e.g. temporary variables by name, to achieve deterministic output. This saves memory at compile time.
  • Add specialized return nodes for the most frequent constant values, which are None, True, and False. Also a general one, for constant value return, which avoids the constant references. This saves quite a bit of memory and makes traversal of the tree a lot faster, due to not having any child nodes for the new forms of return statements.
  • Previously the empty dictionary constant reference was specialized to save memory. Now we also specialize empty set, list, and tuple constants to the same end. Also the hack that keeps is from claiming that {} is {} was made more general; mutable constant references are now known to never alias.
  • The source references can be marked internal, which means that they should never be visible to the user, but that was tracked as a flag to each of the many source references attached to each node in the tree. Making a special class for internal references avoids storing this in the object, but instead it's now a class property.
  • The nodes for named variable reference, assignment, and deletion got split into separate nodes, one to be used before the actual variable can be determined during tree building, and one for use later on. This makes their API clearer and saves a tiny bit of memory at compile time.
  • Also eliminated target variable references, which were pseudo children of assignments and deletion nodes for variable names, that didn't really do much, but consume processing time and memory.
  • Added optimization for calls to staticmethod and classmethod built-in methods along with type shapes.
  • Added optimization for open built-in on Python3, also adding the type shape file for the result.
  • Added optimization for bytearray built-in and constant values. These mutable constants can now be compile time computed as well.
  • Added optimization for frozenset built-in and constant values. These constants can now be compile time computed as well.
  • Added optimization for divmod built-in.
  • Treat all built-in constant types, e.g. type itself, as constants. So far we did this only for the types of constant values, but of course this applies to all types, giving slightly more compact code for their uses.
  • Detect static raises if iterating over non-iterables and warn about them if the option is enabled.
  • Split the locals node into different types, one which needs the updated value, and one which just makes a copy. Properly track if a function needs an updated locals dict, and if it doesn't, don't use that. This gives more efficient code for Python2 classes, and exec using functions in Python2.
  • Build all constant values without use of the pickle module, which has a lot more overhead than marshal. Instead, marshal is now used for too large long values, non-UTF8 unicode values, nan float, etc.
  • Detect the linker arch for all Linux platforms using objdump instead of only a handful of hard coded ones.
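
The statically detected raises mentioned in this list (unassigned variables, unhashable dictionary keys, iteration over non-iterables) correspond to ordinary run time errors in CPython. The following is only an illustration in plain Python of the behavior that compiled code still has to produce; the helper name raised is made up for this sketch and is not part of Nuitka.

```python
def raised(func):
    """Return the exception type raised by calling func, or None."""
    try:
        func()
    except Exception as exc:
        return type(exc)
    return None


def read_unassigned():
    # The local variable is known to be unassigned at the point of use.
    holder = None
    del holder
    return holder


def unhashable_key():
    # A list is not hashable, so it cannot be a dictionary key.
    return {[]: 1}


def iterate_non_iterable():
    # An int is not iterable.
    for _ in 1:
        pass


print(raised(read_unassigned).__name__)       # UnboundLocalError
print(raised(unhashable_key).__name__)        # TypeError
print(raised(iterate_non_iterable).__name__)  # TypeError
```

Since the outcome of each of these is known at compile time, a compiler can replace the expression with the raise itself and optionally warn about it.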

Cleanups

  • The use of INCREASE_REFCOUNT got fully eliminated.
  • Use functions not vulnerable to buffer overflows. This is generally good and avoids warnings given on OpenBSD during linking.
  • Variable closure for classes is different from all functions. The difference is no longer handled in the base class, but for class nodes only.
  • Make sure mayBeNone doesn't return None, which normally means "unclear", but False instead, since it's always clear for those cases.
  • Comparison nodes were using the general comparison node as a base class, but now a proper base class was added instead, allowing for cleaner code.
  • Valgrind test runners got changed to using proper tool namespace for their code and share it.
  • Made construct case generation code common testing code for re-use in the speedcenter web site. The code also has minor beauty bugs which will then become fixable.
  • Use appdirs package to determine place to store the downloaded copy of depends.exe.
  • The code still mentioned C++ in a lot of places, in comments or identifiers, which might be confusing readers of the code.
  • Code objects now carry all information necessary for their creation, and no longer need to access their parent to determine flag values. That parent is subject to change in the future.
  • Our import sorting wrapper automatically detects imports that could be local and makes them so, removing a few existing ones and preventing further ones in the future.
  • Cleanups and annotations to become Python3 PyLint clean as well. This found e.g. that source code references only had __cmp__ and needed rich comparison to be fully portable.
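
The __cmp__ finding is a general portability point: Python3 ignores __cmp__, so ordering has to come from rich comparison methods. A minimal sketch of that, assuming a hypothetical SourceRef class with filename and line attributes (a stand-in for illustration, not Nuitka's actual source reference class):

```python
import functools


@functools.total_ordering
class SourceRef(object):
    """Hypothetical stand-in for a source code reference."""

    __slots__ = ("filename", "line")

    def __init__(self, filename, line):
        self.filename = filename
        self.line = line

    def _key(self):
        return (self.filename, self.line)

    # __eq__ and __lt__ are enough; total_ordering derives the rest.
    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    # Defining __eq__ removes the default hash, so restore a matching one.
    def __hash__(self):
        return hash(self._key())


refs = [SourceRef("b.py", 3), SourceRef("a.py", 10), SourceRef("a.py", 2)]
print([(ref.filename, ref.line) for ref in sorted(refs)])
```

This form works the same on Python2 and Python3, which is what makes it portable.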

Tests

  • The test runner for construct tests got cleaned up and the constructs now avoid using xrange so as to not need conversion for Python3 execution as much.
  • The main test runner got cleaned up and uses common code making it more versatile and robust.
  • Do not run a test in the debugger if CPython also segfaulted executing it. Then it's not a Nuitka issue, so we can ignore that.
  • Improve the way the Python to test with is found in the main test runner: prefer the running interpreter, then PATH and registry on Windows. This will find the interesting version more often.
  • Added support for "Landscape.io" to ignore the inline copies of code, they are not under our control.
  • The test runner for Valgrind got merged with the usage for constructs and uses common code now.
  • Construct generation is now common code, intended for sharing it with the Speedcenter web site generation.
  • Rebased Python 3.6 test suite to 3.6.1 as that is the Python generally used now.

Organizational

  • Added inline copy of appdirs package from PyPI.
  • Added credits for RedBaron and isort.
  • The --experimental flag is now creating a list of indications and more than one can be used that way.
  • The PyLint runner can also work with Python3 pylint.
  • The Nuitka Speedcenter got more fine tuning and produces more tags to more easily identify trends in results. This needs to become more visible though.
  • The MSI files are also built on AppVeyor, where their building will not depend on me booting Windows. Getting these artifacts as downloads will be the next step.
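
An option that may be given more than once and collects its values in a list can be sketched as follows. This uses argparse from the standard library purely for illustration; how Nuitka parses its options internally is not shown here, and the indication names and the is_experimental helper are placeholders.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--experimental",
    action="append",
    dest="experimental",
    default=[],
    metavar="INDICATION",
    help="Enable an experimental feature, may be given multiple times.",
)


def is_experimental(options, indication):
    """Check whether the given indication was requested (hypothetical helper)."""
    return indication in options.experimental


# Placeholder indication names, only for this sketch.
options = parser.parse_args(
    ["--experimental=feature_a", "--experimental=feature_b"]
)
print(options.experimental)
```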

Summary

This release improves many areas. The variable closure taking is now fully transparent due to different node types, the memory usage dropped again, a few obvious missing static optimizations were added, and many built-ins were completed.

This release again improves the scalability of Nuitka, which again uses less memory than before, although the jump is not as big as before.

This does not extend or use special C code generation for bool or any type yet, which still needs design decisions to proceed and will come in a later release.

Nuitka Release 0.5.26

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release comes after a long time and contains large amounts of changes in all areas. The driving goal was to prepare generating C specific code, which is still not the case, but this is very likely going to change soon. However this release improves all aspects.

Bug Fixes

  • Compatibility: Fix, star imports didn't check whether the values from the __all__ iterable were string values, which could cause problems at run time.

    # Module level
    __all__ = (1,)
    
    # ...
    # other module:
    from module import *
    
  • Fix, star imports also didn't check whether the values from __all__ actually exist in the original module.

  • Corner cases of imports should work a lot more precisely, as the level of compatibility for calls to __import__ went from absurd to insane.

  • Windows: Fixed detection of uninstalled Python versions (not for all users and DLL is not in system directory). This of course only affected the accelerated mode, not standalone mode.

  • Windows: Scan directories for .pyd files for used DLLs as well. This should make the PyQt5 wheel work.

  • Python3.5: Fix, coroutines could have different code objects for the object and the frame used by it.

  • Fix, slices with built-in names crashed the compiler.

    something[id:len:range]
    
  • Fix, the C11 via C++ compatibility uses symlinks to C++ filenames where possible instead of making a copy from the C source. However, even on Linux that may not be allowed, e.g. on a DOS file system. Added fallback to using full copy in that case. Issue#353.

  • Python3.5: Fix coroutines to close the "yield from" where an exception is thrown into them.

  • Python3: Fix, list contractions should have their own frame too.

  • Linux: Copy the "rpath" of compiling Python binary to the created binary. This will make compiled binaries using uninstalled Python versions transparently find the Python shared library.

  • Standalone: Add the "rpath" of the compiling Python binary to the search path when checking for DLL dependencies on Linux. This fixes standalone support for Travis and Anaconda on Linux.

  • Scons: When calling scons, also try to locate a Python2 binary to overcome a potential Python3 virtualenv in which Nuitka is running.

  • Standalone: Ignore more Windows only encodings on non-Windows.
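
The two star import checks from the top of this list can be observed in CPython itself. The sketch below registers a throwaway module under the made up name star_import_demo and reports which exception a star import from it raises on Python3; it only illustrates the reference behavior that compiled code has to match.

```python
import sys
import types


def star_import_error(all_value):
    """Star import a throwaway module with the given __all__.

    Returns the type of the exception raised, or None on success.
    """
    module = types.ModuleType("star_import_demo")
    module.present = 1
    module.__all__ = all_value
    sys.modules["star_import_demo"] = module
    try:
        exec("from star_import_demo import *", {})
    except Exception as exc:
        return type(exc)
    finally:
        del sys.modules["star_import_demo"]
    return None


print(star_import_error((1,)))          # non-string entry in __all__
print(star_import_error(("missing",)))  # name that the module does not have
print(star_import_error(("present",)))  # valid entry
```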

New Features

  • Support for Python 3.6 with only few corner cases not supported yet.
  • Added option --python-arch to pick the 32 or 64 bits Python target of the --python-version argument.
  • Added support for more kinds of virtualenv configurations.
  • Uninstalled Python versions such as Anaconda will work fine in accelerated mode, except on Windows.

Optimization

  • The node tree children are no longer stored in a separate dictionary, but in the instance dictionary as attributes, making the tree more lightweight and in principle faster to access. This also saved about 6% of the memory usage.

  • The memory usage of Nuitka for the Python part has fallen by roughly 40% due to the use of new style classes, and slots where that is possible (some classes use multiple inheritance, where they don't work), and generally by reducing useless members e.g. in source code references. This of course also will make things compiled faster (the C compilation of course is not affected by this.)

  • The code generation for frames was creating the dictionary for the raised exception by making a dictionary and then adding all variables, each tested to be set. This was a lot of code specific to each frame, and has been replaced by a generic "attach" mechanism which merely stores the values and only takes a reference. Only when asked for frame locals does it build the dictionary. So this is now only done when that is absolutely necessary, which it normally never is. This of course makes the C code much less verbose, and actual handling of exceptions much more efficient.

  • For imports, we now detect for built-in modules, that their import cannot fail, and if name lookups can fail. This leads to less code generated for error handling of these. The following code now e.g. fully detects that no ImportError or AttributeError will occur.

    try:
        from __builtin__ import len
    except ImportError:
        from builtins import len
    
  • Added more type shapes for built-in type calls. These will improve type tracing.

  • Compiled frames now have a free list mechanism that should speed up frames that recurse and frames that exit with exceptions. In case of an exception, the frame ownership is immediately transferred to the exception making it easier to deal with.

  • The free list implementations have been merged into a new common one that can be used via macro expansion. It is now type agnostic and should be slightly more efficient too.

  • Also optimize "true" division and "floor division", not only the default division of Python2.

  • Removed the need for statement context during code generation making it less memory intensive and faster.
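
The memory effect of slots, and the reason they cannot be used with every kind of multiple inheritance, can be shown with plain Python classes. The class names below are invented for this sketch and do not correspond to actual Nuitka node classes.

```python
class NodeWithDict(object):
    def __init__(self, source_ref):
        self.source_ref = source_ref


class NodeWithSlots(object):
    # No per-instance dictionary is created for slotted classes.
    __slots__ = ("source_ref",)

    def __init__(self, source_ref):
        self.source_ref = source_ref


plain = NodeWithDict("demo.py:1")
slotted = NodeWithSlots("demo.py:1")

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False


class MixinA(object):
    __slots__ = ("a",)


class MixinB(object):
    __slots__ = ("b",)


# Two bases that both define non-empty slots cannot be combined.
try:
    type("Combined", (MixinA, MixinB), {"__slots__": ()})
    layout_conflict = False
except TypeError:
    layout_conflict = True

print(layout_conflict)  # True
```

The layout conflict at the end is the restriction referred to above: with multiple inheritance, at most one of the bases may contribute non-empty slots.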

Cleanups

  • Now always uses the __import__ built-in node for all kinds of imports and directly optimizes and recurses into other modules based on that kind of node, instead of a static variant. This removes duplication and some incompatibility regarding defaults usage when doing the actual imports at run time.
  • Split the expression node bases and mixin classes to a dedicated module, moving methods that only belong to expressions outside of the node base, making for a cleaner class hierarchy.
  • Cleaned up the class structure of nodes, added base classes for typical compositions, e.g. expression with and without children, computation based on built-in, etc. while also checking proper ordering of base classes in the metaclass.
  • Moved directory and file operations to dedicated module, making also sure it is more generally used. This makes it easier to make more error resilient deletions of directories on e.g. Windows, where locks tend to live for short times beyond program ends, requiring second attempts.
  • Code generation for existing supported types, PyObject *, PyObject **, and struct Nuitka_CellObject * is now done via a C type class hierarchy instead of elif sequences.
  • Closure taking is now always done immediately correctly and references are taken for closure variables still needed, making sure the tree is correct and needs no finalization.
  • When doing variable traces, initialize more traces immediately so it can be more reliable.
  • Code to setup a function for local variables and clean it up has been made common code instead of many similar copies.
  • The code was treating the f_executing frame member as if it were a counter with increases and decreases. Turn it into a mere boolean value and hide its usage behind helper functions.
  • The "maybe local variables" are no more. They were replaced by a new locals dict access node with a fallback to a module or closure variable should the dictionary not contain the name. This avoids many ugly checks to not do certain things for that kind of variable.
  • We now detect "exec" and "unqualified exec" as well as "star import" ahead of time as flags of the function to be created. We no longer need to mark functions as we go.
  • Handle "true", "floor" and normal division properly by applying future flags to decide which one to use.
  • We now use symbolic identifiers in all PyLint annotations.
  • The release scripts started to move into nuitka.tools.release so they get PyLint checks, autoformat and proper code re-use.
  • The use of INCREASE_REFCOUNT_X was removed, it got replaced with proper Py_XINCREF usages.
  • The use of INCREASE_REFCOUNT got reduced further, e.g. no generated code uses it anymore, and only a few compiled types do. The function was once required before "C-ish" lifted the need to do everything in one single function call.
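
A lookup that tries a locals dictionary first and falls back to a module variable can be seen in plain Python, for instance in a class body. The snippet only shows the language semantics that such a node has to model, not the node itself; the names are invented for the example.

```python
value = "module level"


class Example(object):
    # Not yet in the class locals: falls back to the module variable.
    before = value
    value = "class level"
    # Now present in the class locals: the local value wins.
    after = value


print(Example.before)  # module level
print(Example.after)   # class level
```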

Tests

  • More robust deletion of directories, temporary stages used by CPython test suites, and standalone directories during test execution.
  • Moved tests common code into nuitka.tools.testing namespace and use it from there. The code now is allowed to use nuitka.utils and therefore often better implementations.
  • Made standalone binaries robust against GTK theme access, checking the Python binary (some site.py files do that).

Organizational

  • Added repository for Ubuntu Zesty (17.04) for download.
  • Added support for testing with Travis to complement the internal Buildbot based infrastructure and have pull requests on Github automatically tested before merge.
  • The factory branch is now also on Github.
  • Removed MSI for Python3.4 32 bits. It seems impossible to co-install this one with the 64 bits variant. All other versions are provided for both bit sizes still.

Summary

This release marks huge progress. The node tree is now absolutely clean, the variable closure taking is fully represented, and code generation is prepared to add another type, e.g. for bool for which work has already started.

On a practical level, the scalability of the release will have increased very much, as this uses so much less memory, generates simpler C code, while at the same time getting faster for the exception cases.

Coming releases will expand on the work of this release.

Frame objects should be allowed to be nested inside a function for better re-formulations of classes and contractions of all kinds, as well as real inline of functions, even if they could raise.

The memory savings could be even larger, if we stopped doing multiple inheritance for more node types. The __slots__ usage and the child API change could potentially make things not only more compact, but faster to use too.

And also once special C code generation for bool is done, it will set the stage for more types to follow (int, float, etc). Only this will finally start to give the C type speed we are looking for.

Until then, this release marks a huge cleanup and progress to what we already had, as well as preparing the big jump in speed.

Nuitka Release 0.5.25

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains a huge amount of bug fixes, lots of optimization gains, and many new features. It also presents many organizational improvements, and many cleanups.

Bug Fixes

  • Python3.5: Coroutine methods using super were crashing the compiler. Issue#340. Fixed in 0.5.24.2 already.

  • Python3.3: Generator return values were not properly transmitted in case of tuple or StopIteration values.

  • Python3.5: Better interoperability between compiled coroutines and uncompiled generator coroutines.

  • Python3.5: Added support to compile in Python debug mode under Windows too.

  • Generators with arguments were using two code objects, one with, and one without the CO_NOFREE flag, one for the generator object creating function, and one for the generator object.

  • Python3.5: The duplicate code objects for generators with arguments led to interoperability issues between such compiled generator coroutines and compiled coroutines. Issue#341. Fixed in 0.5.24.2 already.

  • Standalone: On some Linux variants, e.g. Debian Stretch and Gentoo, the linker needs more flags to really compile to a binary with RPATH.

  • Compatibility: For set literal values, insertion order is wrong on some versions of Python. We now detect the bug and emulate it if necessary; previously Nuitka was always correct, but incompatible.

    {1, 1.0}.pop() # the only element of the set should be 1
    
  • Windows: Make the batch files detect where they live at run time, instead of during setup.py, making it possible to use them for all cases.

  • Standalone: Added package paths to DLL scan for depends.exe, as with wheels there now sometimes live important DLLs too.

  • Fix, the clang mode was regressed and didn't work anymore, breaking the MacOS support entirely.

  • Compatibility: For imports, we were passing a real dictionary with actual values for the locals argument. That is not what CPython does, so we stopped doing it.

  • Fix, for raised exceptions not passing the validity tests, they could be used after free, causing crashes.

  • Fix, the environment CC wasn't working unless also specifying CXX.

  • Windows: The value of __file__ in module mode was wrong, and didn't point to the compiled module.

  • Windows: Better support for --python-debug for installations that have both variants, it is now possible to switch to the right variant.
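
The generator return value cases named above (tuple and StopIteration values) are the ones that need special care, because the return value travels inside a StopIteration exception. A small plain Python check of the expected behavior, with function names chosen for this sketch:

```python
def returns_tuple():
    yield 1
    return (1, 2)


def returns_stop_iteration():
    yield 1
    # A StopIteration instance used as an ordinary return value.
    return StopIteration("payload")


def outer(factory):
    # "yield from" hands the return value of the inner generator back.
    result = yield from factory()
    yield result


tuple_case = list(outer(returns_tuple))
exception_case = list(outer(returns_stop_iteration))

print(tuple_case)  # [1, (1, 2)]
print(type(exception_case[1]).__name__, exception_case[1].args)
```

In both cases the value must arrive unchanged, i.e. the tuple must not be unpacked into exception arguments and the exception instance must not be mistaken for the end of iteration.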

New Features

  • Added parsing for shebang to Nuitka. When compiling an executable, Nuitka will now check if the #! portion indicates a different Python version and ask the user to clarify with --python-version in case of a mismatch.
  • Added support for Python flag -O, which allows to disable assertions and remove doc strings.
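
The kind of check described for the shebang line could look roughly like the following. This is a sketch under assumptions, not Nuitka's implementation: the function names and the regular expression are made up here, and the real matching rules may differ.

```python
import re
import sys


def python_version_from_shebang(first_line):
    """Return e.g. "2.7" or "3" from a "#!" line, or None if no version is given."""
    if not first_line.startswith("#!"):
        return None
    match = re.search(r"python(\d+(?:\.\d+)?)", first_line)
    return match.group(1) if match else None


def shebang_mismatch(first_line, running=sys.version_info):
    """True if the shebang names a Python version other than the running one."""
    requested = python_version_from_shebang(first_line)
    if requested is None:
        return False
    parts = tuple(int(part) for part in requested.split("."))
    # Compare only as many version components as the shebang specifies.
    return tuple(running[: len(parts)]) != parts


print(python_version_from_shebang("#!/usr/bin/env python3"))
print(shebang_mismatch("#!/usr/bin/env python3", (2, 7, 13)))
```

Comparing only the components that the shebang spells out means that a plain "python3" matches any 3.x interpreter.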

Optimization

  • Faster method calls, combining attribute lookup and method call into one, where order of evaluation with arguments doesn't matter. This gives really huge relative speedups for method calls with no arguments.
  • Faster attribute lookup in general for object descendants, which is all new style classes, and all built-in types.
  • Added dedicated xrange built-in implementation for Python2 and range for Python3. This makes those faster while also solving ordering problems when creating constants of these types.
  • Faster sum again, using quick iteration interface and specialized quick iteration code for typical standard type containers, tuple and list.
  • Compiled generators were making sure StopIteration was set after their iteration, although most users were only going to clear it. Now only the send method, which really needs that, does it. This sped up the closing of generators quite a bit.
  • Compiled generators were preparing a throw into non-started generators, to be checked for immediately after their start. This is now handled in a generic way for all generators, saving code and execution time in the normal case.
  • Compiled generators were applying checks only useful for manual send calls even during iteration, slowing them down.
  • Compiled generators could duplicate code objects due to handling a flag for closure variables differently.
  • For compiled frames, the f_trace is not writable, but was taking and releasing references to what must be None, which is not useful.
  • Not passing locals to import calls makes for less code and is faster too.
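
The idea behind the specialized quick iteration for sum can be sketched in Python, although the real gain only exists in C, where the fast path avoids the generic iterator protocol. The function quick_sum is a made up name for this illustration and shows only the dispatch on exact container type.

```python
def quick_sum(values, start=0):
    """Sum values, with a dedicated path for exact list and tuple types."""
    result = start

    if type(values) in (list, tuple):
        # Fast path: index into the container directly.
        for index in range(len(values)):
            result = result + values[index]
    else:
        # Generic path: use the iterator protocol.
        iterator = iter(values)
        while True:
            try:
                item = next(iterator)
            except StopIteration:
                break
            result = result + item

    return result


print(quick_sum([1, 2, 3]))                # 6
print(quick_sum(x for x in range(4)))      # 6
```

Subclasses of list and tuple deliberately take the generic path here, since they may override iteration.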

Organizational

  • This release also prepares Python 3.6 support, it includes full language support on the level of CPython 3.6.0 with the sole exception of the new generator coroutines.
  • The improved mode is now the default, and full compatibility is now the option, used by test suites. For syntax errors, improved mode is always used, and for test suites, now only the error message is compared, but not call stack or caret positioning anymore.
  • Removed long deprecated option "--no-optimization". Code generation too frequently depends on not seeing unoptimized code. This has been hidden and broken long enough to finally remove it.
  • Added support for Python3.5 numbers to Speedcenter. There are now also tags for speedcenter, indicating how well "develop" branch fares in comparison to master.
  • With a new tool, source code and developer manual contents can be kept in sync, so that descriptions can be quoted there. Eventually a full Sphinx documentation might become available, but for now this makes it workable.
  • Added repository for Ubuntu Yakkety (16.10) for download.
  • Added repository for Fedora 25 for download.

Cleanups

  • Moved the tools to compare CPython output, to sort import statements (isort) to autoformat the source code (Redbaron usage), and to check with PyLint to a common new nuitka.tools package, runnable with __main__ modules and dedicated runners in bin directory.
  • The tools now share code to find source files, or have it for the first time, and other things, e.g. finding needed binaries on Windows installations.
  • No longer patch traceback objects dealloc function. Should not be needed anymore, and most probably was only bug hiding.
  • Moved handling of ast nodes related to import handling to the proper reformulation module.
  • Moved statement generation code to helpers module, making it accessible without cyclic dependencies that require local imports.
  • Removed deprecated method for getting constant code objects in favor of the new way of doing it. Both methods were still used, making it harder to analyse.
  • Removed useless temporary variable initializations from complex call helper internal functions. They worked around code generation issues that have long been solved.
  • The ABI flags are no longer passed to Scons together with the version.

Tests

  • Windows: Added support to detect and to switch debug Python where available to also be able to execute reference counting tests.
  • Added the CPython 3.3 test suite, after cleaning up the worst bits of it, and added the brand new 3.6 test suite with a minimal set of changes.
  • Use the original 3.4 test suite instead of the one that comes from Debian as it has patched quite a few issues that never made it upstream, and might cause crashes.
  • More construct tests, making a difference between old style classes, which have instances and new style classes, with their objects.
  • It is now possible to run a test program with Python3 and Valgrind.

Summary

The quick iteration is a precursor to generally faster iteration over unknown object iterables. Expanding this to general code generation, and not just the sum built-in, might yield significant gains for normal code in the future, once we do code generation based on type inference.

The faster method calls complete work that was already prepared in this domain and also will be expanded to more types than compiled functions. More work will be needed to round this up.

Adding support for 3.6.0 in the early stages of its release, made sure we pretty much have support for it ready right after release. This is always a huge amount of work, and it's good to catch up.

This release is again a significant improvement in performance, and is very important to clean up open ends. The focus of coming releases will now be on both structural optimization, e.g. taking advantage of the iterator tracing, and specialized code generation, e.g. for those iterations really necessary to use quick iteration code.

Nuitka Release 0.5.24

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is again focusing on optimization, this time very heavily on the generator performance, which was found to be much slower than CPython for some cases. Also there is the usual compatibility work and improvements for Pure C support.

Bug Fixes

  • Windows: The 3.5.2 coroutine new protocol implementation was using the wrapper from CPython, but it's not part of the ABI on Windows. Have our own instead. Fixed in 0.5.23.1 already.
  • Windows: Fixed second compilation with MSVC failing. The files renamed to be C++ files already existed, crashing the compilation. Fixed in 0.5.23.1 already.
  • Mac OS: Fixed creating extension modules with .so suffix. This is now properly determined by looking at the importer details, leading to correct suffix on all platforms. Fixed in 0.5.23.1 already.
  • Debian: Don't depend on a C++ compiler primarily anymore, the C compiler from GNU or clang will do too. Fixed in 0.5.23.1 already.
  • Pure C: Adapted scons compiler detecting to properly consider C11 compilers from the environment, and more gracefully report things.

Optimization

  • Python2: Generators were saving and restoring exceptions, updating the variables sys.exc_type for every context switch, making it really slow, as these are 3 dictionary updates, normally not needed. Now it's only doing it if it means a change.
  • Sped up creating generators and coroutines by attaching the closure variable storage directly to the object, using one variable size allocation instead of two, one of which was a standard malloc. This makes creating them easier and avoids maintaining the closure pointer entirely.
  • Using dedicated compiled cell implementation similar to PyCellObject but fully under our control. This allowed for smaller code generated, while still giving a slight performance improvement.
  • Added free list implementation to cache generator, coroutine, and function objects, avoiding the need to create and delete these kinds of objects in a loop.
  • Added support for the built-in sum, making slight optimizations to be much faster when iterating over lists and tuples, as well as fast long sum for Python2, and much faster bool sums too. This is using a prototype version of a "qiter" concept.
  • Provide type shape for xrange calls that are not constant too, allowing for better optimization related to those.
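
The free list idea from the list above can be sketched in a few lines of Python. This is only an illustration of the technique, not Nuitka's C implementation; the names FreeList, factory and max_size are made up for this example.

```python
class FreeList(object):
    """Cache released objects so the next allocation can reuse them."""

    def __init__(self, factory, max_size=100):
        self.factory = factory    # creates a fresh object when the cache is empty
        self.max_size = max_size  # upper bound on the number of cached objects
        self.cache = []
        self.created = 0          # how often the factory really had to run

    def allocate(self):
        if self.cache:
            # Reuse a previously released object instead of creating one.
            return self.cache.pop()

        self.created += 1
        return self.factory()

    def release(self, obj):
        # Keep the object around for reuse, unless the cache is full.
        if len(self.cache) < self.max_size:
            self.cache.append(obj)


free_list = FreeList(factory=dict)

# Creating and releasing in a loop reaches the real allocator only once.
for _ in range(1000):
    obj = free_list.allocate()
    obj.clear()
    free_list.release(obj)
```

The bound on the cache size is a design choice of this sketch, so that a burst of releases does not keep memory allocated forever.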

Tests

  • Added workarounds for locks being held by Virus Scanners on Windows to our test runner.
  • Enhanced constructs that test generator expressions to more clearly show the actual construct cost.
  • Added construct tests for the sum built-in on various types of int containers, making sure we can do all of those really fast.
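
A workaround for files that are briefly locked by another process usually amounts to retrying. The following is a minimal sketch of that idea, assuming a simple retry loop with a delay; the function name, the number of attempts and the delay are hypothetical and not taken from the Nuitka test runner.

```python
import os
import tempfile
import time


def remove_file_with_retries(path, attempts=5, delay=0.1):
    """Delete path, retrying when another process still holds a lock on it."""
    for attempt in range(attempts):
        try:
            os.remove(path)
            return True
        except OSError:
            # A scanner may hold the file for a moment, so wait and retry,
            # but give up with the original error after the last attempt.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

    return False


handle, temp_path = tempfile.mkstemp()
os.close(handle)
removed = remove_file_with_retries(temp_path)
```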

Summary

This release improves very heavily on generators in Nuitka. The memory allocator is used more cleverly, and free lists all around save a lot of interactions with it. More work lies ahead in this field, as these are not yet as fast as they should be. However, at least Nuitka should be faster than CPython for these kinds of usages now.

Also, proper pure C in the Scons is relatively important to cover more of the rarer use cases, where the C compiler is too old.

The most important part is actually how sum optimization is staging a new kind of approach for code generation. This could become the standard code for iterators in loops eventually, making for loops even faster. This will be for future releases to expand.

Nuitka Release 0.5.23

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is focusing on optimization, the most significant part for the users being enhanced scalability due to memory usage, but also breakthrough structural improvements for static analysis of iterators and the debut of type shapes and value shapes, giving way to "shape tracing".

Bug Fixes

  • Fix, support the Python 3.5.2 coroutine changes. The checks got added for improved mode for older 3.5.x, the new protocol is only supported when run with that version or higher.

  • Fix, was falsely optimizing away unused iterations for non-iterable compile time constants.

    iter(1) # needs to raise.
    
  • Python3: Fix, eval must not attempt to strip memoryviews. This was preventing it from working with that type.

  • Fix, calling type without any arguments was crashing the compiler. Also the exception raised for anything but 1 or 3 arguments was claiming that only 3 arguments were allowed, which is not the compatible thing.

  • Python3.5: Fix, follow enhanced error checking for complex call handling of star arguments.

  • Compatibility: The from x import x, y re-formulation was doing two __import__ calls instead of re-using the module value.
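
To make the last item concrete, here is roughly what a from import of two names boils down to: a single __import__ call with a fromlist, followed by one attribute lookup per name on the resulting module value. This is a simplified sketch using the os module as an example, not the exact re-formulation Nuitka generates; a real from import also turns a missing attribute into an ImportError.

```python
# Roughly what "from os import path, sep" does: one __import__ call ...
module = __import__("os", globals(), locals(), ["path", "sep"], 0)

# ... and then one attribute lookup per imported name on that module value.
path = getattr(module, "path")
sep = getattr(module, "sep")
```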

Optimization

  • Uses only about 66% of the memory compared to last release, which is a very important step for scalability independent of re-loading. This was achieved by making sure to break loop traces and their reference cycle when they become unused.

  • Properly detect the len of multiplications at compile time from newly introduced value shapes, so that this is e.g. statically optimized.

    print(len("*" * 10000000000))
    
  • Due to newly introduced type shapes, len and iter now properly detect more often if values will raise or not, and warn about detected raises.

    iter(len(something)) # Will always raise
    
  • Due to newly introduced "iterator tracing", we can now properly detect if the length of an unpacking matches its source or not. This allows to remove the check of the generic re-formulations of unpackings at compile time.

    a, b = b, a    # Will never raise due to unpacking
    a, b = b, a, c # Will always raise, 3 items cannot unpack to 2
    
  • Added support for optimization of the xrange built-in for Python2.

  • Python2: Added support for xrange iterable constant values, pre-building those constants ahead of time.

  • Python3: Added support for range iterable constant values, pre-building those constants ahead of time. This brings optimization support for Python3 ranges to what was available for Python2 already.

  • Avoid having a special node variant for range with no arguments, but create the exception raising node directly.

  • Specialized constant value nodes are using less generic implementations to query e.g. their length or iteration capabilities, which should speed up many checks on them.

  • Added support for the format built-in.

  • Python3: Added support for the ascii built-in.
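
For the unpacking item above, it may help to see what a generic re-formulation with its length checks looks like when spelled out. The following Python sketch is an assumption about the general shape of such a re-formulation, not Nuitka's actual node tree; the function name unpack_two is made up. When iterator tracing proves the length matches, the checks in it can be dropped at compile time.

```python
def unpack_two(source):
    """Spell out "a, b = source" with explicit length checks."""
    iterator = iter(source)
    values = []

    for _ in range(2):
        try:
            values.append(next(iterator))
        except StopIteration:
            raise ValueError(
                "not enough values to unpack (expected 2, got %d)" % len(values)
            )

    # The final check, the iterator must be exhausted by now.
    try:
        next(iterator)
    except StopIteration:
        pass
    else:
        raise ValueError("too many values to unpack (expected 2)")

    return values[0], values[1]


a, b = unpack_two([1, 2])
```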

Organizational

  • The movement to pure C got the final big push. All C++-only idioms were removed, and everything works with C11 compilers. A C++03 compiler can be used as a fallback, in case of MSVC or too old gcc for instance.
  • Using pure C, MinGW64 6x is now working properly. The latest version had problems with hypot related changes in the C++ standard library. Using C11 solves that.
  • This release also prepares Python 3.6 support, it includes full language support on the level of CPython 3.6.0b1.
  • The CPython 3.6 test suite was run with Python 3.5 to ensure bug level compatibility, and had a few findings of incompatibilities.

Cleanups

  • The last holdouts of classes in Nuitka were removed, and many C++ idioms are no longer used.
  • Moved range related helper functions to a dedicated include file.
  • Using str is not bytes to detect Python3 str handling or actual bytes type existence.
  • Trace collections were using a mix-in that was merged with the base class that every user of it was having.
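
The str is not bytes check mentioned above works without looking at version numbers, because on Python2 bytes is just another name for str. A small sketch, where the variable name python3_strings is only chosen for this example:

```python
import sys

# True on Python3, where "str" is a unicode type distinct from "bytes".
# False on Python2, where "bytes" is just another name for "str".
python3_strings = str is not bytes

if python3_strings:
    # An explicit conversion is needed to get a byte string.
    encoded = "abc".encode("utf8")
else:
    # The plain string already is a byte string.
    encoded = "abc"
```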

Tests

  • Added more static optimization tests, a lot more has become feasible to decide at compile time, and is now done. These are to detect regressions in that domain.
  • The CPython 3.6 test suite is now also run with CPython 3.5 which found some incompatibilities.

Summary

This release marks a huge step forward. We are having the structure for type inference now. This will expand in coming releases to cover more cases, and there are many low hanging fruits for optimization. Specialized code for variable versions of certain known shapes seems feasible now.

Then there is also the move towards pure C. This will make the backend compilation lighter, but due to using C11, we will not suffer any loss of convenience compared to "C-ish". The plan is to continue to use C++ for compilation for compilers not capable of supporting C11.

The amount of static analysis done in Nuitka is now going to quickly expand, with more and more constructs predicted to raise errors or simplified. This will be an ongoing activity, as many types of expressions need to be enhanced, and only one missing will not let it optimize as well.

Also, it seems about time to add dedicated code for specific types to be as fast as C code. This opens up vast possibilities for acceleration and will lead us to zero overhead C bindings eventually. But initially the drive is towards enhanced import analysis, to become able to know the precise module expected to be imported, and derive type information from this.

The coming work will start to attack whole program optimization, as well as enhanced local value shape analysis and specialized type code generation, which will make Nuitka improve speed.

Nuitka Release 0.5.22

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is mostly an intermediate release on the way to the large goal of having per module compilation that is cachable and requires far less memory for large programs. This is currently in progress, but required many changes that are in this release, more will be needed.

It also contains a bunch of bug fixes and enhancements that are worth to be released, and the next changes are going to be more invasive.

Bug Fixes

  • Compatibility: Classes with decorated __new__ functions could miss out on the staticmethod decorator that is implicit. It's now applied always, unless of course it's already done manually. This corrects an issue found with Pandas. Fixed in 0.5.22.1 already.
  • Standalone: For at least Python 3.4 or higher, it could happen that the locale needed was not importable. Fixed in 0.5.22.1 already.
  • Compatibility: Do not falsely assume that not expressions cannot raise on boolean expressions, since those arguments might raise during creation. This could lead to wrong optimization. Fixed in 0.5.22.2 already.
  • Standalone: Do not include system specific C libraries in the distribution created. This would lead to problems for some configurations on Linux in case the glibc is no longer compatible with newer or older kernels. Fixed in 0.5.22.2 already.
  • The --recurse-directory option didn't check with decision mechanisms for module inclusion, making it impossible to avoid some things.
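
The first item of this list is about a detail of the language that is easy to miss: a plain function named __new__ in a class body becomes a staticmethod implicitly, and that also holds when a decorator returns a plain function. The following sketch shows the behaviour that compiled code has to match; the decorator counted and the class Point are made up for this example.

```python
import functools


def counted(func):
    """Hypothetical decorator that counts how often the function is called."""

    @functools.wraps(func)
    def wrapper(cls, *args, **kwargs):
        wrapper.calls += 1
        return func(cls, *args, **kwargs)

    wrapper.calls = 0
    return wrapper


class Point(object):
    # No explicit "staticmethod" here, it is applied implicitly.
    @counted
    def __new__(cls, x):
        self = object.__new__(cls)
        self.x = x
        return self


point = Point(3)

# The class dictionary holds a staticmethod wrapping the decorated function.
new_entry = Point.__dict__["__new__"]
```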

Optimization

  • Introduced specialized constant classes for empty dictionaries and other special constants, e.g. "True" and "False", so that they can have more hard coded properties and save memory by sharing constant values.
  • The "technical" sharing of a variable is only considered for variables that had some sharing going on in the first place, speeding things up quite a bit for that still critical check.
  • Memory savings coming from enhanced trace storage are already visible at about 1%. That is not as much as the reloading will mean, but still helpful to use less overall.

Cleanups

  • The global variable registry was removed. It was in the way of unloading and reloading modules easily. Instead variables are now attached to their owner and referenced by other users. When they are released, these variables are released.
  • Global variable traces were removed. Instead each variable has a list of the traces attached to it. For non-shared variables, this allows to sooner tell attributes of those variables, allowing for sooner optimization of them.
  • No longer trace all initial users of a variable, just merely if there were such and if it constitutes sharing syntactically too. Not only does this save memory, it avoids useless references of the variable to functions that stop using it due to optimization.
  • Create constant nodes via a factory function to avoid non-special instances where variants exist that would be faster to use.
  • Moved the C string functions to a proper nuitka.utils.CStrings package as we use it for better code names of functions and modules.
  • Made functions an explicit child node of modules, which makes their use more generic, esp. for re-loading modules.
  • Have a dedicated function for building frame nodes, making it easier to see where they are created.
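
The combination of specialized constant classes and a factory function for creating constant nodes can be pictured like this. It is a minimal sketch under the assumption of a tiny node hierarchy; all class and function names are hypothetical and do not mirror the actual Nuitka node classes.

```python
class ConstantNode(object):
    """Generic constant node, has to inspect its value for every query."""

    def __init__(self, value):
        self.value = value

    def is_truthy(self):
        return bool(self.value)


class ConstantTrueNode(ConstantNode):
    """Specialized node for "True" with hard coded properties."""

    def __init__(self):
        ConstantNode.__init__(self, True)

    def is_truthy(self):
        return True


class ConstantDictEmptyNode(ConstantNode):
    """Specialized node for the empty dictionary."""

    def __init__(self):
        ConstantNode.__init__(self, {})

    def is_truthy(self):
        return False


def make_constant_node(value):
    # The factory picks the special variant where one exists, so callers
    # cannot create a slower generic instance for these values by accident.
    if value is True:
        return ConstantTrueNode()
    if type(value) is dict and not value:
        return ConstantDictEmptyNode()

    return ConstantNode(value)
```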

Summary

This release is the result of a couple of months of work, and somewhat means that proper re-loading of cached results is coming into sight. The reloading of modules still fails for some things, and more changes will be needed, but with that out of the way, Nuitka's footprint is about to drop, making it then absolutely scalable. That is something considered very important before starting to trace more information about values.

The next big thing ought to be the one thing that structurally holds Nuitka back from generating C level performance code with, say, integer operations.

Nuitka Release 0.5.21

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release focused on scalability work. Making Nuitka more usable in the common case, and covering more standalone use cases.

Bug Fixes

  • Windows: Support for newer MinGW64 was broken by a workaround for older MinGW64 versions.
  • Compatibility: Added support for the (unofficial) C-Python API Py_GetArgcArgv that was causing prctl module to fail loading on ARM platforms.
  • Compatibility: The proper error message template for complex call arguments is now detected at compile time. There are changes coming, that are already in some pre-releases of CPython.
  • Standalone: Wasn't properly ignoring Tools and other directories in the standard library.

New Features

  • Windows: Detect the MinGW compiler arch and compare it to the Python arch. In case of a mismatch, the compiler is not used. Otherwise compilation or linking gives hard to understand errors. This also rules out MinGW32 as a compiler that can be used, as its arch doesn't match MinGW64 32 bits variant.
  • Compile modules in two passes with the option to specify which modules will be considered for a second pass at all (compiled without program optimization) or even become bytecode.
  • Installing Nuitka in develop mode with the command pip install -e nuitka_git_checkout_dir is now supported too.
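
The arch check from the first item of this list can be sketched as follows. The Python side uses the pointer size of the running interpreter. For the compiler side, this sketch assumes that the target triple as printed by gcc -dumpmachine is available as a string; how Nuitka really determines the compiler arch is not shown here, and the function names are made up.

```python
import struct


def get_python_bits():
    # Pointer size of the running interpreter in bits, 32 or 64.
    return struct.calcsize("P") * 8


def get_compiler_bits(machine):
    """Guess bits from a target triple like "x86_64-w64-mingw32"."""
    cpu = machine.split("-")[0]

    if cpu in ("x86_64", "amd64"):
        return 64
    if cpu in ("i386", "i486", "i586", "i686"):
        return 32

    # Unknown CPU name, do not guess.
    return None


def is_compiler_usable(machine, python_bits=None):
    if python_bits is None:
        python_bits = get_python_bits()

    # In case of a mismatch, the compiler should not be used at all.
    return get_compiler_bits(machine) == python_bits
```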

Optimization

  • Popular modules known to not be performance relevant are no longer C compiled, e.g. numpy.distutils and many others frequently imported (from some other module), but mostly not used and definitely not performance relevant.

Cleanups

  • The progress tracing and the memory tracing are now more clearly separated and therefore more readable.
  • Moved RPM related files to new rpm directory.
  • Moved documentation related files to doc directory.
  • Converted import sorting helper script to Python and made it run fast.

Organizational

  • The Buildbot infrastructure for Nuitka was updated to Buildbot 0.8.12 and is now maintained up to date with Ansible.
  • Upgraded the Nuitka bug tracker to Roundup 1.5.1 to which I had previously contributed security fixes already active.
  • Added SSL certificates from Let's Encrypt for the web server.

Summary

This release advances the scalability of Nuitka somewhat. The two pass approach does not yet carry all possible fruits. Caching of single pass compiled modules should follow for it to become consistently fast.

More work will be needed to achieve fast and scalable compilation, and that is going to remain the focus for some time.

Nuitka Release 0.5.20

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is mostly about catching up with issues. Most address standalone problems with special modules, but there are also some general compatibility corrections, as well as important fixes for Python3.5 and coroutines and to improve compatibility with special Python variants like AnaConda under the Windows system.

Bug Fixes

  • Standalone Python3.5: The _decimal module at least is using a __name__ that doesn't match the name at load time, causing programs that use it to crash.
  • Compatibility: For Python3.3 the __loader__ attribute is now set in all cases, and it needs to have a __module__ attribute. This makes inspection as done by e.g. flask work.
  • Standalone: Added missing hidden dependencies for Tkinter module, adding support for this to work properly.
  • Windows: Detecting the Python DLL and EXE used at compile time and preserving this information for use during backend compilation. This should make sure we use the proper ones, and avoids hacks for specific Python variants, enhancing the support for AnaConda, WinPython, and CPython installations.
  • Windows: The --python-debug flag now properly detects if the run time is supporting things and error exits if it's not available. For a CPython3.5 installation, it will switch between debug and non-debug Python binaries and DLLs.
  • Standalone: Added plug-in for the Pmw package to properly combine it into a single file, suitable for distribution.
  • Standalone: Packages from standard library, e.g. xml now have proper __path__ as a list and not as a string value, which breaks code of e.g. PyXML. Issue#183.
  • Standalone: Added missing dependency of twisted.protocols.tls. Issue#288.
  • Python3.5: When finalizing coroutines that were not finished, a corruption of its reference count could happen under some circumstances.
  • Standalone: Added missing DLL dependency of the uuid module at run time, which uses ctypes to load it.
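
Why the type of __path__ matters for the xml item above is easiest to see with code that iterates or indexes it. With a string value, indexing gives a single character and iteration goes over characters, which breaks such code. A small sketch using the standard library xml package:

```python
import os
import xml

# For a regular package, "__path__" is a list of directory names.
package_path = xml.__path__

# Code like this only works when "__path__" is a list, not a string.
first_directory = package_path[0]
candidates = [os.path.join(directory, "dom") for directory in package_path]
```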

New Features

  • Added support for AnaConda Python on Linux. Both accelerated and standalone mode work now. Issue#295.
  • Added support for standalone mode on FreeBSD. Issue#294.
  • The plug-in framework was expanded with new features to allow addressing some specific issues.

Cleanups

  • Moved memory related stuff to dedicated utils package nuitka.utils.MemoryUsage as part of an effort to have more topical modules.
  • Plug-ins now have a dedicated module through which the core accesses the API, which was partially cleaned up.
  • No more "early" and "late" import detections for standalone mode. We now scan everything at the start.

Summary

This release focused on expanding plugins. These were then used to enhance the success of standalone compatibility. Eventually this should lead to a finished and documented plug-in API, which will open up the Nuitka core to easier hacks and more user contribution for these topics.

Nuitka Release 0.5.19

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release brings optimization improvements for dictionary using code. This is now lowering subscripts to dictionary accesses where possible and adds new code generation for known dictionary values. Besides this there is the usual range of bug fixes.

Bug Fixes

  • Fix, attribute assignments or deletions where the assigned value or the attribute source was statically raising crashed the compiler.
  • Fix, the order of evaluation during optimization was considered in the wrong order for attribute assignments source and value.
  • Windows: Fix, when g++ is in the path, it was not used automatically, but now it is.
  • Windows: Detect the 32 bits variant of MinGW64 too.
  • Python3.4: The finalize of compiled generators could corrupt reference counts for shared generator objects. Fixed in 0.5.18.1 already.
  • Python3.5: The finalize of compiled coroutines could corrupt reference counts for shared generator objects.

Optimization

  • When a variable is known to have dictionary shape (assigned from a constant value, result of dict built-in, or a general dictionary creation), or the branch merge thereof, we lower subscripts from expecting mapping nodes to dictionary specific nodes. These generate more efficient code, and some are then known to not raise an exception.

    def someFunction(a,b):
        value = {a : b}
        value["c"] = 1
        return value
    

    The above function is not yet fully optimized (dictionary key/value tracing is not yet finished), however it at least knows that no exception can raise from assigning value["c"] anymore and creates more efficient code for the typical result = {} functions.

  • The use of "logical" sharing during optimization has been replaced with checks for actual sharing. So closure variables that were written to in dead code no longer inhibit optimization of the then no more shared local variable.

  • Global variable traces are now faster to decide definite writes without need to check traces for this each time.
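
The difference between "logical" and actual sharing shows in code like the following, given in Python3 syntax. The nested function only exists in dead code, so the local variable is never really shared, even though the source code suggests a closure variable. The function names are made up for this example.

```python
def some_function():
    x = 1

    if False:
        # Dead code, but by syntax alone "x" looks like a shared variable.
        def writer():
            nonlocal x
            x = 2

    # Without actual sharing, "x" can be treated as a plain local variable.
    return x + 1


result = some_function()
```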

Cleanups

  • No more using "logical sharing" allowed to remove that function entirely.
  • Using "technical sharing" less often for decisions during optimization and instead rely more often on proper variable registry.
  • Connected variables with their global variable trace statically, avoiding the need to check in the variable registry for it.
  • Removed old and mostly unused "assume unclear locals" indications, we use global variable traces for this now.

Summary

This release aimed at dictionary tracing. As a first step, the value assign is now traced to have a dictionary shape, and this is then used to lower the operations which used to be normal subscript operations to mapping, but now can be more specific.

Making use of the dictionary values knowledge, tracing keys and values is not yet inside the scope, but expected to follow. We got the first signs of type inference here, but to really take advantage, more specific shape tracing will be needed.

Nuitka Progress in 2015

For quite a bit, there have been no status posts, not for lack of news, as a lot has happened indeed. I just seem to post a lot more to the mailing list than I do here. Especially about unfinished stuff, which for a project like Nuitka is essentially everything that's going on.

Like I previously said, I am shy to make public postings about unfinished stuff and that's going to continue. But I am breaking it, to keep you up to date with where Nuitka has been going lately.

And with release focuses, I have been making some actual changes that I think are worth talking about.

SSA (Static Single Assignment Form)

The release that started using SSA was made last summer. Recent releases have lifted more and more restrictions on where and how it is applied and made sure the internal status is consistent and true. And that trend is going to continue even more.

For shared variables (closure variables and module variables), Nuitka is still too conservative to make optimization. Code does annotate value escapes, but it's not yet trusting it. The next releases will focus on lifting that kind of restriction, and for quality of result, that will mean making a huge jump ahead once that works, so module variables used locally a lot will become even faster to use then and subject to static optimization too.

Function Inlining

When doing my talk at EuroPython 2015, I was demoing that, and indeed, what a breakthrough. The circumstances under which it is done are still far too limited though. Essentially that ability is there, but will not normally be noticeable yet due to other optimization, e.g. functions are most often module variables and not local to the using function.

More code generation improvements will be needed to be able to inline functions that might raise an exception. The "cost" of inlining a function is also very much an unsolved issue. It will become the focus again, once the SSA use as indicated above expands to module variables, as then inlining other things than local functions will be possible too.

So there is a lot of things to do for this to really make a difference to your programs. But it's still great to have that part solved so far.

Scalability

Parameter Parsing

Recent releases have replaced some of the oldest code of Nuitka, the one that generated special argument parsing for each function individually. It is now replaced with generic code that, surprisingly, is often even faster, although the quick entry points were tough to beat.

That gives the C backend compiler a much easier time. Previously 3 C functions were created per Python level function, two of which could get really big with many arguments, and these are no more.
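
The idea of generic parameter parsing is that one routine, driven by a description of the function, replaces code generated per function. Here is a simplified Python sketch of that, which only handles plain parameters with defaults and leaves out star arguments and keyword-only parameters. It is not the C code that Nuitka uses, and the function name and signature are assumptions made for this example.

```python
def parse_arguments(arg_names, defaults, args, kwargs):
    """Generic parser for positional-or-keyword parameters with defaults."""
    if len(args) > len(arg_names):
        raise TypeError("too many positional arguments")

    # Positional arguments fill the parameters from the left.
    values = dict(zip(arg_names, args))

    for name, value in kwargs.items():
        if name not in arg_names:
            raise TypeError("unexpected keyword argument %r" % name)
        if name in values:
            raise TypeError("multiple values for argument %r" % name)
        values[name] = value

    # The defaults belong to the last parameters.
    first_default = len(arg_names) - len(defaults)

    for index, name in enumerate(arg_names):
        if name not in values:
            if index < first_default:
                raise TypeError("missing required argument %r" % name)
            values[name] = defaults[index - first_default]

    return [values[name] for name in arg_names]


# Description of "def f(a, b, c=3)", called as "f(1, b=2)".
parsed = parse_arguments(("a", "b", "c"), (3,), (1,), {"b": 2})
```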

Variable Error Messages

Something similar was going on with variable error messages. Each had their exception value pre-computed and created at module load time. Most of these are of course unused. This has been replaced with code that generates it on the fly, resulting in a lot less constants code.
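
In Python terms, creating the message on the fly means the exception value is only built when the error really happens, instead of one constant per variable at module load time. A minimal sketch, where the helper name is hypothetical and the message template is the one used by older CPython versions:

```python
def raise_unbound_local(variable_name):
    # The exception value is only formatted when the error really happens.
    raise UnboundLocalError(
        "local variable '%s' referenced before assignment" % variable_name
    )


try:
    raise_unbound_local("x")
except UnboundLocalError as e:
    message = str(e)
```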

Code Objects

And another thing was to look after code objects, of which there often were two for each Python level function. The one used for the frame during run time and the one used in the function object often differed, sometimes by small things like flags or local variable names.

That of course was just the result of not passing the code object along, but instead creating cached objects with hopefully the same options, which was not always true.

Resolving that, and sharing the code object used for creation and then for the frame, gives less complex C code too.
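
For comparison, in CPython the function object and the frames created from it refer to the very same code object, which is easy to check. This sketch relies on sys._getframe, which is CPython specific.

```python
import sys


def some_function():
    # The code object of the currently executing frame.
    return sys._getframe().f_code


frame_code = some_function()
function_code = some_function.__code__
```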

Optimization

The scalability of Nuitka also depends much on generated code size. With the optimization becoming more clever, less code is generated, and that trend will continue as more structural optimizations are applied.

Every time e.g. an exception is identified to not happen, this removes the corresponding error exits from the C code, which then makes it easier for the C compiler. Also more specialized code, as we now have for dictionaries, is often less complex to it.

Compatibility

Important things have happened here. Full compatibility mode is planned to not be the default anymore in upcoming releases, but that will only mean to not be stupid compatible, but to e.g. have more complete error messages than CPython, more correct line numbers, or for version differences, the best Python version behaviour.


The stable release has full support for Python 3.5, including the new async and await functions. So recent releases can pronounce it as fully supported which was quite a feat.

I am not sure, if you can fully appreciate the catch up game I need to play here. CPython clearly implements a lot of features, that I have to emulate too. That's going to repeat for every major release.

The good news is that the function type of Nuitka is now specialized to the generators and classes, and that was a massive cleanup of its core that was due anyway. The generators no longer have their own function creation stuff and that has been helpful with a lot of other stuff.

Another focus driven from Python3, is to get ahead with type shape tracing and type inference of dictionary, and value tracing. To fully support Python3 classes, we need to work on something that is dictionary-like, and that will only ever be efficient if we have that. Good news is that the next release is making progress there too.

Performance

Graphs and Benchmarks

I also presented this weak point to EuroPython 2015 and my plan on how to resolve it. Unfortunately, nothing really happened here. My plan is still to use what the PyPy people have developed as vmprof.

So that is not progressing, and I could need help with that definitely. Get in contact if you think you can.

Standalone

The standalone mode of Nuitka was pretty good, and continued to improve further, but I don't care much.

Other Stuff

EuroPython 2015

This was a blast. Meeting people who knew Nuitka but not me was a regular occurrence. And many people well appreciate my work. It felt much different than the years before.

I was able to present Nuitka's function in-lining indeed there, and this high goal that I set myself, quite impressed people.

Also I made many new contacts, largely with the scientific community. I hope to find work with data scientists in the coming years. More and more it looks like my day job should be closer to Nuitka and my expertise in Python.

Funding

Nuitka receives the occasional donation and those make me very happy. As there is no support from organizations like the PSF, I am all on my own there.

This year I want to travel to EuroPython 2016. It would be sweet if aside of my free time it wouldn't also cost me money. So please consider donating some more, as these kinds of events are really helpful to Nuitka.

Collaborators

Nuitka is making more and more break through progress. And you can be a part of it. Now.

You can join and should do so now, just follow this link or become part of the mailing list and help me there with requests I make, e.g. review posts of mine, test out things, pick up small jobs, answer questions of newcomers, you know the drill probably.

Videos

There is a Youtube channel of mine with all the videos of Nuitka so far and I have been preparing myself with proper equipment to make Videos of Nuitka, but so far nothing has come out of that.

I do however really want to change that. Let's see if it happens.

Twitter

I have started to use my Twitter account on occasions. You are welcome to follow me there. I will highlight interesting stuff there.

Future

So, there are multiple things going on:

  • Type Inference

    With SSA in place, Nuitka starts to recognize types, and treat things that work on something assigned from {} or the dict built-in with special nodes and code.

    That's going to be a lot of work. For float and list there are very important use cases, where the code can be much better. But dict is the hardest case, and to get the structure of shape tracing right, we are going there first.

  • Shape Analysis

    The plan for types, is not to use them, but the more general shapes, things that will be more prevalent than actual type information in a program. In fact the precise knowledge will be rare, but more often, we will just have a set of operations performed on a variable, and be able to guess from there.

    Shape analysis will begin though with concrete types like dict. The reason is that some re-formulations like Python3 classes should not use locals, but dictionary accesses throughout for full compatibility. Tracing that correctly to be effectively the same code quality will allow to make that change.

  • Plug-ins

    Something I wish I could have shown at EuroPython was plug-ins to Nuitka. It has become more complete, and some demo plug-ins for say Qt plugins or multiprocessing, are starting to work, but it's not progressing recently. The API will need work and of course documentation. Hope is for this to expand Nuitka's reach and appeal to get more contributors.

    It would be sweet, if there were any takers, aiming to complete these things.

  • Nested frames

    One result of in-lining will be nested frames still present for exceptions to be properly annotated, or locals giving different sets of locals and so on.

    Some cleanup of these will be needed for code generation and SSA to be able to attach variables to some sort of container, and for a function to be able to reference different sets of these.
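
To give an idea of what guessing a shape from the operations performed on a variable could mean, here is a toy sketch. It is purely illustrative and in no way the design Nuitka will use; the operation names and the table of shapes are invented for this example.

```python
# Hypothetical table of operations that each shape supports.
SHAPE_OPERATIONS = {
    "dict": {"subscript", "keys", "items", "len", "iter"},
    "list": {"subscript", "append", "len", "iter"},
    "str": {"subscript", "upper", "len", "iter"},
}


def guess_shapes(used_operations):
    """Return the shapes compatible with all operations seen on a variable."""
    return sorted(
        name
        for name, operations in SHAPE_OPERATIONS.items()
        if used_operations <= operations
    )


# Seeing "append" narrows it down, while "len" alone does not say much.
narrow = guess_shapes({"append", "len"})
wide = guess_shapes({"len"})
```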

Let me know, if you are willing to help. I really need that help to make things happen faster. Nuitka will become more and more important only. And with your help, things will be there sooner.

Release Focus

One thing I have started recently, is to make changes to Nuitka focused to just one goal, and to only deal with the rare bug in other fields, but not much else at all. So instead of across the board improvements in just about everything, I have e.g. in the last release added type inference for dictionaries and special nodes and their code generation for dictionary operations.

This progresses Nuitka in one field. And the next release then e.g. will only focus on making the performance comparison tool, and not continue much in other fields.

That way, more "flow" is possible and more visible progress too. As an example of this, these are the focuses of last releases.

  • Full Python 3.5 on a clean base with generators redone so that coroutines fit in nicely.
  • Scalability of C compilation with argument parsing redone
  • Next release soon: Shape analysis of subscript usages and optimization to exact dictionaries
  • Next release thereafter: Comparison benchmarking (vmprof, resolving C level function identifiers easier)

Other focuses will also happen, but that's too far ahead. Most likely some usability improvements will be the focus of a release some day. Focus is for things that are too complex to attack as a side project, and therefore never happen although surely possible.

Digging into Python3.5 coroutines and their semantics was hard enough, and the structural changes needed to integrate them properly with not too much special casing, but rather removing existing special cases (generator functions) was just too much work to ever happen while also doing other stuff.

Summary

So I am very excited about Nuitka. It feels like the puzzle is coming together finally, with type inference becoming a real thing. And should dictionaries be sorted out, the really important types, say float for scientific use cases, or int, list for others, will be easy to make.

With this, and then harder import association (knowing what other modules are), and module level SSA tracing that can be trusted, we can finally expect Nuitka to be generally fast and deserve to be called a compiler.

That will take a while, but it's likely to happen in 2016. Let's see if I will get the funding to go to EuroPython 2016, that would be great.