Nuitka Release 0.5.28

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release focuses on compatibility work and contains bug fixes and work to enhance the usability of Nuitka by integrating with distutils. The major improvement is that contractions no longer use pseudo functions to achieve their own local scope. Instead, there is now a dedicated structure for that, representing an in-lined function.

Bug Fixes

  • Python3.6: Fix, async for was not yet implemented for async generators.
  • Fix, functions with keyword arguments where the value was determined to be a static raise could crash the compiler.
  • Detect a 32 bits MinGW64 C compiler being used with a 64 bits Python and give a better error message.
  • Fix, when extracting side effects of a static raise, extract them more recursively to catch expressions that themselves have no code generation being used. This fixes at least static raises in keyword arguments of a function call.
  • Compatibility: Added support for proper operation of pkgutil.get_data by implementing get_data in our meta path based loader.
  • Compatibility: Added the __spec__ module attribute, which was previously missing. It is present on Python3.4 and higher.
  • Compatibility: Made __loader__ module attribute set when the module is loading already.
  • Standalone: Resolve the @rpath and @loader_path from otool on MacOS manually to actual paths, which adds support for libraries compiled with that.
  • Fix, nested functions calling super could crash the compiler.
  • Fix, could not use --recurse-directory with arguments that had a trailing slash.
  • Fix, using --recurse-directory on packages that are not in the search path crashed the compiler.
  • Compatibility: Python2 set and dict contractions were using extra frames like Python3 does, but those are not needed.
  • Standalone: Fix, the way PYTHONHOME was set on Windows had no effect, which allowed the compiled binary to access the original installation still.
  • Standalone: Added some newly discovered missing hidden dependencies of extension modules.
  • Compatibility: The name mangling of private names (e.g. __var) in classes was applied to variable names, and function declarations, but not to classes yet.
  • Python3.6: Fix, added support for list contractions with await expressions in async generators.
  • Python3.6: Fix, async for was not working in async generators yet.
  • Fix, for module tracebacks, we output the module name <module name> instead of merely <module>, but if the module was in a package, that was not indicated. Now it is <module package.name>.
  • Windows: The cache directory could be unicode which then failed to pass as an argument to scons. We now encode such names as UTF-8 and decode in Scons afterwards, solving the problem in a generic way.
  • Standalone: Need to recursively resolve shared libraries with ldd, otherwise not all could be included.
  • Standalone: Make sure sys.path has no references to CPython compile time paths, or else things may work on the compiling machine, but not on another.
  • Standalone: Added various missing dependencies.
  • Standalone: The DLLs directory was not considered for standard library extensions when freezing, which would leave these out.
  • Compatibility: For __future__ imports the __import__ function was called more than once.
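
The private name mangling item above can be illustrated with plain Python. This is a minimal sketch of the CPython behavior that compiled code has to match, not Nuitka code; the names Outer, __Inner and make are made up for illustration.

```python
class Outer:
    # A class statement inside a class body binds a private name, so the
    # binding is mangled just like a variable or a function would be.
    class __Inner:
        pass

    def make(self):
        # Inside the class body, __Inner is looked up as _Outer__Inner.
        return self.__Inner()


# The nested class is stored under the mangled name only.
mangled_present = hasattr(Outer, "_Outer__Inner")
plain_present = hasattr(Outer, "__Inner")
instance = Outer().make()
```

Here the nested class is reachable from outside only as Outer._Outer__Inner, which is the lookup a compiled module needs to reproduce.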

Optimization

  • Contractions are now all properly inlined and allow for optimization as if they were fully local. This should give better code in some cases.
  • Classes are now all building their locals dictionary inline to the using scope, allowing for more compact code.
  • The dictionary API was not used in module template code, although it helps to generate more compact code.
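
What "own local scope" means for a contraction can be shown with a short example. This is a sketch of the Python3 language behavior that inlined contractions still have to preserve; the function name squares is hypothetical.

```python
def squares(values):
    x = "outer"
    # On Python3 the loop variable of the contraction lives in its own
    # scope, so it must not overwrite the function level x.
    result = [x * x for x in values]
    return result, x


result, leftover = squares([1, 2, 3])
```

After the call, leftover is still the string assigned before the contraction, even though the contraction used the same variable name.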

New Features

  • Experimental support for building platform dependent wheel distribution.

    python setup.py --command-packages=nuitka.distutils clean -a bdist_nuitka
    

    Use with caution, this is incomplete work.

  • Experimental support for running tests against compiled installation with nose and py.test.

  • When specifying what to recurse to, now patterns can be used, e.g. like this --recurse-not-to=*.tests, which will skip all tests in submodules from compilation.

  • By setting NUITKA_PACKAGE_packagename=/some/path the __path__ of packages can be extended automatically in order to allow loading uncompiled sources from another location. This can be e.g. a tests sub-package or other plug-ins.

  • By default when creating a module, now also a module.pyi file is created that contains all imported modules. This should be deployed alongside the extension module, so that standalone mode creation can benefit from knowing the dependencies of compiled code.

  • Added option --plugin-list that was mentioned in the help output, but still missing so far.

  • The import tracing of the hints module has achieved experimental status and can be used to test compatibility with regards to import behavior.
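
The pattern form of --recurse-not-to described above can be approximated with shell-style wildcards. The following is a minimal sketch, assuming semantics similar to Python's fnmatch module; the helper name is_excluded and the module names are hypothetical, and Nuitka's actual matching may differ.

```python
from fnmatch import fnmatchcase


def is_excluded(module_name, patterns):
    """Return True if the module name matches any exclusion pattern."""
    # fnmatchcase compares case-sensitively on every platform.
    return any(fnmatchcase(module_name, pattern) for pattern in patterns)


patterns = ["*.tests"]
names = ("package.tests", "package.sub.tests", "package.core")
excluded = {name: is_excluded(name, patterns) for name in names}
```

In this sketch only the test packages themselves match the pattern. Skipping their submodules as well would follow from not recursing into an excluded package, which is an assumption about how the option is applied.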

Cleanups

  • Rename tree and codegen Helper modules to unique names, making them easier to work with.
  • Share the code that decides to not warn for standard library paths with more warnings.
  • Use the bool enum definition of Python2 which is more elegant than ours.
  • Move quality tools, autoformat, isort, etc. to the nuitka.tools.quality namespace.
  • Move output comparison tool to the nuitka.tools.testing namespace.
  • Made frame code generation capable of using nested frames, allowing the real inline of classes and contraction bodies, instead of "direct" calls to pseudo functions being used.
  • Proper base classes for functions that are entry points, and functions that are merely a local expression using return statements.

Tests

  • The search mode with pattern was not working anymore.
  • Resume hash values now consider the Python version too.
  • Added test that covers using test runners like nose and py.test with Nuitka compiled extension modules.

Organizational

  • Added support for Scons 3.0 and running Scons with Python3.5 or higher. The option to specify the Python to use for scons has been renamed to reflect that it may also be a Python3 now. Only for Python3.2 to Python3.4 we now need another Python installation.
  • Made recursion the default for --recurse-directory with packages. Before you also had to tell it to recurse into that package or else it would only include the top level package, but nothing below.
  • Updated the man pages, corrected mentions of C++ to C, and stopped using now deprecated options.
  • Updated the help output which still said that standalone mode implies recursion into standard library, which is no longer true and even not recommended.
  • Added option to disable the output of .pyi file when creating an extension module.
  • Removed Ubuntu Wily package download, no longer supported by Ubuntu.

Summary

This release was done to get the fixes and new features out for testing. There is work started that should make generators use an explicit extra stack via pointer, and restore instruction state via goto dispatchers at function entry, but that is not complete.

This feature, dubbed "goto generators" will remove the need for fibers (which is itself a lot of code), reduce the memory footprint at run time for anything that uses a lot of generators, or coroutines.
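
The idea behind "goto generators" can be sketched in Python, although the real work targets generated C code. The example below is only an illustration under that assumption: local state lives in a heap object, and a dispatcher at function entry jumps to the resume point. The names countdown, CountdownState and resume_at are hypothetical.

```python
def countdown(n):
    # Reference generator, written the usual way.
    while n > 0:
        yield n
        n -= 1


class CountdownState:
    """Hand-written equivalent of countdown without a separate stack."""

    def __init__(self, n):
        # Local variables are kept in the object instead of a fiber stack.
        self.n = n
        # Resume point: 0 = not started, 1 = after the yield, -1 = finished.
        self.resume_at = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Dispatcher at function entry, standing in for a C goto.
        if self.resume_at == -1:
            raise StopIteration
        if self.resume_at == 1:
            # Continue with the statement that follows the yield.
            self.n -= 1
        self.resume_at = 1
        if self.n <= 0:
            self.resume_at = -1
            raise StopIteration
        return self.n


from_generator = list(countdown(3))
from_state_machine = list(CountdownState(3))
```

Both forms produce the same values, but the second one needs no second stack to suspend and resume.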

Integrating with distutils is also a new thing, and once completed will make use of Nuitka for existing projects automatic and trivial to do. There is a lot missing for that goal, but we will get there.

Also, documenting how to run tests against compiled code, if that test code lives inside of that package, will make a huge difference, as that will make it easier for people to torture Nuitka with their own test cases.

And then of course, nested frames now mean that every function could be inlined, which was previously not possible due to collisions of frames. This will pave the route for better optimization in those cases in future releases.

The experimental features will require more work, but should make it easier to use Nuitka for existing projects. Future releases will make integrating Nuitka dead simple, or that is the hope.

And last but not least, now that Scons works with Python3, chances are that Nuitka will more often work out of the box. The older Python3 versions that still retain the issue are not very widespread.

Nuitka Release 0.5.27

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release comes with a lot of bug fixes and improvements.

Bug Fixes

  • Fix, need to add recursed modules immediately to the working set, or else they might first be processed in second pass, where global names that are locally assigned, are optimized to the built-in names although that should not happen. Fixed in 0.5.26.1 already.
  • Fix, the accelerated call of methods could crash for some special types. This had been a regression of 0.5.25, but only happens with custom extension types. Fixed in 0.5.26.1 already.
  • Python3.5: For async def functions parameter variables could fail to properly work with in-place assignments to them. Fixed in 0.5.26.4 already.
  • Compatibility: Decorators that overload type checks didn't pass the checks for compiled types. Now isinstance and, as a result, the inspect module work fine for them.
  • Compatibility: Fix, imports from __init__ were crashing the compiler. You are not supposed to do them, because they duplicate the package code, but they work.
  • Compatibility: Fix, the super built-in on module level was crashing the compiler.
  • Standalone: For Linux, BSD and MacOS extension modules and shared libraries using their own $ORIGIN to find loaded DLLs resulted in those not being included in the distribution.
  • Standalone: Added more missing implicit dependencies.
  • Standalone: Fix, implicit imports now also can be optional, as e.g. _tkinter if not installed. Only include those if available.
  • The --recompile-c-only was only working with C compiler as a backend, but not in the C++ compatibility fallback, where files get renamed. This prevented that edit and test debug approach with at least MSVC.
  • Plugins: The PyLint plug-in didn't consider the symbolic name import-error but only the code F0401.
  • Implicit exception raises in conditional expressions would crash the compiler.

New Features

  • Added support for Visual Studio 2017. Issue#368.
  • Added option --python2-for-scons to specify the Python2 executable to use for calling Scons. This should allow using Anaconda Python for that task.

Optimization

  • References to known unassigned variables are now statically optimized to exception raises and warned about if the according option is enabled.
  • Unhashable keys in dictionaries are now statically optimized to exception raises and warned about if the according option is enabled.
  • Enable forward propagation for classes too, resulting in some classes creating only static dictionaries. Currently this never happens for Python3, but it will, once we can statically optimize __prepare__ too.
  • Enable inlining of class dictionary creations if they are mere return statements of the created dictionary. Currently this never happens for Python3, see above for why.
  • Python2: Selecting the metaclass is now visible in the tree and can be statically optimized.
  • For executables, we now also use a freelist for traceback objects, which also makes exception cases slightly faster.
  • Generator expressions no longer require the use of a function call with a .0 argument value to carry the iterator value, instead their creation is directly inlined.
  • Remove "pass through" frames for Python2 list contractions, they are no longer needed. Minimal gain for generated code, but more lightweight at compile time.
  • When compiling Windows x64 with MinGW64 a link library needs to be created for linking against the Python DLL. This one is now cached and re-used if already done.
  • Use common code for NameError and UnboundLocalError exception code raises. In some cases it was creating the full string at compile time, in others at run time. Since the latter is more efficient in terms of code size, we now use that everywhere, saving a bit of binary size.
  • Make sure to release unused functions from a module. This saves memory and can be decided after a full pass.
  • Avoid using OrderedDict in a couple of places, where they are not needed, but can be replaced with a later sorting, e.g. temporary variables by name, to achieve deterministic output. This saves memory at compile time.
  • Add specialized return nodes for the most frequent constant values, which are None, True, and False. Also a general one, for constant value return, which avoids the constant references. This saves quite a bit of memory and makes traversal of the tree a lot faster, due to not having any child nodes for the new forms of return statements.
  • Previously the empty dictionary constant reference was specialized to save memory. Now we also specialize empty set, list, and tuple constants to the same end. Also, the hack that keeps the is comparison from claiming that {} is {} was made more general; mutable constant references are now known to never alias.
  • The source references can be marked internal, which means that they should never be visible to the user, but that was tracked as a flag to each of the many source references attached to each node in the tree. Making a special class for internal references avoids storing this in the object, but instead it's now a class property.
  • The nodes for named variable reference, assignment, and deletion got split into separate nodes, one to be used before the actual variable can be determined during tree building, and one for use later on. This makes their API clearer and saves a tiny bit of memory at compile time.
  • Also eliminated target variable references, which were pseudo children of assignments and deletion nodes for variable names, that didn't really do much, but consume processing time and memory.
  • Added optimization for calls to staticmethod and classmethod built-in methods along with type shapes.
  • Added optimization for open built-in on Python3, also adding the type shape file for the result.
  • Added optimization for bytearray built-in and constant values. These mutable constants can now be compile time computed as well.
  • Added optimization for frozenset built-in and constant values. These constants can now be compile time computed as well.
  • Added optimization for divmod built-in.
  • Treat all built-in constant types, e.g. type itself as a constant. So far we did this only for constant values types, but of course this applies to all types, giving slightly more compact code for their uses.
  • Detect static raises if iterating over non-iterables and warn about them if the option is enabled.
  • Split the locals node into different types, one which needs the updated value, and one which just makes a copy. Properly track if a function needs an updated locals dict, and if it doesn't, don't use that. This gives more efficient code for Python2 classes, and exec using functions in Python2.
  • Build all constant values without use of the pickle module, which has a lot more overhead than marshal. Instead use marshal for too large long values, non-UTF8 unicode values, nan float, etc.
  • Detect the linker arch for all Linux platforms using objdump instead of only a handful of hard coded ones.
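
Several items above turn code that can only fail into a static raise. The functions below are a small sketch of such cases and of the run time exceptions that the optimized code still has to produce; all names are hypothetical and this is plain Python behavior, not Nuitka internals.

```python
def unhashable_key():
    # A list is unhashable, so building this dictionary always raises.
    return {[1, 2]: "value"}


def iterate_non_iterable():
    # An integer is not iterable, so this loop always raises.
    for item in 42:
        return item


def unassigned_local():
    value = 1
    del value
    # The variable is known to be unassigned at this point.
    return value


def error_type(function):
    """Return the type of the exception raised by the call, or None."""
    try:
        function()
    except Exception as exc:
        return type(exc)
    return None


errors = [
    error_type(unhashable_key),
    error_type(iterate_non_iterable),
    error_type(unassigned_local),
]
```

In each case the outcome is known without running the code, which is what makes a compile time warning possible when the corresponding option is enabled.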

Cleanups

  • The use of INCREASE_REFCOUNT got fully eliminated.
  • Use functions not vulnerable to buffer overflows. This is generally good and avoids warnings given on OpenBSD during linking.
  • Variable closure for classes is different from all functions, don't handle the difference in the base class, but for class nodes only.
  • Make sure mayBeNone doesn't return None which means normally "unclear", but False instead, since it's always clear for those cases.
  • Comparison nodes were using the general comparison node as a base class, but now a proper base class was added instead, allowing for cleaner code.
  • Valgrind test runners got changed to using proper tool namespace for their code and share it.
  • Made construct case generation code common testing code for re-use in the speedcenter web site. The code also has minor beauty bugs which will then become fixable.
  • Use appdirs package to determine place to store the downloaded copy of depends.exe.
  • The code still mentioned C++ in a lot of places, in comments or identifiers, which might be confusing readers of the code.
  • Code objects now carry all information necessary for their creation, and no longer need to access their parent to determine flag values. That parent is subject to change in the future.
  • Our import sorting wrapper automatically detects imports that could be local and makes them so, removing a few existing ones and preventing further ones in the future.
  • Cleanups and annotations to become Python3 PyLint clean as well. This found e.g. that source code references only had __cmp__ and need rich comparison to be fully portable.
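
The last item refers to Python3 no longer using __cmp__ for ordering. A minimal sketch of a portable replacement, assuming a simplified stand-in class named SourceRef that is not Nuitka's actual implementation, could look like this:

```python
from functools import total_ordering


@total_ordering
class SourceRef:
    """Hypothetical stand-in for a source code reference."""

    def __init__(self, filename, line):
        self.filename = filename
        self.line = line

    def _key(self):
        return (self.filename, self.line)

    # Rich comparisons work on both Python2 and Python3, and
    # total_ordering derives the remaining operators from these two.
    def __eq__(self, other):
        if not isinstance(other, SourceRef):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, SourceRef):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Defining __eq__ disables the default hash on Python3.
        return hash(self._key())


refs = sorted([SourceRef("b.py", 1), SourceRef("a.py", 7), SourceRef("a.py", 2)])
ordered = [(ref.filename, ref.line) for ref in refs]
```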

Tests

  • The test runner for construct tests got cleaned up and the constructs now avoid using xrange so as to not need conversion for Python3 execution as much.
  • The main test runner got cleaned up and uses common code making it more versatile and robust.
  • Do not run test in debugger if CPython also segfaulted executing the test, then it's not a Nuitka issue, so we can ignore that.
  • Improve the way the Python to test with is found in the main test runner, prefer the running interpreter, then PATH and registry on Windows, this will find the interesting version more often.
  • Added support for "Landscape.io" to ignore the inline copies of code, they are not under our control.
  • The test runner for Valgrind got merged with the usage for constructs and uses common code now.
  • Construct generation is now common code, intended for sharing it with the Speedcenter web site generation.
  • Rebased Python 3.6 test suite to 3.6.1 as that is the Python generally used now.

Organizational

  • Added inline copy of appdirs package from PyPI.
  • Added credits for RedBaron and isort.
  • The --experimental flag is now creating a list of indications and more than one can be used that way.
  • The PyLint runner can also work with Python3 pylint.
  • The Nuitka Speedcenter got more fine tuning and produces more tags to more easily identify trends in results. This needs to become more visible though.
  • The MSI files are also built on AppVeyor, where their building will not depend on me booting Windows. Getting these artifacts as downloads will be the next step.

Summary

This release improves many areas. The variable closure taking is now fully transparent due to different node types, the memory usage dropped again, a few obvious missing static optimizations were added, and many built-ins were completed.

This release again improves the scalability of Nuitka, which again uses less memory than before, although not as big a jump as before.

This does not extend or use special C code generation for bool or any type yet, which still needs design decisions to proceed and will come in a later release.

Nuitka Release 0.5.26

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release comes after a long time and contains large amounts of changes in all areas. The driving goal was to prepare generating C specific code, which is still not the case, but this is very likely going to change soon. However this release improves all aspects.

Bug Fixes

  • Compatibility: Fix, star imports didn't check whether the values from the __all__ iterable were string values, which could cause problems at run time.

    # Module level
    __all__ = (1,)
    
    # ...
    # other module:
    from module import *
    
  • Fix, star imports also didn't check whether the values from __all__ actually exist in the original module.

  • Corner cases of imports should work a lot more precisely, as the level of compatibility for calls to __import__ went from absurd to insane.

  • Windows: Fixed detection of uninstalled Python versions (not for all users and DLL is not in system directory). This of course only affected the accelerated mode, not standalone mode.

  • Windows: Scan directories for .pyd files for used DLLs as well. This should make the PyQt5 wheel work.

  • Python3.5: Fix, coroutines could have different code objects for the object and the frame used by it.

  • Fix, slices with built-in names crashed the compiler.

    something[id:len:range]
    
  • Fix, the C11 via C++ compatibility uses symlinks to C++ filenames where possible instead of making a copy from the C source. However, even on Linux that may not be allowed, e.g. on a DOS file system. Added fallback to using full copy in that case. Issue#353.

  • Python3.5: Fix coroutines to close the "yield from" where an exception is thrown into them.

  • Python3: Fix, list contractions should have their own frame too.

  • Linux: Copy the "rpath" of compiling Python binary to the created binary. This will make compiled binaries using uninstalled Python versions transparently find the Python shared library.

  • Standalone: Add the "rpath" of the compiling Python binary to the search path when checking for DLL dependencies on Linux. This fixes standalone support for Travis and Anaconda on Linux.

  • Scons: When calling scons, also try to locate a Python2 binary to overcome a potential Python3 virtualenv in which Nuitka is running.

  • Standalone: Ignore more Windows only encodings on non-Windows.
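
The two star import checks at the top of this list can be observed with CPython directly. The following sketch builds a throwaway module at run time to show which exceptions are expected; the module name star_demo and the helper star_import_error are hypothetical.

```python
import sys
import types


def star_import_error(all_value):
    """Return the exception type raised by a star import, or None."""
    module = types.ModuleType("star_demo")
    module.present = 1
    module.__all__ = all_value
    sys.modules["star_demo"] = module
    try:
        # Star imports are only allowed at module level, so use exec.
        exec("from star_demo import *", {})
    except Exception as exc:
        return type(exc)
    finally:
        del sys.modules["star_demo"]
    return None


non_string = star_import_error((1,))
missing_name = star_import_error(("missing",))
valid = star_import_error(("present",))
```

A non-string entry in __all__ gives a TypeError and a name that does not exist gives an AttributeError, so compiled code should raise the same exceptions.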

New Features

  • Support for Python 3.6 with only few corner cases not supported yet.
  • Added option --python-arch to pick 32 or 64 bits Python target of the --python-version argument.
  • Added support for more kinds of virtualenv configurations.
  • Uninstalled Python versions such as Anaconda will work fine in accelerated mode, except on Windows.

Optimization

  • The node tree children are no longer stored in a separate dictionary, but in the instance dictionary as attributes, making the tree more lightweight and in principle faster to access. This also saved about 6% of the memory usage.

  • The memory usage of Nuitka for the Python part has fallen by roughly 40% due to the use of new style classes, and slots where that is possible (some classes use multiple inheritance, where they don't work), and generally by reducing useless members e.g. in source code references. This of course also will make things compiled faster (the C compilation of course is not affected by this.)

  • The code generation for frames was creating the dictionary for the raised exception by making a dictionary and then adding all variables, each tested to be set. This was a lot of code for each frame specific, and has been replaced by a generic "attach" mechanism which merely stores the values, and only takes a reference. When asked for frame locals, it only then builds the dictionary. So this is now only done, when that is absolutely necessary, which it normally never is. This of course makes the C code much less verbose, and actual handling of exceptions much more efficient.

  • For imports, we now detect for built-in modules, that their import cannot fail, and if name lookups can fail. This leads to less code generated for error handling of these. The following code now e.g. fully detects that no ImportError or AttributeError will occur.

    try:
        from __builtin__ import len
    except ImportError:
        from builtins import len
    
  • Added more type shapes for built-in type calls. These will improve type tracing.

  • Compiled frames now have a free list mechanism that should speed up frames that recurse and frames that exit with exceptions. In case of an exception, the frame ownership is immediately transferred to the exception making it easier to deal with.

  • The free list implementations have been merged into a new common one that can be used via macro expansion. It is now type agnostic and should be slightly more efficient too.

  • Also optimize "true" division and "floor division", not only the default division of Python2.

  • Removed the need for statement context during code generation making it less memory intensive and faster.
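
The memory item above mentions slots and their limitation with multiple inheritance. This is a small sketch of both effects in plain Python; the class names are hypothetical and are not Nuitka's node classes.

```python
class PlainNode(object):
    def __init__(self, source_ref):
        self.source_ref = source_ref


class SlottedNode(object):
    # No per-instance dictionary is allocated for slotted classes.
    __slots__ = ("source_ref",)

    def __init__(self, source_ref):
        self.source_ref = source_ref


class OtherSlotted(object):
    __slots__ = ("value",)


plain_has_dict = hasattr(PlainNode("ref"), "__dict__")
slotted_has_dict = hasattr(SlottedNode("ref"), "__dict__")

try:
    # Two bases with non-empty slots cannot be combined in CPython.
    class Combined(SlottedNode, OtherSlotted):
        __slots__ = ()

    layout_conflict = False
except TypeError:
    layout_conflict = True
```

The missing instance dictionary is where the savings come from, and the layout conflict is why classes using multiple inheritance cannot simply adopt slots.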

Cleanups

  • Now always uses the __import__ built-in node for all kinds of imports and directly optimizes and recurses into other modules based on that kind of node, instead of a static variant. This removes duplication and some incompatibility regarding defaults usage when doing the actual imports at run time.
  • Split the expression node bases and mixin classes to a dedicated module, moving methods that only belong to expressions outside of the node base, making for a cleaner class hierarchy.
  • Cleaned up the class structure of nodes, added base classes for typical compositions, e.g. expression with and without children, computation based on built-in, etc. while also checking proper ordering of base classes in the metaclass.
  • Moved directory and file operations to dedicated module, making also sure it is more generally used. This makes it easier to make more error resilient deletions of directories on e.g. Windows, where locks tend to live for short times beyond program ends, requiring second attempts.
  • Code generation for existing supported types, PyObject *, PyObject **, and struct Nuitka_CellObject * is now done via a C type class hierarchy instead of elif sequences.
  • Closure taking is now always done immediately correctly and references are taken for closure variables still needed, making sure the tree is correct and needs no finalization.
  • When doing variable traces, initialize more traces immediately so it can be more reliable.
  • Code to setup a function for local variables and clean it up has been made common code instead of many similar copies.
  • The code was treating the f_executing frame member as if it were a counter with increases and decreases. Turn it into a mere boolean value and hide its usage behind helper functions.
  • The "maybe local variables" are no more. They were replaced by a new locals dict access node with a fallback to a module or closure variable should the dictionary not contain the name. This avoids many ugly checks to not do certain things for that kind of variable.
  • We now detect "exec" and "unqualified exec" as well as "star import" ahead of time as flags of the function to be created. We no longer need to mark functions as we go.
  • Handle "true", "floor" and normal division properly by applying future flags to decide which one to use.
  • We now use symbolic identifiers in all PyLint annotations.
  • The release scripts started to move into nuitka.tools.release so they get PyLint checks, autoformat and proper code re-use.
  • The use of INCREASE_REFCOUNT_X was removed, it got replaced with proper Py_XINCREF usages.
  • The use of INCREASE_REFCOUNT got reduced further, e.g. no generated code uses it anymore, and only a few compiled types do. The function was once required before "C-ish" lifted the need to do everything in one single function call.
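
Replacing elif sequences with a C type class hierarchy can be sketched as follows. The class names, method names and the emitted C snippets are assumptions made up for illustration and are not taken from the Nuitka code base.

```python
class CTypeBase:
    """Base class: one subclass per C type used for Python values."""

    c_type = None

    @classmethod
    def get_declaration(cls, name):
        return "%s%s = NULL;" % (cls.c_type, name)


class CTypePyObjectPtr(CTypeBase):
    c_type = "PyObject *"

    @classmethod
    def get_unset_check(cls, name):
        return "%s == NULL" % name


class CTypePyObjectPtrPtr(CTypeBase):
    c_type = "PyObject **"

    @classmethod
    def get_unset_check(cls, name):
        return "*%s == NULL" % name


class CTypeCellObject(CTypeBase):
    c_type = "struct Nuitka_CellObject *"

    @classmethod
    def get_unset_check(cls, name):
        # The field name ob_ref is an assumption for this sketch.
        return "%s->ob_ref == NULL" % name


# The caller asks the type object instead of testing the type with elif.
lines = [
    c_type.get_declaration("tmp") + " /* unset if " + c_type.get_unset_check("tmp") + " */"
    for c_type in (CTypePyObjectPtr, CTypePyObjectPtrPtr, CTypeCellObject)
]
```

With this layout, adding another C type such as one for bool means adding a class rather than extending every elif chain.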

Tests

  • More robust deletion of directories, temporary stages used by CPython test suites, and standalone directories during test execution.
  • Moved tests common code into nuitka.tools.testing namespace and use it from there. The code now is allowed to use nuitka.utils and therefore often better implementations.
  • Made standalone binaries robust against GTK theme access, checking the Python binary (some site.py files do that).

Organizational

  • Added repository for Ubuntu Zesty (17.04) for download.
  • Added support for testing with Travis to complement the internal Buildbot based infrastructure and have pull requests on Github automatically tested before merge.
  • The factory branch is now also on Github.
  • Removed MSI for Python3.4 32 bits. It seems impossible to co-install this one with the 64 bits variant. All other versions are provided for both bit sizes still.

Summary

This release marks huge progress. The node tree is now absolutely clean, the variable closure taking is fully represented, and code generation is prepared to add another type, e.g. for bool for which work has already started.

On a practical level, the scalability of the release will have increased very much, as this uses so much less memory, generates simpler C code, while at the same time getting faster for the exception cases.

Coming releases will expand on the work of this release.

Frame objects should be allowed to be nested inside a function for better re-formulations of classes and contractions of all kinds, as well as real inline of functions, even if they could raise.

The memory savings could be even larger, if we stopped doing multiple inheritance for more node types. The __slots__ usage and the child API change could potentially make things not only more compact, but faster to use too.

And also once special C code generation for bool is done, it will set the stage for more types to follow (int, float, etc). Only this will finally start to give the C type speed we are looking for.

Until then, this release marks a huge cleanup and progress to what we already had, as well as preparing the big jump in speed.

Nuitka Release 0.5.25

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains a huge amount of bug fixes, lots of optimization gains, and many new features. It also presents many organizational improvements, and many cleanups.

Bug Fixes

  • Python3.5: Coroutine methods using super were crashing the compiler. Issue#340. Fixed in 0.5.24.2 already.

  • Python3.3: Generator return values were not properly transmitted in case of tuple or StopIteration values.

  • Python3.5: Better interoperability between compiled coroutines and uncompiled generator coroutines.

  • Python3.5: Added support to compile in Python debug mode under Windows too.

  • Generators with arguments were using two code objects, one with, and one without the CO_NOFREE flag, one for the generator object creating function, and one for the generator object.

  • Python3.5: The duplicate code objects for generators with arguments led to interoperability issues between such compiled generator coroutines and compiled coroutines. Issue#341. Fixed in 0.5.24.2 already.

  • Standalone: On some Linux variants, e.g. Debian Stretch and Gentoo, the linker needs more flags to really compile to a binary with RPATH.

  • Compatibility: For set literal values, insertion order is wrong on some versions of Python. We now detect the bug and emulate it if necessary. Previous Nuitka was always correct, but incompatible.

    {1, 1.0}.pop() # the only element of the set should be 1
    
  • Windows: Make the batch files detect where they live at run time, instead of during setup.py, making it possible to use them for all cases.

  • Standalone: Added package paths to DLL scan for depends.exe, as with wheels there now sometimes live important DLLs too.

  • Fix, the clang mode was regressed and didn't work anymore, breaking the MacOS support entirely.

  • Compatibility: For imports, we were passing for locals argument a real dictionary with actual values. That is not what CPython does, so stopped doing it.

  • Fix, for raised exceptions not passing the validity tests, they could be used after free, causing crashes.

  • Fix, the environment CC wasn't working unless also specifying CXX.

  • Windows: The value of __file__ in module mode was wrong, and didn't point to the compiled module.

  • Windows: Better support for --python-debug for installations that have both variants, it is now possible to switch to the right variant.
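
The generator return value fix listed above concerns values that are awkward to carry inside a StopIteration exception. As a minimal sketch (the helper names returning and drive are made up for illustration and are not part of Nuitka), this is the behaviour that compiled code has to reproduce on Python 3.3 and higher:

```python
def returning(value):
    """Generator that yields once and then returns the given value."""
    yield 1
    return value


def drive(generator):
    """Exhaust a generator and hand back its return value."""
    try:
        while True:
            next(generator)
    except StopIteration as stop:
        # The return value travels in the "value" attribute.
        return stop.value


# A tuple must arrive as one value, not be unpacked into exception arguments.
tuple_result = drive(returning((1, 2)))

# A StopIteration instance must arrive as a plain value as well.
nested_result = drive(returning(StopIteration(5)))
```

Both cases are easy to get wrong, because a tuple or an exception instance can be mistaken for the arguments or the exception itself when the StopIteration is created.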

New Features

  • Added parsing for shebang to Nuitka. When compiling an executable, Nuitka will now check if the #! portion indicates a different Python version and ask the user to clarify with --python-version in case of a mismatch.
  • Added support for Python flag -O, which allows disabling assertions and removing doc strings.
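
The shebang check can be pictured roughly as follows. This is only a sketch under assumptions: the function names and the regular expression are hypothetical and not taken from Nuitka's source, and the real check may treat more shebang forms.

```python
import re


def shebang_python_version(first_line):
    """Return the Python version named in a "#!" line, or None.

    Hypothetical helper for illustration, not Nuitka's actual code.
    """
    if not first_line.startswith("#!"):
        return None

    match = re.search(r"python(\d+(?:\.\d+)?)", first_line)
    if match is None:
        return None

    return match.group(1)


def needs_clarification(first_line, running_version):
    """True if the shebang names a version other than the running one."""
    wanted = shebang_python_version(first_line)
    if wanted is None:
        return False

    # Compare only as many components as the shebang specifies, so that
    # "python3" is considered compatible with a running "3.5".
    wanted_parts = wanted.split(".")
    return running_version.split(".")[:len(wanted_parts)] != wanted_parts
```

In such a scheme, a script starting with #!/usr/bin/python2.7 compiled under Python 3.5 would be the case where the user is asked to pass --python-version.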

Optimization

  • Faster method calls, combining attribute lookup and method call into one, where order of evaluation with arguments doesn't matter. This gives really huge relative speedups for method calls with no arguments.
  • Faster attribute lookup in general for object descendants, which is all new style classes, and all built-in types.
  • Added dedicated xrange built-in implementation for Python2 and range for Python3. This makes those faster while also solving ordering problems when creating constants of these types.
  • Faster sum again, using quick iteration interface and specialized quick iteration code for typical standard type containers, tuple and list.
  • Compiled generators were making sure StopIteration was set after their iteration, although most users were only going to clear it. Now only the send method, which really needs that, does it. This speeds up the closing of generators quite a bit.
  • Compiled generators were preparing a throw into non-started generators, to be checked for immediately after their start. This is now handled in a generic way for all generators, saving code and execution time in the normal case.
  • Compiled generators were applying checks only useful for manual send calls even during iteration, slowing them down.
  • Compiled generators could duplicate code objects due to handling a flag for closure variables differently.
  • For compiled frames, the f_trace is not writable, but was taking and releasing references to what must be None, which is not useful.
  • Not passing locals to import calls makes for less code and is faster too.
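
The restriction on faster method calls, that order of evaluation with arguments must not matter, follows from how Python evaluates a call: the attribute lookup happens before the arguments are evaluated. A small sketch with made-up names (Tracer, argument) makes that order visible:

```python
events = []


class Tracer(object):
    def __getattr__(self, name):
        # Only called for attributes not found normally, here "method".
        events.append("lookup " + name)

        def method(*args):
            events.append("call " + name)

        return method


def argument():
    events.append("argument")
    return 1


# Lookup first, then the argument, then the call itself.
Tracer().method(argument())
```

Merging lookup and call into one step moves the lookup behind the argument evaluation, which is only unobservable when neither can affect the other, most obviously when there are no arguments at all.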

Organizational

  • This release also prepares Python 3.6 support, it includes full language support on the level of CPython 3.6.0 with the sole exception of the new generator coroutines.
  • The improved mode is now the default, and full compatibility is now the option, used by test suites. For syntax errors, improved mode is always used, and for test suites, now only the error message is compared, but not call stack or caret positioning anymore.
  • Removed long deprecated option "--no-optimization". Code generation too frequently depends on not seeing unoptimized code. This has been hidden and broken long enough to finally remove it.
  • Added support for Python3.5 numbers to Speedcenter. There are now also tags for speedcenter, indicating how well "develop" branch fares in comparison to master.
  • With a new tool, source code and developer manual contents can be kept in sync, so that descriptions can be quoted there. Eventually a full Sphinx documentation might become available, but for now this makes it workable.
  • Added repository for Ubuntu Yakkety (16.10) for download.
  • Added repository for Fedora 25 for download.

Cleanups

  • Moved the tools to compare CPython output, to sort import statements (isort), to autoformat the source code (Redbaron usage), and to check with PyLint to a common new nuitka.tools package, runnable with __main__ modules and dedicated runners in the bin directory.
  • The tools now share code to find source files, or have it for the first time, and other things, e.g. finding needed binaries on Windows installations.
  • No longer patch traceback objects dealloc function. Should not be needed anymore, and most probably was only bug hiding.
  • Moved handling of ast nodes related to import handling to the proper reformulation module.
  • Moved statement generation code to helpers module, making it accessible without cyclic dependencies that require local imports.
  • Removed deprecated method for getting constant code objects in favor of the new way of doing it. Both methods were still used, making it harder to analyse.
  • Removed useless temporary variable initializations from complex call helper internal functions. They worked around code generation issues that have long been solved.
  • The ABI flags are no longer passed to Scons together with the version.

Tests

  • Windows: Added support to detect and to switch debug Python where available to also be able to execute reference counting tests.
  • Added the CPython 3.3 test suite, after cleaning up the worst bits of it, and added the brand new 3.6 test suite with a minimal set of changes.
  • Use the original 3.4 test suite instead of the one that comes from Debian as it has patched quite a few issues that never made it upstream, and might cause crashes.
  • More construct tests, making a difference between old style classes, which have instances and new style classes, with their objects.
  • It is now possible to run a test program with Python3 and Valgrind.

Summary

The quick iteration is a precursor to generally faster iteration over unknown object iterables. Expanding this to general code generation, and not just the sum built-in, might yield significant gains for normal code in the future, once we do code generation based on type inference.

The faster method calls complete work that was already prepared in this domain and also will be expanded to more types than compiled functions. More work will be needed to round this up.

Adding support for 3.6.0 in the early stages of its release made sure we pretty much have support for it ready right after release. This is always a huge amount of work, and it's good to catch up.

This release is again a significant improvement in performance, and is very important to clean up open ends. The focus of coming releases will now be on both structural optimization, e.g. taking advantage of the iterator tracing, and specialized code generation, e.g. for those iterations really necessary to use quick iteration code.

Nuitka Release 0.5.24

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is again focusing on optimization, this time very heavily on the generator performance, which was found to be much slower than CPython for some cases. Also there is the usual compatibility work and improvements for Pure C support.

Bug Fixes

  • Windows: The 3.5.2 coroutine new protocol implementation was using the wrapper from CPython, but it's not part of the ABI on Windows. Have our own instead. Fixed in 0.5.23.1 already.
  • Windows: Fixed second compilation with MSVC failing. The files renamed to be C++ files already existed, crashing the compilation. Fixed in 0.5.23.1 already.
  • Mac OS: Fixed creating extension modules with .so suffix. This is now properly determined by looking at the importer details, leading to correct suffix on all platforms. Fixed in 0.5.23.1 already.
  • Debian: Don't depend on a C++ compiler primarily anymore, the C compiler from GNU or clang will do too. Fixed in 0.5.23.1 already.
  • Pure C: Adapted scons compiler detecting to properly consider C11 compilers from the environment, and more gracefully report things.

Optimization

  • Python2: Generators were saving and restoring exceptions, updating the variables sys.exc_type for every context switch, making it really slow, as these are 3 dictionary updates, normally not needed. Now it's only doing it if it means a change.
  • Sped up creating generators and coroutines by attaching the closure variable storage directly to the object, using one variable size allocation instead of two, one of which was a standard malloc. This makes creating them easier and avoids maintaining the closure pointer entirely.
  • Using dedicated compiled cell implementation similar to PyCellObject but fully under our control. This allowed for smaller code generated, while still giving a slight performance improvement.
  • Added free list implementation to cache generator, coroutine, and function objects, avoiding the need to create and delete these kinds of objects in a loop.
  • Added support for the built-in sum, making slight optimizations to be much faster when iterating over lists and tuples, as well as fast long sum for Python2, and much faster bool sums too. This is using a prototype version of a "qiter" concept.
  • Provide type shape for xrange calls that are not constant too, allowing for better optimization related to those.
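
The free list idea mentioned above is implemented in C inside Nuitka. The following Python sketch only illustrates the principle; the class FreeList and its methods are hypothetical names, and a real implementation would also re-initialize an object before handing it out again.

```python
class FreeList(object):
    """Keep released objects around for re-use instead of deleting them."""

    def __init__(self, factory, max_size=100):
        self._factory = factory
        self._max_size = max_size
        self._free = []
        # Counts how often a really new object had to be created.
        self.created = 0

    def acquire(self):
        if self._free:
            # Cheap path: re-use an object released earlier.
            return self._free.pop()

        self.created += 1
        return self._factory()

    def release(self, obj):
        # Bound the cache, beyond that objects are simply dropped.
        if len(self._free) < self._max_size:
            self._free.append(obj)


pool = FreeList(dict)

# Creating and releasing in a loop only hits the allocator once.
for _ in range(1000):
    obj = pool.acquire()
    pool.release(obj)
```

After the loop, pool.created is 1, which is the effect described for generators, coroutines, and functions created and deleted in a loop.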

Tests

  • Added workarounds for locks being held by Virus Scanners on Windows to our test runner.
  • Enhanced constructs that test generator expressions to more clearly show the actual construct cost.
  • Added construct tests for the sum built-in on various types of int containers, making sure we can do all of those really fast.

Summary

This release improves very heavily on generators in Nuitka. The memory allocator is used more cleverly, and free lists all around save a lot of interactions with it. More work lies ahead in this field, as these are not yet as fast as they should be. However, at least Nuitka should be faster than CPython for these kinds of usage now.

Also, proper pure C support in the Scons build is relatively important to cover more of the rarer use cases, where the C compiler is too old.

The most important part is actually how sum optimization is staging a new kind of approach for code generation. This could become the standard code for iterators in loops eventually, making for loops even faster. This will be for future releases to expand.

Nuitka Release 0.5.23

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is focusing on optimization, the most significant part for the users being enhanced scalability due to memory usage, but also breakthrough structural improvements for static analysis of iterators and the debut of type shapes and value shapes, giving way to "shape tracing".

Bug Fixes

  • Fix, support Python 3.5.2 coroutine changes. The checks got added for improved mode for older 3.5.x; the new protocol is only supported when run with that version or higher.

  • Fix, was falsely optimizing away unused iterations for non-iterable compile time constants.

    iter(1) # needs to raise.
    
  • Python3: Fix, eval must not attempt to strip memoryviews. This was preventing it from working with that type.

  • Fix, calling type without any arguments was crashing the compiler. Also the exception raised for anything but 1 or 3 arguments was claiming that only 3 arguments were allowed, which is not the compatible thing.

  • Python3.5: Fix, follow enhanced error checking for complex call handling of star arguments.

  • Compatibility: The from x import x, y re-formulation was doing two __import__ calls instead of re-using the module value.
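
The corrected from-import re-formulation can be pictured in plain Python roughly like this. It is a simplified sketch, not the node structure Nuitka actually builds; the real re-formulation also has to raise ImportError for names that are missing.

```python
# Roughly what "from os import path, sep" has to amount to: a single
# __import__ call with the imported names passed as the "fromlist" ...
module = __import__("os", globals(), None, ("path", "sep"), 0)

# ... followed by attribute lookups that re-use the one module value.
path = module.path
sep = module.sep
```

Calling __import__ once per imported name, as was done before the fix, is observable for code that hooks into the import machinery.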

Optimization

  • Uses only about 66% of the memory compared to the last release, which is a very important step for scalability independent of re-loading. This was achieved by making sure to break loop traces and their reference cycle when they become unused.

  • Properly detect the len of multiplications at compile time from newly introduced value shapes, so that this is e.g. statically optimized.

    print(len("*" * 10000000000))
    
  • Due to newly introduced type shapes, len and iter now properly detect more often if values will raise or not, and warn about detected raises.

    iter(len(something)) # Will always raise
    
  • Due to newly introduced "iterator tracing", we can now properly detect if the length of an unpacking matches its source or not. This allows to remove the check of the generic re-formulations of unpackings at compile time.

    a, b = b, a    # Will never raise due to unpacking
    a, b = b, a, c # Will always raise, 3 items cannot unpack to 2
    
  • Added support for optimization of the xrange built-in for Python2.

  • Python2: Added support for xrange iterable constant values, pre-building those constants ahead of time.

  • Python3: Added support for range iterable constant values, pre-building those constants ahead of time. This brings optimization support for Python3 ranges to what was available for Python2 already.

  • Avoid having a special node variant for range with no arguments, but create the exception raising node directly.

  • Specialized constant value nodes are using less generic implementations to query e.g. their length or iteration capabilities, which should speed up many checks on them.

  • Added support for the format built-in.

  • Python3: Added support for the ascii built-in.
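
The length related optimizations above work without ever building the values involved. As a hedged sketch of the kind of reasoning involved (both function names are hypothetical and not Nuitka API):

```python
def repeated_length(sequence_length, count):
    """Length of "sequence * count", derived from the operand length alone."""
    # Negative counts give an empty sequence in Python.
    return sequence_length * max(count, 0)


def unpacking_will_raise(source_length, target_count):
    """Plain "a, b = ..." unpacking raises if the item counts differ."""
    return source_length != target_count
```

With this, len("*" * 10000000000) becomes repeated_length(1, 10000000000) and needs no ten gigabyte string, and a, b = b, a, c is known to raise because unpacking_will_raise(3, 2) is true.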

Organizational

  • The movement to pure C got the final big push. All C++-only idioms were removed, and everything works with C11 compilers. A C++03 compiler can be used as a fallback, in case of MSVC or too old gcc for instance.
  • Using pure C, MinGW64 6x is now working properly. The latest version had problems with hypot related changes in the C++ standard library. Using C11 solves that.
  • This release also prepares Python 3.6 support, it includes full language support on the level of CPython 3.6.0b1.
  • The CPython 3.6 test suite was run with Python 3.5 to ensure bug level compatibility, and had a few findings of incompatibilities.

Cleanups

  • The last holdouts of classes in Nuitka were removed, and many idioms of C++ are no longer used.
  • Moved range related helper functions to a dedicated include file.
  • Using str is not bytes to detect Python3 str handling or actual bytes type existence.
  • Trace collections were using a mix-in that was merged with the base class that every user of it was having.

Tests

  • Added more static optimization tests, a lot more has become feasible to decide at compile time, and is now done. These are to detect regressions in that domain.
  • The CPython 3.6 test suite is now also run with CPython 3.5 which found some incompatibilities.

Summary

This release marks a huge step forward. We are having the structure for type inference now. This will expand in coming releases to cover more cases, and there are many low hanging fruits for optimization. Specialized code for variable versions of certain known shapes seems feasible now.

Then there is also the move towards pure C. This will make the backend compilation lighter, but due to using C11, we will not suffer any loss of convenience compared to "C-ish". The plan is to continue to use C++ for compilation for compilers not capable of supporting C11.

The amount of static analysis done in Nuitka is now going to quickly expand, with more and more constructs predicted to raise errors or simplified. This will be an ongoing activity, as many types of expressions need to be enhanced, and only one missing will not let it optimize as well.

Also, it seems about time to add dedicated code for specific types to be as fast as C code. This opens up vast possibilities for acceleration and will lead us to zero overhead C bindings eventually. But initially the drive is towards enhanced import analysis, to become able to know the precise module expected to be imported, and derive type information from this.

The coming work will start on whole program optimization, as well as enhanced local value shape analysis and specialized type code generation, all of which will improve speed.

Nuitka Release 0.5.22

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is mostly an intermediate release on the way to the large goal of having per module compilation that is cachable and requires far less memory for large programs. This is currently in progress, but required many changes that are in this release, more will be needed.

It also contains a bunch of bug fixes and enhancements that are worth releasing, and the next changes are going to be more invasive.

Bug Fixes

  • Compatibility: Classes with decorated __new__ functions could miss out on the staticmethod decorator that is implicit. It's now applied always, unless of course it's already done manually. This corrects an issue found with Pandas. Fixed in 0.5.22.1 already.
  • Standalone: For at least Python 3.4 or higher, it could happen that the locale needed was not importable. Fixed in 0.5.22.1 already.
  • Compatibility: Do not falsely assume that not expressions cannot raise on boolean expressions, since those arguments might raise during creation. This could lead to wrong optimization. Fixed in 0.5.22.2 already.
  • Standalone: Do not include system specific C libraries in the distribution created. This would lead to problems for some configurations on Linux in case the glibc is no longer compatible with newer or older kernels. Fixed in 0.5.22.2 already.
  • The --recurse-directory option didn't check with decision mechanisms for module inclusion, making it impossible to avoid some things.
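
The __new__ fix above is about a rule of class creation: a plain function stored as __new__ becomes a static method implicitly, even if a decorator wrapped it first. A small sketch with a made-up decorator named logged shows the case that used to go wrong:

```python
import functools

calls = []


def logged(function):
    """Hypothetical decorator that records calls, for illustration only."""

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        calls.append(function.__name__)
        return function(*args, **kwargs)

    return wrapper


class Example(object):
    # No explicit @staticmethod here, class creation adds it, because
    # the decorator result is still a plain function.
    @logged
    def __new__(cls, value):
        instance = object.__new__(cls)
        instance.value = value
        return instance


example = Example(5)
```

Without the implicit staticmethod, the call Example(5) would bind the arguments incorrectly, which is the kind of failure that was found with Pandas.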

Optimization

  • Introduced specialized constant classes for empty dictionaries and other special constants, e.g. "True" and "False", so that they can have more hard coded properties and save memory by sharing constant values.
  • The "technical" sharing of a variable is only considered for variables that had some sharing going on in the first place, speeding things up quite a bit for that still critical check.
  • Memory savings coming from enhanced trace storage are already visible at about 1%. That is not as much as the reloading will mean, but still helpful to use less overall.

Cleanups

  • The global variable registry was removed. It was in the way of unloading and reloading modules easily. Instead variables are now attached to their owner and referenced by other users. When they are released, these variables are released.
  • Global variable traces were removed. Instead each variable has a list of the traces attached to it. For non-shared variables, this allows to sooner tell attributes of those variables, allowing for sooner optimization of them.
  • No longer trace all initial users of a variable, just merely if there were such and if it constitutes sharing syntactically too. Not only does this save memory, it avoids useless references of the variable to functions that stop using it due to optimization.
  • Create constant nodes via a factory function to avoid non-special instances where variants exist that would be faster to use.
  • Moved the C string functions to a proper nuitka.utils.CStrings package as we use it for better code names of functions and modules.
  • Made functions an explicit child node of modules, which makes their use more generic, esp. for re-loading modules.
  • Have a dedicated function for building frame nodes, making it easier to see where they are created.
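
The combination of specialized constant classes and a factory function for creating constant nodes could look like the following sketch. All class and function names here are hypothetical stand-ins, not the names used in the Nuitka source.

```python
class ConstantNode(object):
    """Generic constant, has to inspect its value for every query."""

    def __init__(self, value):
        self.value = value

    def truth_value(self):
        return bool(self.value)


class ConstantTrueNode(ConstantNode):
    def __init__(self):
        ConstantNode.__init__(self, True)

    def truth_value(self):
        # Hard coded property, no need to look at the value.
        return True


class ConstantEmptyDictNode(ConstantNode):
    def __init__(self):
        ConstantNode.__init__(self, {})

    def truth_value(self):
        return False


def make_constant_node(value):
    """Factory that picks the specialized class where one exists."""
    if value is True:
        return ConstantTrueNode()

    if type(value) is dict and not value:
        return ConstantEmptyDictNode()

    return ConstantNode(value)
```

Routing all creation through the factory is what avoids generic instances in places where a faster variant exists, e.g. make_constant_node({}) never yields a plain ConstantNode.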

Summary

This release is the result of a couple of months of work, and means that proper re-loading of cached results is coming into sight. The reloading of modules still fails for some things, and more changes will be needed, but with that out of the way, Nuitka's footprint is about to drop, making it absolutely scalable. This is considered very important before starting to trace more information about values.

The next big thing ought to address the one thing that structurally holds Nuitka back from generating C level performance code for, say, integer operations.

Nuitka Release 0.5.21

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release focused on scalability work. Making Nuitka more usable in the common case, and covering more standalone use cases.

Bug Fixes

  • Windows: Support for newer MinGW64 was broken by a workaround for older MinGW64 versions.
  • Compatibility: Added support for the (unofficial) C-Python API Py_GetArgcArgv that was causing prctl module to fail loading on ARM platforms.
  • Compatibility: The proper error message template for complex call arguments is now detected at compile time. There are changes coming that are already in some pre-releases of CPython.
  • Standalone: Wasn't properly ignoring Tools and other directories in the standard library.

New Features

  • Windows: Detect the MinGW compiler arch and compare it to the Python arch. In case of a mismatch, the compiler is not used. Otherwise compilation or linking gives hard to understand errors. This also rules out MinGW32 as a compiler that can be used, as its arch doesn't match MinGW64 32 bits variant.
  • Compile modules in two passes with the option to specify which modules will be considered for a second pass at all (compiled without program optimization) or even become bytecode.
  • The developer mode installation of Nuitka with the command pip install -e nuitka_git_checkout_dir is now supported too.
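
The architecture comparison for MinGW described above needs the bit width of the running Python. One common way to obtain it is the pointer size, as in this sketch; the function names are made up, and how Nuitka determines the compiler side is not shown here.

```python
import struct


def python_arch_bits():
    """Pointer size of the running Python in bits, normally 32 or 64."""
    return struct.calcsize("P") * 8


def compiler_is_usable(compiler_bits, python_bits=None):
    """A compiler only qualifies if it targets the same word size."""
    if python_bits is None:
        python_bits = python_arch_bits()

    return compiler_bits == python_bits
```

Rejecting a mismatching compiler up front is what replaces the hard to understand compilation or linking errors mentioned in the list.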

Optimization

  • Popular modules known to not be performance relevant are no longer C compiled, e.g. numpy.distutils and many others frequently imported (from some other module), but mostly not used and definitely not performance relevant.

Cleanups

  • The progress tracing and the memory tracing are now more clearly separate and therefore more readable.
  • Moved RPM related files to new rpm directory.
  • Moved documentation related files to doc directory.
  • Converted import sorting helper script to Python and made it run fast.

Organizational

  • The Buildbot infrastructure for Nuitka was updated to Buildbot 0.8.12 and is now maintained up to date with Ansible.
  • Upgraded the Nuitka bug tracker to Roundup 1.5.1 to which I had previously contributed security fixes already active.
  • Added SSL certificates from Let's Encrypt for the web server.

Summary

This release advances the scalability of Nuitka somewhat. The two pass approach does not yet bear all possible fruit. Caching of single pass compiled modules should follow for it to become consistently fast.

More work will be needed to achieve fast and scalable compilation, and that is going to remain the focus for some time.

Nuitka Release 0.5.20

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is mostly about catching up with issues. Most address standalone problems with special modules, but there are also some general compatibility corrections, as well as important fixes for Python3.5 and coroutines and to improve compatibility with special Python variants like AnaConda under the Windows system.

Bug Fixes

  • Standalone Python3.5: The _decimal module at least is using a __name__ that doesn't match the name at load time, causing programs that use it to crash.
  • Compatibility: For Python3.3 the __loader__ attribute is now set in all cases, and it needs to have a __module__ attribute. This makes inspection as done by e.g. flask work.
  • Standalone: Added missing hidden dependencies for Tkinter module, adding support for this to work properly.
  • Windows: Detecting the Python DLL and EXE used at compile time and preserving this information for use during backend compilation. This should make sure we use the proper ones, and avoids hacks for specific Python variants, enhancing the support for AnaConda, WinPython, and CPython installations.
  • Windows: The --python-debug flag now properly detects if the run time is supporting things and error exits if it's not available. For a CPython3.5 installation, it will switch between debug and non-debug Python binaries and DLLs.
  • Standalone: Added plug-in for the Pmw package to properly combine it into a single file, suitable for distribution.
  • Standalone: Packages from standard library, e.g. xml now have proper __path__ as a list and not as a string value, which breaks code of e.g. PyXML. Issue#183.
  • Standalone: Added missing dependency of twisted.protocols.tls. Issue#288.
  • Python3.5: When finalizing coroutines that were not finished, a corruption of its reference count could happen under some circumstances.
  • Standalone: Added missing DLL dependency of the uuid module at run time, which uses ctypes to load it.

New Features

  • Added support for AnaConda Python on Linux. Both accelerated and standalone mode work now. Issue#295.
  • Added support for standalone mode on FreeBSD. Issue#294.
  • The plug-in framework was expanded with new features to allow addressing some specific issues.

Cleanups

  • Moved memory related stuff to dedicated utils package nuitka.utils.MemoryUsage as part of an effort to have more topical modules.
  • Plug-ins now have a dedicated module through which the core accesses the API, which was partially cleaned up.
  • No more "early" and "late" import detections for standalone mode. We now scan everything at the start.

Summary

This release focused on expanding plugins. These were then used to enhance the success of standalone compatibility. Eventually this should lead to a finished and documented plug-in API, which will open up the Nuitka core to easier hacks and more user contribution for these topics.

Nuitka Release 0.5.19

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release brings optimization improvements for dictionary using code. This is now lowering subscripts to dictionary accesses where possible and adds new code generation for known dictionary values. Besides this there is the usual range of bug fixes.

Bug Fixes

  • Fix, attribute assignments or deletions where the assigned value or the attribute source was statically raising crashed the compiler.
  • Fix, the order of evaluation of source and value for attribute assignments was considered wrongly during optimization.
  • Windows: Fix, when g++ is in the path, it was not used automatically, but now it is.
  • Windows: Detect the 32 bits variant of MinGW64 too.
  • Python3.4: The finalize of compiled generators could corrupt reference counts for shared generator objects. Fixed in 0.5.18.1 already.
  • Python3.5: The finalize of compiled coroutines could corrupt reference counts for shared generator objects.

Optimization

  • When a variable is known to have dictionary shape (assigned from a constant value, result of dict built-in, or a general dictionary creation), or the branch merge thereof, we lower subscripts from expecting mapping nodes to dictionary specific nodes. These generate more efficient code, and some are then known to not raise an exception.

    def someFunction(a,b):
        value = {a : b}
        value["c"] = 1
        return value
    

    The above function is not yet fully optimized (dictionary key/value tracing is not yet finished), however it at least knows that no exception can raise from assigning value["c"] anymore and creates more efficient code for the typical result = {} functions.

  • The use of "logical" sharing during optimization has been replaced with checks for actual sharing. So closure variables that were written to in dead code no longer inhibit optimization of the then no more shared local variable.

  • Global variable traces are now faster to decide definite writes without need to check traces for this each time.
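
The difference between "logical" and actual sharing shows in a function like the following sketch for Python3, where the only writer of a closure variable sits in dead code. The function names are made up for the example.

```python
def outer():
    value = 1

    if False:
        # Dead code: this nested function is never created, so nothing
        # actually shares "value" once the branch is removed.
        def inner():
            nonlocal value
            value = 2

    return value


result = outer()
```

Syntactically value is shared with inner, but after the dead branch is removed it is an ordinary local variable again, and per the item above it is now optimized as one.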

Cleanups

  • No longer using "logical sharing" made it possible to remove that function entirely.
  • Using "technical sharing" less often for decisions during optimization and instead rely more often on proper variable registry.
  • Connected variables with their global variable trace statically, avoiding the need to check in the variable registry for it.
  • Removed old and mostly unused "assume unclear locals" indications, we use global variable traces for this now.

Summary

This release aimed at dictionary tracing. As a first step, the assigned value is now traced to have a dictionary shape, and this is then used to lower operations that used to be normal subscript operations on mappings, so that they can now be more specific.

Making use of the dictionary values knowledge, tracing keys and values is not yet inside the scope, but expected to follow. We got the first signs of type inference here, but to really take advantage, more specific shape tracing will be needed.