Nuitka Release 0.5.32

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains substantial new optimization, bug fixes, and already full support for Python 3.7. Among the fixes, the enhanced coroutine work for compatibility with uncompiled ones is the most important.

Bug Fixes

  • Fix, was optimizing write backs of attribute in-place assignments falsely.
  • Fix, generator stop future was not properly supported. It is now the default for Python 3.7 which showed some of the flaws.
  • Python3.5: The __qualname__ of coroutines and asyncgen was wrong.
  • Python3.5: Fix, for dictionary unpackings to calls, check the keys if they are string values, and raise an exception if not.
  • Python3.6: Fix, need to check assignment unpacking for too short sequences, we were giving IndexError instead of ValueError for these. Also the error messages need to consider if they should refer to "at least" in their wording.
  • Fix, outline nodes were cloned more than necessary, which would corrupt the code generation if they later got removed, leading to a crash.
  • Python3.5: Compiled coroutines awaiting uncompiled coroutines were not working properly for finishing the uncompiled ones. Also the other way around was raising a RuntimeError when trying to pass an exception to them when they were already finished. This should resolve issues with the asyncio module.
  • Fix, side effects of a detected exception raise, when they had an exception detected inside of them, led to an infinite loop in optimization. They are now optimized in-place, avoiding an extra step later on.
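
The run-time behaviour behind three of these fixes can be checked against CPython directly. The following is a minimal sketch, not taken from the Nuitka test suite; all function names are made up for illustration, and it assumes Python 3.7 or later, where the generator stop behaviour is the default.

```python
def leaking_generator():
    # With generator stop semantics, a StopIteration that escapes the
    # generator body is turned into a RuntimeError.
    yield 1
    raise StopIteration


def generator_stop_error():
    gen = leaking_generator()
    next(gen)
    try:
        next(gen)
    except RuntimeError as e:
        return type(e).__name__
    except StopIteration:
        return "StopIteration"
    return None


def takes_keywords(**kwargs):
    return kwargs


def non_string_key_error():
    # Dictionary unpacking into a call must reject non-string keys.
    try:
        takes_keywords(**{1: "value"})
    except TypeError as e:
        return type(e).__name__
    return None


def short_unpacking_error():
    # A too short sequence gives ValueError, not IndexError, and the
    # starred target makes the message speak of "at least".
    try:
        first, second, *rest = [1]
    except ValueError as e:
        return str(e)
    return None
```

Compiled code is expected to give the same exception types and wording as the interpreter in all three cases.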

New Features

  • Support for Python 3.7 with only some corner cases not supported yet.

Optimization

  • Delay creation of StopIteration exception in generator code for as long as possible. This gives more compact code for generators, which now pass the return values via compiled generator attribute for Python 3.3 or higher.
  • Python3: More immediate re-formulation of classes with no bases. Avoids noise during optimization.
  • Python2: For class dictionaries that are only assigned from values without side effects, they are not converted to temporary variable usages, allowing the normal SSA based optimization to work on them. This leads to constant values for class dictionaries of simple classes.
  • Explicit cleanup of nodes, variables, and local scopes that become unused, has been added, allowing for breaking of cyclic dependencies that prevented memory release.
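
For context on the StopIteration item above, this is a small sketch of the language-level behaviour involved, assuming Python 3.3 or higher. It shows only what the interpreter exposes, not how Nuitka stores the value internally; the function names are illustrative.

```python
def inner():
    yield 1
    # The return value travels in the "value" attribute of StopIteration.
    return "done"


def outer():
    # "yield from" receives the return value without user code ever
    # seeing the StopIteration exception.
    result = yield from inner()
    yield result


def manual_return_value():
    gen = inner()
    next(gen)
    try:
        next(gen)
    except StopIteration as e:
        return e.value
    return None
```

Since most consumers only need the value, creating the exception object can be postponed until something actually asks for it.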

Tests

  • Adapted 3.5 tests to work with 3.7 coroutine changes.
  • Added CPython 3.7 test suite.

Cleanups

  • Removed remaining code that was there for 3.2 support. All uses of version comparisons with 3.2 have been adapted. For us, Python3 now means 3.3, and we will not work with 3.2 at all. This removed a fair bit of complexity for some things, but not all that much.
  • Have dedicated file for import related helpers, so they are easier to find if necessary. Also do not have code for importing a name in the header file anymore, not performance relevant.
  • Disable Python warnings when running scons. These are particularly given when using a Python debug binary, which is happening when Nuitka is run with --python-debug option and the inline copy of Scons is used.
  • Have a factory function for all conditional statement nodes created. This solved a TODO and handles the creation of statement sequences for the branches as necessary.
  • Split class reformulation into two files, one for Python2 and one for Python3 variant. They share no code really, and are too confusing in a single file, for the huge code bodies.
  • Locals scopes now have a registry, where functions and classes register their locals type, and then it is created from that.
  • Have a dedicated helper function for single argument calls in static code that does not require an array of objects as an argument.

Organizational

  • There are now requirements-devel.txt and requirements.txt files aimed at usage with scons and by users, but they are not used in installation.

Summary

This release takes the important step of adding conversion of locals dictionary usages to temporary variables. It is not yet done everywhere it is possible, and the resulting temporary variables are not yet propagated in all the cases where it clearly is possible. Upcoming releases ought to achieve that most Python2 classes will use a direct dictionary creation.

Adding support for Python 3.7 is of course also a huge step, and it happened fairly quickly, soon after its release. The generic classes it adds were the only real major new feature. That it broke the internals for exception handling was what held things back initially, but past that, it was really easy.

Expect more optimization to come in the next releases, aiming at both the ability to predict Python3 metaclasses __prepare__ results, and at more optimization applied to variables after they became temporary variables.

Nuitka this week #1

New Series Rationale

I think I tend to prefer coding over communication too much. I think I need to make more transparent what I am doing. Also, things will be getting exciting continuously for a while now.

I used to do status report posts, many years ago, every 3 months or so, and that was nice for me also to get an idea of what changed, but I stopped. What did not happen was to successfully engage other people to contribute.

This time I am getting more intense. I will aim to do roughly weekly or bi-weekly reports, where I highlight things that are going on, newly found issues, hotfixes, all the things Nuitka.

Planned Mode

I will do it in this fashion. I will write a post to the mailing list, right about Wednesday every week or so. I need to pick a day. I am working from home that day, saving me commute time. I will invest that time into this.

The writing will not be too high quality at times. Bear with me there. Then I will check feedback from the list, if any. Hope is for it to point out the things where I am not correct, missing something, or even engage right away.

Topics are going to be random, albeit repeating. I will try and make links to previous issues where applicable. Therefore also the TOC, which makes for link targets in the pages.

Locals Dict

When I am speaking of locals dict, I am talking of class scopes (and functions with exec statements). These started to use actual dictionary a while ago, which was a severe setback to optimization.
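
To illustrate why a class scope has to be treated as a real dictionary in the general case, here is a minimal sketch for Python3. The names RecordingDict, Meta, and Example are made up for this illustration and do not come from Nuitka.

```python
class RecordingDict(dict):
    """Namespace that remembers the names assigned in a class body."""

    def __init__(self):
        super().__init__()
        self.assigned = []

    def __setitem__(self, key, value):
        self.assigned.append(key)
        super().__setitem__(key, value)


class Meta(type):
    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        # The class body executes with this object as its locals.
        return RecordingDict()

    def __new__(mcs, name, bases, namespace, **kwargs):
        cls = super().__new__(mcs, name, bases, dict(namespace))
        cls.assigned_names = list(namespace.assigned)
        return cls


class Example(metaclass=Meta):
    x = 1

    def method(self):
        return self.x
```

Every assignment in the body of Example goes through RecordingDict.__setitem__, so the stores are observable. Only when the locals are known to be a plain dictionary that nobody else sees can they be turned into temporary variables.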

Right now, so for this week, a first prototype made the replacement of locals dict assignments and references for Python2, and it kind of worked through my buildbots flawlessly. I immediately noticed, though, that it would require some refactoring to not depend on a locals scope being in only one of the trace collections. Thinking of future inlining, part of a locals scope may end up in multiple functions, and that ought not to be affected.

Therefore I made a global registry of locals scopes, and working on those, I check their variables for whether they can be forward propagated, and do this not per module, but after all the modules have been done. This is kind of a setback for the idea of module specific optimization (cacheable later on) vs. program optimization, but since that is not yet a thing, it can remain this way for now.

Once I did that, I was interested to see the effect, but to my horror, I noticed that memory was not released for the locals dict nodes. It was way too involved with cyclic dependencies, which are bad. So that was problematic of course. Having the compilation keep nodes in memory for tracing the usage both as a locals dict and as temporary variables wasn't going to help scaling at all.

Solution is finalization

Nodes need Finalization

So replaced nodes reference a parent, the locals scope references variables, trace collections reference variables, which reference locals scopes, accesses reference traces, and so on. The garbage collector can handle some of this, but it seems I was getting past that.

For a solution, I started to add a finalize method, which released the links for locals scopes, when they are fully propagated, on the next run.

Adding a finalize to all nodes ought to make sure memory is released soon, and might even find bugs, as nodes become unusable after they are supposedly unused. Obviously, there will currently be cases where nodes become unused, but they are not finalized yet. Also, often this is more manual, because part of the node is to be released, but one child is re-used. That is messy.
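
The idea can be sketched roughly as follows. This is a simplified illustration and not the actual Nuitka node classes; the Node class and the helper function are assumptions made for the example, and the check relies on CPython reference counting.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # Parent and child now reference each other, which is a cycle.
        child.parent = self
        self.children.append(child)

    def finalize(self):
        # Break the links, so reference counting alone frees the nodes.
        for child in self.children:
            child.finalize()
        self.children = []
        self.parent = None


def is_released_without_gc(use_finalize):
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        root = Node("root")
        child = Node("child")
        root.add_child(child)
        probe = weakref.ref(root)

        if use_finalize:
            root.finalize()

        del root, child
        # True if the root node is gone without a garbage collector run.
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

Without the finalize call, the two nodes keep each other alive until the cycle collector gets to them. A finalized node also loses its attributes' contents, which is what makes accidental later use visible.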

Impact on Memory Usage

The result was a bit disappointing. Yes, memory usage of mercurial compilation went back down again, but mostly to what it had been. Some classes now have their locals dict forward propagated, but the effect is not always a single dictionary creation yet. Right now, function definitions are not forward propagated at all. This is a task I want to take on before the next release, but maybe not, as there are other things too. I am assuming that it will make most class dictionaries get created without using any variables at all anymore, which should make it really lean.

Type Hints Question

Then, asking about type hints, I got the usual question about whether Nuitka is going to use them. My stance is unchanged. They are just hints, not reliable. Nuitka needs to behave the same if users get them wrong. I suggested creating decorators which make type hints enforced, but I expect nobody takes this on. I need to make it a Github issue of Nuitka, although technically it is pure CPython work and ought to be done independently. Right now Nuitka is not there yet anyway, to take full advantage.
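
Such a decorator could look roughly like this. It is a minimal sketch under the assumption that only plain classes are used as hints; the name enforce_hints is hypothetical, and nothing like it exists in Nuitka or CPython.

```python
import functools
import inspect
import typing


def enforce_hints(func):
    """Raise TypeError when arguments or result violate plain class hints."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    def check(label, value):
        expected = hints.get(label)
        # Only plain classes are checked, generic hints are skipped.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                "%s of %s() must be %s, not %s"
                % (label, func.__name__, expected.__name__,
                   type(value).__name__)
            )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            check(name, value)
        result = func(*args, **kwargs)
        check("return", result)
        return result

    return wrapper


@enforce_hints
def add(a: int, b: int) -> int:
    return a + b
```

With that in place, add(1, 2) works as before, while add("a", 2) raises TypeError before the function body runs. Only with such a guarantee could a compiler rely on the annotated types.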

Python 3.7

Then, for Python 3.7, I have long gotten the 3.6 test suite to pass. I raised 2 bugs with CPython, one of which led to the update of a failing test. Nuitka had, with large delay, caught up with what del __annotations__ was doing in a class. Only with the recent work for proper locals dict code generation could we enforce a name to be local, and have proper code generation that allows for it to be unset.

This was of course a bit of work. But the optimization behind it was always kind of necessary to get right. But now that I got this, think of my amazement when for 3.7 they reverted to the old behavior, where annotations then corrupt the module annotations.

The other bug is a reference counting bug, where Nuitka tests were failing with CPython 3.7, and turns out, there is a bug in the dictionary implementation of 3.7, but it only corrupts counts reported, not actual objects, so it's harmless, but means for 3.7.0 the reference count tests are disabled.

Working through the 3.7 suite, I am cherry picking commits, that e.g. allow the repr of compiled functions to contain <compiled_function ...> and the like. Nothing huge yet. There is now a subscript of type, and foremost the async syntax became way more liberal, so it is more complex for Nuitka to make out if it is a coroutine due to something happening inside a generator declared inside of it. Also cr_origin was added to coroutines, but that is mostly it.
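
Assuming that "subscript of type" refers to the class subscription hook that Python 3.7 added with its generic class support, the feature looks like this. The class name is made up for illustration.

```python
class Registry:
    # Implicitly a class method, called for Registry[item].
    def __class_getitem__(cls, item):
        return (cls.__name__, item)


def subscript_result():
    # Subscripting the class itself, no instance is involved.
    return Registry[int]
```

Before 3.7, subscripting a class like this needed a metaclass with __getitem__.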

Coroutine Compatibility

A bigger thing was that I debugged coroutines and their interaction with uncompiled and compiled coroutines awaiting one another, and turns out, there was a lot to improve.

The next release will be much more compatible with the asyncio module and its futures, especially with exceptions to cancel tasks passed along. That required cloning a lot of CPython generator code, due to how ugly they mess with bytecode instruction pointers in yield from on an uncompiled coroutine, as they don't work with the send method, unlike everything else has to.
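
The kind of interaction that has to work is sketched below, assuming Python 3.7 or later for asyncio.run. The coroutine names are illustrative; in the problematic cases one of the two coroutines would be compiled and the other one not.

```python
import asyncio


async def inner(log):
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # The cancellation is thrown into the innermost awaiting coroutine.
        log.append("inner cancelled")
        raise


async def outer(log):
    try:
        await inner(log)
    finally:
        log.append("outer finished")


async def main():
    log = []
    task = asyncio.ensure_future(outer(log))

    # Give the task a chance to start and block inside inner().
    await asyncio.sleep(0)

    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("task cancelled")

    return log
```

Running asyncio.run(main()) gives the three log entries in the order inner, outer, task. The exception has to pass through every coroutine in the await chain, no matter which of them are compiled.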

PyLint Troubles

For PyLint, the 2.0.0 release found new things, but unfortunately for 2.0.1 there are a lot of regressions that I had to report. I fixed the versions of first PyLint, and now also Astroid, so Travis cannot suddenly start to fail due to a PyLint release finding new warnings.

Currently, if you make a PR on Github, a PyLint update will break it. And also the cron job on Travis that checks master.

As somebody pointed out, I am now using requires.io <https://requires.io/github/kayhayen/Nuitka/requirements/?branch=factory> to check for Nuitka dependencies. But since 1.9.2 is still needed for Python2, that kind of is bound to give alarms for now.

TODO solving

I have a habit of doing odd tasks when I am with my notebook in some place and don't know what to work on. So I had some 2 hours recently like this, and used them to look at TODOs and resolve them.

I did a bunch of cleanups for static code helpers. There was one in my mind about calling a function with a single argument. That fast call required a local array with one element to put the arg into. That makes using code ugly.

Issues Encountered

So the enum module of Python3 hates compiled classes and their staticmethod around __new__. Since it manually unwraps __new__ and then calls it itself, it then finds that a staticmethod object cannot be called. Its purpose is to sit in the class dictionary to give a descriptor that removes the self arg from the call.
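
What sits in the class dictionary for uncompiled classes can be inspected as follows. This is a small sketch with an illustrative class, and it only shows the CPython side, not what Nuitka creates.

```python
import inspect


class Plain:
    def __new__(cls, *args):
        return super().__new__(cls)


def describe_new():
    raw = Plain.__dict__["__new__"]
    return (
        # The interpreter wraps __new__ into a staticmethod implicitly.
        type(raw) is staticmethod,
        # Attribute lookup on the class gives the plain function back.
        inspect.isfunction(Plain.__new__),
        # The wrapped function is available for manual unwrapping.
        raw.__func__ is Plain.__new__,
    )
```

Code that reads the class dictionary directly sees the staticmethod wrapper and has to unwrap it itself, which is where a check for the exact function type gets in the way of compiled functions.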

I am contemplating submitting an upstream patch for CPython here. The hard coded check for PyFunction on the __new__ value is hard to emulate.

So I am putting the staticmethod into the dictionary passed already. But the undecorated function should be there for full compatibility.

If I were to make a compiled function type that is both a staticmethod alike and a function, maybe I can work around it. But it's ugly and a burden. But it would need no change. And maybe there is more code wanting to call __new__ manually.

Plans

I intend to make a release, probably this weekend. It might not contain full 3.7 compatibility yet, although I am aiming at that.

Then I want to turn to "goto generators", a scalability improvement of generators and coroutines that I will talk about next week then.

Until next week.

Nuitka Release 0.5.31

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is massive in terms of fixes, but also adds a lot of refinement to code generation, and more importantly adds experimental support for Python 3.7, while enhancing support for PyQt5 in standalone mode by a lot.

Bug Fixes

  • Standalone: Added missing dependencies for PyQt5.Qt module.

  • Plugins: Added support for PyQt5.Qt module and its qml plugins.

  • Plugins: The sensible plugin list for PyQt now includes the platforms plugins on Windows too, as they are kind of mandatory.

  • Python3: Fix, for uninstalled Python versions wheels that linked against the Python3 library as opposed to Python3X, it was not found.

  • Standalone: Prefer DLLs used by main program binary over ones used by wheels.

  • Standalone: For DLLs added by Nuitka plugins, add the package directory to the search path for dependencies where they might live.

  • Fix, the vars built-in didn't annotate its exception exit.

  • Python3: Fix, the bytes and complex built-ins need to be treated as a slot too.

  • Fix, consider if del variable must be assigned, in which case no exception exit should be created. This prevented Tkinter compilation.

  • Python3.6: Added support for the following language construct:

    d = {"metaclass" : M}
    
    class C(**d):
       pass
    
  • Python3.5: Added support for cyclic imports. Now a from import with a name can really cause an import to happen, not just a module attribute lookup.

  • Fix, hasattr was never raising exceptions.

  • Fix, bytearray constant values were considered to be non-iterable.

  • Python3.6: Fix, now it is possible to del __annotations__ in a class and behave compatible. Previously in this case we were falling back to the module variable for annotations used after that which is wrong.

  • Fix, some built-in type conversions are allowed to return derived types, but Nuitka assumed the exact type, this affected bytes, int, long, unicode.

  • Standalone: Fix, the _socket module was insisted on to be found, but can be compiled in.
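
Two of the fixes above concern behaviour that is easy to check in the interpreter. This is a minimal sketch assuming Python3, where hasattr only swallows AttributeError; the class and function names are illustrative.

```python
class Broken:
    @property
    def value(self):
        raise ValueError("computation failed")


class Missing:
    @property
    def value(self):
        raise AttributeError("no value")


def probe(obj):
    # Only AttributeError means "no such attribute" to hasattr, any
    # other exception from the lookup has to propagate to the caller.
    try:
        return hasattr(obj, "value")
    except ValueError:
        return "ValueError"


def iterate_bytearray():
    # A bytearray constant is iterable and gives integer values.
    return list(bytearray(b"ab"))
```

Here probe(Missing()) gives False, while for Broken() the ValueError escapes from hasattr.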

New Features

  • Added experimental support for Python 3.7, more work will be needed though for full support. Basic tests are working, but there are at least more coroutine changes to follow.
  • Added support for building extension modules against statically linked Python. This aims at supporting manylinux containers, which are supposed to be used for creating widely usable binary wheels for Linux. Programs won't work with statically linked Python though.
  • Added options to allow ignoring the Windows cache for DLL dependencies or force an update.
  • Allow passing options from distutils to Nuitka compilation via setup options.
  • Added option to disable the DLL dependency cache on Windows as it may become wrong after installing new software.
  • Added experimental ability to provide extra options for Nuitka to setuptools.
  • Python3: Remove frame preservation and restoration of exceptions. This is not needed, but leaked over from Python2 code.

Optimization

  • Apply value tracing to local dict variables too, enhancing the optimization for class bodies and function with exec statements by a lot.
  • Better optimization for "must not have value", wasn't considering merge traces of uninitialized values, for which this is also the case.
  • Use 10% less memory at compile time due to specialized base classes for statements with a single child only allowing __slots__ usage by not having multiple inheritance for those.
  • More immediately optimize branches with known truth values, so that merges are avoided and do not prevent trace based optimization before the pass after the next one. In some cases, optimization based on traces could fail to be done if there was no next pass caused by other things.
  • Much faster handling for functions with a lot of eval and exec calls.
  • Static optimization of type with known type shapes, the value is predicted at compile time.
  • Optimize containers for all compile time constants into constant nodes. This also enables further compile time checks using them, e.g. with isinstance or in checks.
  • Standalone: Using threads when determining DLL dependencies. This will speed up the un-cached case on Windows by a fair bit.
  • Also remove unused assignments for mutable constant values.
  • Python3: Also optimize calls to bytes built-in, this was so far not done.
  • Statically optimize iteration over constant values that are not iterable into errors.
  • Removed Fortran, Java, LaTeX, PDF, etc. stuff from the inline copies of Scons for faster startup and leaner code. Also updated to 3.0.1, which makes no important difference over 3.0.0 for Nuitka however.
  • Make sure to always release temporary objects before checking for error exits. When done the other way around, more C code than necessary will be created, releasing them in both normal case and error case after the check.
  • Also remove unused assignments in case the value is a mutable constant.
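
On the memory saving through __slots__ mentioned above: the restriction behind it is that CPython refuses to combine two base classes that both define non-empty slots. The following sketch uses made-up class names and is not the actual Nuitka node hierarchy.

```python
class NodeBase:
    __slots__ = ("source_ref",)

    def __init__(self, source_ref):
        self.source_ref = source_ref


class SingleChildStatement(NodeBase):
    # Single inheritance, so adding another slot is fine.
    __slots__ = ("child",)

    def __init__(self, child, source_ref):
        NodeBase.__init__(self, source_ref)
        self.child = child


class ChildMixin:
    __slots__ = ("child",)


def has_layout_conflict():
    # Two bases with non-empty slots cannot share one instance layout.
    try:
        type("Mixed", (NodeBase, ChildMixin), {"__slots__": ()})
    except TypeError:
        return True
    return False


def has_instance_dict():
    statement = SingleChildStatement(child=None, source_ref="example.py:1")
    return hasattr(statement, "__dict__")
```

Instances of SingleChildStatement carry no per-instance dictionary, which is where the saving comes from when there are very many nodes.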

Cleanups

  • Don't store "version" numbers of variable traces for code generation, instead directly use the references to the value traces, avoiding later lookups.
  • Added dedicated module for complex built-in nodes.
  • Moved C helpers for integer and complex types to dedicated files, solving the TODOs around them.
  • Removed some Python 3.2 only codes.

Organizational

  • For better bug reports, the --version output now contains also the Python version information and the binary path being used.
  • Started using specialized exceptions for some types of errors, which will output the involved data for better debugging without having to reproduce anything. This does e.g. output XML dumps of problematic nodes.
  • When encountering a problem (compiler crash) in optimization, output the source code line that is causing the issue.
  • Added support for Fedora 28 RPM builds.
  • Remove more instances of mentions of 3.2 as supported or usable.
  • Renovated the graphing code and made it more useful.

Summary

This release marks important progress, as the locals dictionary tracing is a huge step ahead in terms of correctness and proper optimization. The actual resulting dictionary is not yet optimized, but that ought to follow soon now.

The initial support of 3.7 is important. Right now it apparently works pretty well as a 3.6 replacement already, but definitely a lot more work will be needed to fully catch up.

For standalone, this accumulated a lot of improvements related to the plugin side of Nuitka. Thanks to those involved in making this better. On Windows things ought to be much faster now, due to parallel usage of dependency walker.

Nuitka Release 0.5.30

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release has improvements in all areas. Many bug fixes are accompanied with optimization changes towards value tracing.

Bug Fixes

  • Fix, the new setuptools runners were not used by pip breaking the use of Nuitka from PyPI.

  • Fix, imports of six.moves could crash the compiler for built-in names. Fixed in 0.5.29.2 already.

  • Windows: Make the nuitka-run not a symlink as these work really badly on that platform, instead make it a full copy just like we did for nuitka3-run already. Fixed in 0.5.29.2 already.

  • Python3.5: In module mode, types.coroutine was monkey patched into an endless recursion if including more than one module, e.g. for a package. Fixed in 0.5.29.3 already.

  • Python3.5: Dictionary unpackings with both star arguments and non star arguments could leak memory. Fixed in 0.5.29.3 already.

    c = {a : 1, **d}
    
  • Fix, distutils usage was not working for Python2 anymore, due to using super for what are old style classes on that version.

  • Fix, some method calls to C function members could leak references.

    import functools
    
    class C:
       for_call = functools.partial
    
       def m(self):
          self.for_call() # This leaked a reference to the descriptor.
    
  • Python3.5: The base classes should be treated as an unpacking too.

    class C(*D): # Allowed syntax that was not supported.
       pass
    
  • Windows: Added back batch files to run Nuitka from the command line. Fixed in 0.5.29.5 already.

New Features

  • Added option --include-package to force inclusion of a whole package with the submodules in a compilation result.
  • Added option --include-module to force inclusion of a single module in a compilation result.
  • The multiprocessing plug-in got adapted to Python 3.4 changes and will now also work in accelerated mode on Windows.
  • It is now possible to specify the Qt plugin directories with e.g. --enable-plugin=qt_plugins=imageformats and have only those included. This should avoid dependency creep for shared libraries.
  • Plugins can now make the decision about recursing to a module or not.
  • Plugins now can get their own options passed.

Optimization

  • The re-raising of exceptions has gotten its own special node type. This aims at more readability (XML output) and avoiding the overhead of checking potential attributes during optimization.
  • Changed built-in int, long, and float to using a slot mechanism that also analyses the type shape and detects and warns about errors at compile time.
  • Changed the variable tracing to value tracing. This meant to cleanup all the places that were using it to find the variable.
  • Enable must have / must not have value optimization for all kinds of variables including module and closure variables. This often avoids error exits and leads to smaller and faster generated code.

Tests

  • Added burn test with local install of pip distribution to virtualenv before making any PyPI upload. It seems pip got its specific error sources too.
  • Avoid calling 2to3 and prefer <python> -m lib2to3 instead, as it seems at least Debian Testing stopped providing the binary by default. For Python 2.6 and 3.2 we continue to rely on it, as they don't support that mode of operation.
  • The PyLint checks have been made more robust and even more Python3 portable.
  • Added PyLint to Travis builds, so PRs are automatically checked too.
  • Added test for distutils usage with Nuitka that should prevent regressions for this new feature and to document how it can be used.
  • Make coverage taking work on Windows and provide the full information needed, the rendering stage is not working there yet though.
  • Expanded the trick assignment test cases to cover more slots to find bugs introduced with more aggressive optimization of closure variables.
  • New test to cover multiprocessing usage.
  • Generating more code tests out of doctests for increased coverage of Nuitka.

Cleanups

  • Stop using --python-version in tests where they still remained.
  • Split the forms of int and long into two different nodes, they share nothing except the name. Create the constants for the zero arg variant more immediately.
  • Split the output comparison part into a dedicated testing module so it can be re-used, e.g. when doing distutils tests.
  • Removed dead code from variable closure taking.
  • Have a dedicated module for the metaclass of nodes in the tree, so it is easier to find, and doesn't clutter the node base classes module as much.
  • Have a dedicated node for reraise statements instead of checking for all the arguments to be non-present.

Organizational

  • There is now a pull request template for Github when used.
  • Deprecating the --python-version argument which should be replaced by using -m nuitka with the correct Python version. Outputs have been updated to recommend this one instead.
  • Make automatic import sorting and autoformat tools properly executable on Windows without them changing new lines.
  • The documentation was updated to prefer the call method with -m nuitka and manually providing the Python binary to use.

Summary

This release continued the distutils integration adding first tests, but more features and documentation will be needed.

Also, for the locals dictionary work, the variable tracing was made generic, but not yet put to use. If we use this to also trace dictionary keys, we can expect a lot of improvements for class code again.

The locals dictionary tracing will be the focus before resuming the work on C types, where the ultimate performance boost lies. However, full compatibility has not yet been achieved even while using dictionaries for classes, and we would like to be able to statically optimize those better anyway.