Nuitka Release 0.5.13

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains the first use of SSA for value propagation and massive amounts of bug fixes and optimization. Some of the bugs were only revealed when doing the value propagation, but were delivered as hotfixes, as they could still apply to real code.

Bug Fixes

  • Fix, relative imports in packages were not working with absolute imports enabled via future flags. Fixed in 0.5.12.1 already.
  • Loops were not properly degrading knowledge from inside the loop at loop exit, and therefore this could have led to missing checks and releases in code generation, e.g. for del statements in the loop body. Fixed in 0.5.12.1 already.
  • The or and and re-formulation could trigger false assertions, due to early releases for compatibility. Fixed in 0.5.12.1 already.
  • Fix, optimization of calls of constant objects (always an exception) crashed the compiler. This corrects Issue#202. Fixed in 0.5.12.2 already.
  • Standalone: Added support for site.py installations with a leading def or class statement, which is defeating our attempt to patch __file__ for it. This corrects Issue#189.
  • Compatibility: In full compatibility mode, the tracebacks of or and and expressions are now as wrong as they are in CPython. Does not apply to --improved mode.
  • Standalone: Added missing dependency on QtGui by QtWidgets for PyQt5.
  • MacOS: Improved parsing of otool output to avoid duplicate entries, which can also be entirely wrong in the case of Qt plugins at least.
  • Avoid relative paths for main program with file reference mode original, as it otherwise changes as the file moves.
  • MinGW: The created modules depended on MinGW to be in PATH for their usage. This is no longer necessary, as we now link these libraries statically for modules too.
  • Windows: For modules, the option --run to immediately load the modules had been broken for a while.
  • Standalone: Ignore Windows DLLs that were attempted to be loaded, but then failed to load. This happens e.g. when both PySide and PyQt are installed, and could cause the dreaded conflicting DLLs message. The DLL loaded in error is now ignored, which avoids this.
  • MinGW: The resource file used might be empty, in which case it doesn't get created, avoiding an error due to that.
  • MinGW: Modules can now be created again. The run time relative code uses an API that is WinXP only, and MinGW failed to find it without guidance.

Optimization

  • Make direct calls out of called function creations. Initially this applies to lambda functions only, but it's expected to become commonplace in coming releases. This is now 20x faster than CPython.

    # Nuitka avoids creating a function object, parsing function arguments:
    (lambda x:x)(something)
    
  • Propagate assignments from non-mutable constants forward based on SSA information. This is the first step of using SSA for real compile time optimization.

  • Specialized call nodes at creation. Instead of making every call the most flexible form (keyword and plain arguments), each node now only represents the kind of call it really is. This saves lots of memory, and makes the tree faster to visit.

  • Added support for optimizing the slice built-in with compile time constant arguments to constants. The re-formulation for slices in Python3 uses these a lot. And the lack of this optimization prevented a bunch of optimization in this area. For Python2 the built-in is optimized too, but not as important probably.

  • Added support for optimizing isinstance calls with compile time constant arguments. This avoids static exception raises in the exec re-formulation which tests for file type, and then optimization couldn't tell that a str is not a file instance. Now it can.

  • Lower in-place operations on immutable types to normal operations. This will allow computing these at compile time more accurately.

  • The re-formulation of loops puts the loop condition as a conditional statement with break. The not that needs to apply was only added in later optimization, leading to unnecessary compile time efforts.

  • Removed per variable trace visit from optimization, removing useless code and compile time overhead. We are going to optimize things by making decisions in assignment and reference nodes based on forward looking statements using the last trace collection.
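
To illustrate the slice item in the list above: in Python3, slicing syntax is expressed through slice objects, so a slice with literal bounds is built from compile time constant arguments. A minimal sketch in plain Python follows. It only demonstrates the equivalence and does not show Nuitka internals.

```python
# In Python3, "values[1:3]" is evaluated as "values[slice(1, 3)]".
values = ["a", "b", "c", "d"]

by_syntax = values[1:3]
by_builtin = values[slice(1, 3)]

# A slice built from the same constant arguments always compares equal,
# so nothing about it depends on run time state.
constant_slice = slice(1, 3)
same_slices = constant_slice == slice(1, 3)
```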

New Features

  • Added experimental support for Python 3.5, which seems to be passing the test suites just fine. The new @ matrix multiplication operators are not yet supported though.
  • Added support for patching source on the fly. This is used to work around a (now fixed) issue with numexpr.cpuinfo making type checks with the is operation, about the only thing we cannot detect.

Organizational

  • Added repository for Ubuntu Vivid (15.04) for download. Removed Ubuntu Saucy and Ubuntu Raring package downloads, these are no longer supported by Ubuntu.
  • Added repository for Debian Stretch, after Jessie release.
  • Make it more clear in the documentation that in order to compile Python3, a Python2 is needed to execute Scons, but that the end result is a Python3 binary.
  • The PyLint checker tool now can operate on directories given on the command line, and whitelists an error that is Windows only.

Cleanups

  • Split up standalone code further, moving depends.exe handling to a separate module.
  • Reduced code complexity of scons interface.
  • Cleaned up where trace collection is being done. It was partially still done inside the collection itself instead of in the owner.
  • In case of conflicting DLLs for standalone mode, these are now output with nicer formatting, that makes it easy to recognize what is going on.
  • Moved code to fetch depends.exe to dedicated module, so it's not as much in the way of standalone code.

Tests

  • Made BuiltinsTest directly executable with Python3.
  • Added construct test to demonstrate the speed up of direct lambda calls.
  • The deletion of @test for the CPython test suite is more robust now, esp. on Windows, the symbolic links are now handled.
  • Added test to cover or usage with in-place assignment.
  • Cover local relative import from . with absolute_import future flag enabled.
  • Again, more basic tests are now directly executable with Python3.

Summary

This release is major due to the amount of ground covered. The reduction in memory usage of Nuitka itself (the C++ compiler will still use much memory) is very massive and an important aspect of scalability too.

Then the SSA changes are truly the first sign of major improvements to come. In their current form, without eliminating dead assignments, the full advantage is not taken yet, but the next releases will do this, and that's a major milestone for Nuitka.

The other optimizations mostly stem from looking at things closer, and trying to work towards function in-lining, for which we are making a lot of progress now.

Nuitka Progress in Spring 2015

It's absolutely time to speak about what's going on with Nuitka, there have been a few releases, and big things are going to happen now. The ones I have always talked of, it's happening now.

I absolutely prefer to talk of things when they are completed, that is why I am shy to make these kinds of postings, but this time, I think it's warranted. The next couple of releases are going to be very different.

SSA (Static Single Assignment Form)

For a long, long time already, each release of Nuitka has worked towards increasing "SSA" usage in Nuitka.

The component that works on this, is now called "trace collection", and does the major driving part for optimization. It collects "variable traces" and puts them together into "global" forms as well.

Based on these traces, optimizations can be made. Having SSA or not, is (to me) the difference between Nuitka as a mere compiler, and Nuitka as an optimizing compiler.

The major news is that factory versions of Nuitka now do this in serious ways, propagating values forward, and we also are close to eliminating dead assignments, some of which become dead by having been forward propagated.

So we can now finally see that big step, jump really, happening, and Nuitka does now do some pretty good static optimization, at least locally.

Still, right now, this trivial code assigns to a local variable, then reads from it to return. But not for much longer.

def f():
    a = 1
    return a

This is going to instantly give performance gains, and more importantly, will enable analysis, that leads to avoiding e.g. the creation of function objects for local functions, becoming able to in-line, etc.

This is major excitement to me. And I cannot wait to have the releases that do this.
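
The forward propagation described above can be sketched on a toy representation. This is only an illustration under assumptions: the tuple based statement list and the propagate_constants helper are hypothetical, and are not Nuitka's trace collection.

```python
def propagate_constants(statements):
    """Forward propagate constant assignments through straight-line code.

    Each statement is ("assign", name, value) or ("return", value), where
    a value is either a constant (int) or a variable name (str).
    """
    known = {}  # variable name -> constant value known at this point
    result = []

    for statement in statements:
        if statement[0] == "assign":
            _, name, value = statement
            if isinstance(value, str):
                value = known.get(value, value)
            if isinstance(value, int):
                known[name] = value
            else:
                # Unknown value assigned, forget what was known about "name".
                known.pop(name, None)
            result.append(("assign", name, value))
        else:
            _, value = statement
            if isinstance(value, str):
                value = known.get(value, value)
            result.append(("return", value))

    return result


# The function "f" from above: a = 1; return a
optimized = propagate_constants([("assign", "a", 1), ("return", "a")])
```

After this step, the return uses the constant directly and the assignment to a is no longer read, which is what makes it a candidate for dead assignment elimination.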

Scalability

The focus has also been lately, to reduce Nuitka's own memory usage. It has gone down by a large factor, often by avoiding cyclic dependencies in the data structures, that the garbage collector of Python failed to deal with properly.
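
Why cycles matter for memory can be shown with a small sketch. It assumes CPython reference counting, and the node classes are hypothetical and not Nuitka's data structures. Weak references are just one common way to avoid a cycle.

```python
import gc
import weakref


class CyclicNode(object):
    """Child keeps a strong reference to its parent, creating a cycle."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class WeakNode(object):
    """Child only keeps a weak reference to its parent, so no cycle."""

    def __init__(self, parent=None):
        self.parent = weakref.ref(parent) if parent is not None else None
        self.children = []
        if parent is not None:
            parent.children.append(self)


def released_without_gc(node_class):
    """Check if a small tree is freed by reference counting alone."""
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        root = node_class()
        node_class(root)  # the child is kept alive by root.children
        probe = weakref.ref(root)
        del root
        return probe() is None
    finally:
        if gc_was_enabled:
            gc.enable()
        gc.collect()
```

With the cyclic variant, the tree stays in memory until the cycle collector runs. Without the cycle, it is released as soon as the last reference goes away.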

The scalability of Nuitka also depends much on generated code size. With the optimization becoming more clever, less code needs to be generated, and that will help a lot. On some platforms, MSVC most notably, it can be really slow, but it's noteworthy that Nuitka works not just with the 2008 edition, but with the latest MSVC, which appears to be better.

Compatibility

There was not a whole lot to gain in the compatibility domain anymore. Nothing important certainly. But there are important changes.

Python 3.5

The next release has changes to compile and run the Python3.4 test suite successfully. Passing here means, to pass/fail in the same way as does the uncompiled Python. Failures are of course expected, and a nice way of having coverage for exception codes.

The new @ operator is not supported yet. I will wait with that for things to stabilize. It's currently only an alpha release.

However, Nuitka has probably never been this close to supporting a new Python version at release time. And since 3.4 was such a heavy drain, and still not perfectly handled (super still works like it's 3.3 e.g.), I wanted to know what is coming a bit sooner.

Cells for Closure

We now provide a __closure__ value for compiled functions too. These are not writable in Python, so it's only a view. Having moved storage into the compiled function object, that was easy.
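
For reference, this is how the attribute looks for an uncompiled function under CPython, as a minimal sketch. The make_counter function is only an example name.

```python
def make_counter(start):
    def counter():
        return start + 1

    return counter


counter = make_counter(41)

# Each closure variable is exposed as a cell with a "cell_contents" value.
cell_values = [cell.cell_contents for cell in counter.__closure__]

# The attribute itself is read-only, so it can only be used as a view.
try:
    counter.__closure__ = ()
    closure_is_writable = True
except (AttributeError, TypeError):
    closure_is_writable = False
```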

Importing Enhancements

In the past couple of releases, the import logic was basically re-written with compatibility much increased. The handling of file case during import, multiple occurrences in the path, and absolute import future flags for relative imports has been added.

It's mainly the standalone community that will have issues, when just one of these imports doesn't find the correct thing, but picking the wrong one will of course have seriously bad impacts on compile time analysis too. So once we do cross module optimization, this must be rock solid.

I think we have gotten there, tackling these finer details now too.

Performance

Graphs and Benchmarks

Nuitka users don't know what to expect regarding the speed of their code after compilation through Nuitka, neither now nor after type inference (possibly hard to guess). Nuitka optimizes some constructs pretty heavily, but is weak at others. But how much does that affect real code?

There may well be no significant gain at all for many people, while there is a number for PyStone that suggests higher. The current and future versions possibly do speed up code, but the point is that you cannot tell if it is even worth trying for someone.

Nuitka really has to catch up here. The work on automated performance graphs has made some progress, and they are supposed to show up on Nuitka Speedcenter each time the master, develop or factory git branches change.

Note

There currently is no structure to these graphs. There are no explanations or comments, and there are no trend indicators. All of which makes it basically useless to everybody except me. And even harder for me than necessary.

However, as a glimpse of what will happen when we in-line functions, take a look at the case, where we already eliminate parameter parsing only, and make tremendous speedups:

Lambda call construct case

Right now (the graph gets automatic updates with each change), what you should see is that the develop branch is 20 times faster than CPython for that very specific bit of code. That is where we want to be, except that with actual in-lining, this will of course be even better.

It's artificial, but once we can forward propagate local function creations, it will apply there too. The puzzle completes.

But we also need to put real programs and use cases to test. This may need your help. Let me know if you want to.

Standalone

The standalone mode of Nuitka is pretty good, and as usual it continued to improve only.

Nothing all that important going on there, except the work on a plug-in framework, which is under development, and being used to handle e.g. PyQt plug-ins, or known issues with certain packages.

The importing improvements already mentioned now allow covering many more libraries successfully than before.

Other Stuff

Debian Stable

Nuitka is now part of Debian stable, aka Jessie. Debian and Python are the two things closest to my heart in the tech field. You can imagine that being an upstream worthy of inclusion into Debian stable is an important milestone for Nuitka to me.

Funding

Nuitka receives the occasional donation and those make me very happy. As there is no support from organizations like the PSF, I am all on my own there.

This year I likely will travel to EuroPython 2015, and would ask you to support me with that, as it's going to be expensive.

EuroPython 2015

I have plans to present Nuitka's function in-lining there, real stuff, on a fully functional compiler that works as a drop-in replacement.

Not 100% sure if I can make it in time, but things look good. Actually so far I felt ahead of the plan, but as you know, this can easily change at any point. But Nuitka stands on very stable grounds code wise.

Collaborators

Things are coming along nicely. When I started out, I was fully aware that the project is something that I can do on my own if necessary, and that has not changed. Things are going slower than necessary though, but that's probably very typical.

But you can join and should do so now, just follow this link or become part of the mailing list and help me there with requests I make, e.g. review posts of mine, test out things, pick up small jobs, answer questions of newcomers, you know the drill probably.

Nuitka is about to make breakthrough progress. And you can be a part of it. Now.

Future

So, there are multiple things going on:

  • More SSA usage

    The next releases are going to be all about getting this done.

    Once we take it to that next level, Nuitka will be able to speed up some things by much more than the factor it basically has provided for 2 years now, and it's probably going to happen long before EuroPython 2015.

  • Function in-lining

    For locally declared functions, it should become possible to avoid their creation, and make direct calls instead of ones that use function objects and expensive parameter handling.

    The next step there of course is to not only bind the arguments to the function signature, but then also to in-line and potentially specialize the function code. It's my goal to have that at EuroPython 2015 in a form ready to show off.
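
What the in-lining goal above amounts to can be illustrated by hand. This is a manual sketch of the transformation with hypothetical function names, not output of Nuitka.

```python
def with_local_function(values):
    # A function object is created on every call of the outer function,
    # and each call to it goes through argument handling.
    def scale(value, factor=2):
        return value * factor

    return [scale(value) for value in values]


def inlined_by_hand(values):
    # The arguments are bound to the signature ("factor" takes its default),
    # then the function body replaces the call.
    return [value * 2 for value in values]
```

Both functions give the same result, but the second one never creates the local function object.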

When these 2 things come to term, Nuitka will have made really huge steps ahead and laid the groundwork for success.

From then on, a boatload of work remains. With the infrastructure in place, there is still going to be plenty of work to optimize more and more things concretely, and to e.g. do type inference, and generate different code for booleans, ints or float values.

Let me know, if you are willing to help. I really need that help to make things happen faster. Nuitka will become more and more important only.

Nuitka Release 0.5.12

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains massive amounts of corrections for long standing issues in the import recursion mechanism, as well as for standalone issues now visible after the __file__ and __path__ values have changed to become runtime dependent values.

Bug Fixes

  • Fix, the __path__ attribute for packages was still the original filename's directory, even if file reference mode was runtime.

  • The use of runtime as default file reference mode for executables, even if not in standalone mode, was making acceleration harder than necessary. Changed to original for that case. Fixed in 0.5.11.1 already.

  • The constant value for the smallest int that is not yet a long is created using 1 due to C compiler limitations, but 1 was not yet initialized properly, if this was a global constant, i.e. used in multiple modules. Fixed in 0.5.11.2 already.

  • Standalone: Recent fixes around __path__ revealed issues with PyWin32, where modules from win32com.shell were not properly recursed to. Fixed in 0.5.11.2 already.

  • The importing of modules with the same name as a built-in module inside a package falsely assumed these were the built-ins which need not exist, and then didn't recurse into them. This affected standalone mode the most, as the module was then missing entirely. This corrects Issue#178.

    # Inside "x.y" module:
    import x.y.exceptions
    
  • Similarly, the importing of modules with the same name as standard library modules could go wrong. This corrects Issue#184.

    # Inside "x.y" module:
    import x.y.types
    
  • Importing modules on Windows and MacOS was not properly checking the case, making it associate wrong modules from files with mismatching case. This corrects Issue#188.

  • Standalone: Importing with from __future__ import absolute_import would prefer relative imports still. This corrects Issue#187.

  • Python3: Code generation for try/return expr/finally could lose exceptions when expr raised an exception, leading to a RuntimeError for NULL return value. The real exception was lost.

  • Lambda expressions that were directly called with star arguments caused the compiler to crash.

    (lambda *args:args)(*args) # was crashing Nuitka
    

New Optimization

  • Focusing on compile time memory usage, cyclic dependencies of trace merges that prevented them from being released, even when replaced, were removed.
  • More memory efficient updating of global SSA traces, reducing memory usage during optimization by ca. 50%.
  • Code paths that cannot and therefore must not happen are now more clearly indicated to the backend compiler, allowing for slightly better code to be generated by it, as it can tell that certain code flows need not be merged.

New Features

  • Standalone: On systems where .pth files inject Python packages at launch, these are now detected and taken into account. Previously Nuitka did not recognize them, due to lack of __init__.py files. These are mostly pip installations of e.g. zope.interface.
  • Added option --explain-imports to debug the import resolution code of Nuitka.
  • Added option --show-memory to display the amount of memory used in total and how it's spread across the different node types during compilation.
  • The option --trace-execution now also covers early program initialisation before any Python code runs, to ease finding bugs in this domain as well.

Organizational

  • Changed default for file reference mode to original unless standalone or module mode are used. For mere acceleration, breaking the reading of data files from __file__ is useless.
  • Added check that the inline copy of scons is not run with Python3, which is not supported. Nuitka works fine with Python3, but a Python2 is required to execute scons.
  • Discover more kinds of Python2 installations on Linux/MacOS.
  • Added instructions for MacOS to the download page.

Cleanups

  • Moved oset and odict modules which provide ordered sets and dictionaries into a new package nuitka.container to clean up the top level scope.
  • Moved SyntaxErrors to nuitka.tree package, where it is used to format error messages.
  • Moved nuitka.Utils package to nuitka.utils.Utils creating a whole package for utils, so as to better structure them for their purpose.

Summary

This release is a major maintenance release. Support for namespace modules injected by *.pth is a major step for new compatibility. The import logic improvements expand the ability of standalone mode widely. Many more use cases will now work out of the box, and fewer errors will be found on case insensitive systems.

Aside from memory issues, there is no new optimization though, as many of these improvements could not be delivered as hotfixes (too invasive code changes), and should be out to the users as a stable release. Real optimization changes have been postponed to the next release.

Nuitka Release 0.5.11

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

The last release represented a significant change and introduced a few regressions, which got addressed with hot fix releases. This release also has a focus on cleaning up open optimization issues that were postponed in the last release.

New Features

  • The filenames of source files as found in the __file__ attribute are now made relative for all modes, not just standalone mode.

    This makes it possible to put data files along side compiled modules in a deployment. This solves Issue#170.

Bug Fixes

  • Local functions that reference themselves were not released. They now are.

    def someFunction():
        def f():
            f() # referencing 'f' in 'f' caused the garbage collection to fail.
    

    Recent changes to code generation attached closure variable values to the function object, so now they can be properly visited. This corrects Issue#45. Fixed in 0.5.10.1 already.

  • Python2.6: The complex constants with real or imaginary parts -0.0 were collapsed with constants of value 0.0. This became more evident after we started to optimize the complex built-in. Fixed in 0.5.10.1 already.

    complex(0.0, 0.0)
    complex(-0.0, -0.0) # Could be confused with the above.
    
  • Complex call helpers could leak references to their arguments. This was a regression. Fixed in 0.5.10.1 already.

  • Parameter variables offered as closure variables were not properly released, only the cell object was, but not the value. This was a regression. Fixed in 0.5.10.1 already.

  • Compatibility: The exception type given when accessing local variable values not initialized in a closure taking function, needs to be NameError and UnboundLocalError for accesses in the providing function. Fixed in 0.5.10.1 already.

  • Fix support for "venv" on systems where the system Python uses symbolic links too. This is the case at least on Mageia Linux. Fixed in 0.5.10.2 already.

  • Python3.4: On systems where long and Py_ssize_t are different (e.g. Win64) iterators could be corrupted if used by uncompiled Python code. Fixed in 0.5.10.2 already.

  • Fix, generator objects didn't release weak references to them properly. Fixed in 0.5.10.2 already.

  • Compatibility: The __closure__ attribute of functions was so far not supported, and rarely missed. Recent changes made it easy to expose, so now it was added. This corrects Issue#45.

  • MacOS: A linker warning about deprecated linker option -s was solved by removing the option.

  • Compatibility: Nuitka was enforcing the __doc__ attribute to be a string object, and gave a misleading error message. This check must not be done though, __doc__ can be any type in Python. This corrects Issue#177.
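
Regarding the complex constants item in the list above, the following sketch shows why such constants cannot be merged based on equality alone. It is plain Python behaviour, and the signs helper is a name made up for this example. How Nuitka keys its constants is not shown here.

```python
import math

positive = complex(0.0, 0.0)
negative = complex(-0.0, -0.0)

# Equality cannot tell the two values apart.
compare_equal = positive == negative


def signs(value):
    """Return the signs of the real and imaginary parts as +1.0 or -1.0."""
    return (math.copysign(1.0, value.real), math.copysign(1.0, value.imag))


# The sign bits differ, so these are distinct constants.
positive_signs = signs(positive)
negative_signs = signs(negative)
```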

New Optimization

  • Variables that need not be shared, because the uses in closure taking functions were eliminated, no longer use cell objects.

  • The try/except and try/finally statements now both have actual merging for SSA, allowing for better optimization of code behind it.

    def f():
    
        try:
            a = something()
        except:
            return 2
    
        # Since the above exception handling cannot continue the code flow,
        # we do not have to invalidate the trace of "a", and e.g. do not have
        # to generate code to check if it's assigned.
        return a
    

    Since try/finally is used in almost all re-formulations of complex Python constructs this is improving SSA application widely. The uses of try/except in user code will no longer degrade optimization and code generation efficiency as much as they did.

  • The try/except statement now reduces the scope of tried block if possible. When no statement raised, already the handling was removed, but leading and trailing statements that cannot raise, were not considered.

    def f():
    
        try:
            b = 1
            a = something()
            c = 1
        except:
            return 2
    

    This is now optimized to:

    def f():
    
        b = 1
        try:
            a = something()
        except:
            return 2
        c = 1
    

    The impact on execution speed may be marginal, but it is definitely going to improve the branch merging to be added later. Note that c can only be optimized this way because the exception handler is aborting, otherwise it would change behaviour.

  • The creation of code objects for standalone mode and now all code objects was creating a distinct filename object for every function in a module, despite them having the same content. This was wasteful for module loading. Now it's done only once.

    Also, when having multiple modules, the code to build the run time filename used for code objects, was calling import logic, and doing lookups to find os.path.join again and again. These are now cached, speeding up the use of many modules as well.
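
The note about the aborting exception handler in the try/except item above can be checked with a small counter example. In this sketch the handler does not abort, and the function names are hypothetical. Moving the trailing statement out of the tried block then changes the result.

```python
def something():
    raise ValueError("always fails in this sketch")


def original():
    c = 0
    try:
        a = something()
        c = 1
    except ValueError:
        pass  # non-aborting handler, execution continues after the try
    return c


def wrongly_reduced():
    c = 0
    try:
        a = something()
    except ValueError:
        pass
    c = 1  # now also executed after an exception, behaviour changed
    return c
```

With a handler that returns or raises, the code after the try statement is only reached when nothing was raised, so the move is safe there.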

Cleanups

  • Nuitka used to have "variable usage profiles" and still used them to decide if a global variable is written to, in which case, it stays away from doing optimization of it to built-in lookups, and later calls.

    They have been replaced by "global variable traces", which collect the traces to a variable across all modules and functions. While this is now only a replacement, and getting rid of old code, and basing on SSA, later it will also allow to become more correct and more optimized.

  • The standalone now queries its hidden dependencies from a plugin framework, which will become an interface to Nuitka internals in the future.

Testing

  • The use of deep hashing of constants allows us to check if constants become mutated during the run-time of a program. This allows us to discover corruption should we encounter it.
  • The tests of CPython are now also run with Python in debug mode, but only on Linux, enhancing reference leak coverage.
  • The CPython test parts which had been disabled due to reference cycles involving compiled functions, or usage of __closure__ attribute, were reactivated.
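
The idea of deep hashing from the list above can be sketched in Python as follows. This is an illustration under assumptions only. The deep_hash helper is hypothetical, and the actual check in Nuitka is part of the generated code and may work differently.

```python
import hashlib


def deep_hash(value):
    """Hash a (possibly mutable, nested) constant by its content."""
    digest = hashlib.sha256()
    _update(digest, value)
    return digest.hexdigest()


def _update(digest, value):
    # Include the type name, so that e.g. (1,) and [1] hash differently.
    digest.update(type(value).__name__.encode("utf8"))

    if isinstance(value, dict):
        for key in sorted(value, key=repr):
            _update(digest, key)
            _update(digest, value[key])
    elif isinstance(value, (list, tuple)):
        for element in value:
            _update(digest, element)
    elif isinstance(value, (set, frozenset)):
        for element in sorted(value, key=repr):
            _update(digest, element)
    else:
        digest.update(repr(value).encode("utf8"))

    digest.update(b";")


# Hash at program start, mutate the "constant", hash again at exit.
constant = {"numbers": [1, 2, 3], "name": "x"}
hash_at_start = deep_hash(constant)
constant["numbers"].append(4)
hash_at_exit = deep_hash(constant)
```

A mismatch between the two hashes indicates that a value that was supposed to be constant got mutated along the way.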

Organizational

  • Since Google Code has shut down, it has been removed from the Nuitka git mirrors.

Summary

This release brings exciting new optimization with the focus on the try constructs, now being done more optimally. It is also a maintenance release, bringing out compatibility improvements, and important bug fixes, and important usability features for the deployment of modules and packages, that further expand the use cases of Nuitka.

The git flow had to be applied this time to get out fixes for regressions that the big change of the last release brought, so this is also to consolidate these and the other corrections into a full release before making more invasive changes.

The cleanups are leading the way to expanded SSA applied to global variable and shared variable values as well. Already the built-in detection is now based on global SSA information, which was an important step ahead.

Nuitka Release 0.5.10

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release has a focus on code generation optimization. Doing major changes away from "C++-ish" code to "C-ish" code, many constructs are now faster or got looked at and optimized.

Bug Fixes

  • Compatibility: The variable name in locals for the iterator provided to the generator expression should be .0, now it is.
  • Generators could leak frames until program exit, these are now properly freed immediately.
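
The .0 name from the first item above can be seen in uncompiled CPython too. A minimal sketch follows. It relies on a CPython implementation detail, namely the code object of a generator expression.

```python
generator = (x for x in [1, 2, 3])

# The iterator is passed as the only argument of the generator expression,
# and CPython names that argument ".0".
first_argument = generator.gi_code.co_varnames[0]
```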

New Optimization

  • Faster exception save and restore functions that might be in-lined by the backend C compiler.

  • Faster error checks for many operations, where these errors are expected, e.g. instance attribute lookups.

  • Do not create traceback and locals dictionary for frame when StopIteration or GeneratorExit are raised. These tracebacks were wasted, as they were immediately released afterwards.

  • Closure variables to functions and parameters of generator functions are now attached to the function and generator objects.

  • The creation of functions with closure taking was accelerated.

  • The creation and destruction of generator objects was accelerated.

  • The re-formulation for in-place assignments got simplified and got faster doing so.

  • In-place operations of str were always copying the string, even if it was not necessary. This corrects Issue#124.

    a += b # Was not re-using the storage of "a" in case of strings
    
  • Python2: Additions of int for Python2 are now even faster.

  • Access to local variable values got slightly accelerated at the expense of closure variables.

  • Added support for optimizing the complex built-in.

  • Removing unused temporary and local variables as a result of optimization, these previously still allocated storage.

Cleanup

  • The use of C++ classes for variable objects was removed. Closure variables are now attached as PyCellObject to the function objects owning them.
  • The use of C++ context classes for closure taking and generator parameters has been replaced with attaching values directly to functions and generator objects.
  • The indentation of code template instantiations spanning multiple lines was not proper in all cases. We were using emission objects that handle new lines in code and mere list objects that don't handle them, in mixed forms. Now only the emission objects are used.
  • Some templates with C++ helper functions that had no variables got changed to be properly formatted templates.
  • The internal API for handling of exceptions is now more consistent and used more efficiently.
  • The printing helpers got cleaned up and moved to static code, removing any need for forward declaration.
  • The use of INCREASE_REFCOUNT_X was removed, it got replaced with proper Py_XINCREF usages. The function was once required before "C-ish" lifted the need to do everything in one function call.
  • The use of INCREASE_REFCOUNT got reduced. See above for why that is any good. The idea is that Py_INCREF must be good enough, and that we want to avoid the C function it was, even if in-lined.
  • The assertObject function that checks if an object is not NULL and has positive reference count, i.e. is sane, got turned into a preprocessor macro.
  • Deep hashes of constant values created in --debug mode, which cover also mutable values, and attempt to depend on actual content. These are checked at program exit for corruption. This may help uncover bugs.

Organizational

  • Speedcenter has been enhanced with better graphing and has more benchmarks now. More work will be needed to make it useful.
  • Updates to the Developer Manual, reflecting the current near finished state of "C-ish" code generation.

Tests

  • New reference count tests to cover generator expressions and their usage got added.
  • Many new construct based tests got added, these will be used for performance graphing, and serve as micro benchmarks now.
  • Again, more basic tests are directly executable with Python3.

Summary

This is the next evolution of "C-ish" coming to pass. The use of C++ has for all practical purposes vanished. It will remain an ongoing activity to clear that up and become real C. The C++ classes were a huge road block to many things, that now will become simpler. One example of these were in-place operations, which now can be dealt with easily.

Also, lots of polishing and tweaking was done while adding construct benchmarks that were made to check the impact of these changes. Here, generators probably stand out the most, as some of the missed optimization got revealed and then addressed.

Their speed increases will be visible to some programs that depend a lot on generators.

This release is clearly major in that the most important issues got addressed, future releases will provide more tuning and completeness, but structurally the "C-ish" migration has succeeded, and now we can reap the benefits in the coming releases. More work will be needed for all in-place operations to be accelerated.

More work will be needed to complete this, but it's good that this is coming to an end, so we can focus on SSA based optimization for the major gains to be had.

Nuitka progress 2014

Again, not much has happened publicly to Nuitka, except for some releases, so it's time to make a kind of status post, about the really exciting news there is, also looking back at 2014 for Nuitka, and forward of course.

I meant to post this basically since last year, but never got around to it, therefore the 2014 in the title.

SSA (Static Single Assignment Form)

For a long, long time already, each release of Nuitka has worked towards enabling "SSA" usage in Nuitka. There is a component called "constraint collection", which is tasked with driving the optimization, and collecting variable traces.

Based on these traces, optimizations can be made. Having SSA or not, is (to me) the difference between Nuitka as a compiler, and Nuitka as an optimizing compiler.

The news is, SSA has carried the day, and is used throughout code generation for some time now, and gave minor improvements. It has been applied to the temporary and local variable values.

And currently, work is under way to expand it to module and shared variables, which can get invalidated quite easily, as soon as unknown code is executed. An issue there is to identify all those spots reliably.

And this spring, we are finally going to see the big jump that is happening, once Nuitka starts to use that information to propagate things.

Still, right now, this code assigns to a local variable, then reads from it to return. But not much longer.

def f():
    a = 1
    return a

This is going to instantly give gains, and more importantly, will enable analysis, that leads to avoiding e.g. the creation of function objects for local functions, being able to in-line, etc.
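
As a rough sketch of what forward propagation aims for, the function above could be treated as if it had been written without the local variable. The name f_propagated exists only for this illustration; it shows the intended effect at the Python level, not the code Nuitka generates.

```python
def f():
    a = 1
    return a


def f_propagated():
    # The only value ever assigned to "a" is the constant 1, and "a" is
    # read exactly once, so the variable itself is not needed anymore.
    return 1
```

Both functions return the same value, but the second one needs neither the assignment nor the local variable storage.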

Improved Code Generation

Previously, under the title "C-ish", Nuitka moved away from C++ based code generation towards generating less C++ and more C-ish code. This trend continues, and has led to more code generation improvements.

The important change recently was to remove the usage of the blocking holdouts, the C++ classes used for local variables, closure taking, and release, and to do those things manually instead.

This enabled special code generation for in-place operations, which are the most significant improvements of the upcoming release. These were held back, as with C++ destructors doing the release, it's practically impossible to deal with values suddenly becoming illegal. Transfer of object ownership needs to be more fluid than could be presented to C++ objects.

Currently, this allows speeding up string in-place operations, which, very importantly, can then avoid a memcpy of potentially large values. This is about catching up to CPython in this regard. After that, we will likely be able to expand it to cases where CPython could never do it, e.g. also int objects.
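
The kind of code that benefits looks like the following sketch. The function name build is made up for illustration; whether the string is really extended in place is an implementation detail of the runtime and not visible in the result.

```python
def build(count):
    result = ""
    for i in range(count):
        # In-place addition: if "result" holds the only reference to the
        # string, the object can be extended instead of copied as a whole.
        result += str(i % 10)
    return result


text = build(1000)
```

Without the special handling, every iteration would copy the whole string built so far, which makes the loop quadratic in the length of the result.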

Scalability

The scalability of Nuitka depends much on generated code size. With it being less stupid, the generated code is now not only faster, but definitely smaller, and with more optimization, it will only become more practical.

Removing the many C++ classes already gave the backend compiler an easier time. But we need to do more, to e.g. have generic parameter parsing, instead of specialized per function, and module exclusive constants should not be pre-created, but in the module body, when they are used.

Compatibility

There is not a whole lot to gain in the compatibility domain anymore. Nothing important certainly. But there are these minor things.

Cells for Closure

However, since we now use PyCell objects for closure, we could start and provide a real __closure__ value, that could even be writable. We could start supporting that easily.
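
For reference, this is what CPython exposes for a closure under Python 3. It is a small sketch of the behavior a compatible __closure__ value would have to match; the function names are made up for illustration.

```python
def make_counter(start):
    count = start

    def counter():
        nonlocal count
        count += 1
        return count

    return counter


counter = make_counter(10)

# CPython exposes the cells of the closure variables as a tuple.
cells = counter.__closure__
names = counter.__code__.co_freevars

first = cells[0].cell_contents   # value before any call
counter()
second = cells[0].cell_contents  # the cell is shared with the function
```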

Local Variable Storage

Currently, local variables use stack storage. Were we to use function object or frame object attached storage, we could provide frame locals that actually work. This may be as simple as to put those in an array on the stack and use the pointer to it.

Suddenly locals would become writable. I am not saying this is useful, just that it's possible to do this.
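
In CPython, the locals of a running function can be read through its frame object, as in the sketch below. It relies on sys._getframe, which is a CPython implementation detail, and the function name show_locals is only for illustration.

```python
import sys


def show_locals():
    x = 1
    y = "two"
    # The frame object of the running function gives access to its locals.
    return dict(sys._getframe().f_locals)


snapshot = show_locals()
```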

Performance

Graphs and Benchmarks

The work on automated performance graphs has made progress, and they are supposed to show up on Nuitka Speedcenter each time the master, develop or factory git branches change.

There currently is no structure to these graphs. There are no explanations or comments, and there are no trend indicators. All of which makes it basically useless to everybody except me. And even harder for me than necessary.

At least it's updated to latest Nikola, and uses PyGal for the graphics now, so it's easier to expand. The plan here, is to integrate with special pages from a Wiki, making it easy to provide comments.

Standalone

The standalone mode of Nuitka is pretty good, and as usual, it only continued to improve.

The major improvements came from handling case collisions between modules and packages. One can have Module.py and module/__init__.py and they both are expected to be different, even on Windows, where filenames are case insensitive.

So, giving up on implib and similar, we finally have our own code to scan the file system in a compatible way, and make these determinations, whereas the library code exposing this functionality doesn't handle all things in really the proper way.
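
A minimal sketch of such a case-exact lookup is shown below. The function find_module_exact is hypothetical and much simpler than a real module finder (it only looks at one directory, and only at .py files and packages); it merely shows why comparing against the directory listing avoids the case problem.

```python
import os
import tempfile


def find_module_exact(directory, name):
    """Return the path for module or package "name" with exact casing."""
    # os.listdir reports names as stored on disk, so comparing against it
    # is case sensitive even where the file system itself is not.
    entries = set(os.listdir(directory))

    if name in entries:
        init_file = os.path.join(directory, name, "__init__.py")
        if os.path.isfile(init_file):
            return init_file

    if name + ".py" in entries:
        return os.path.join(directory, name + ".py")

    return None


with tempfile.TemporaryDirectory() as base:
    # Create both Module.py and module/__init__.py next to each other.
    open(os.path.join(base, "Module.py"), "w").close()
    os.mkdir(os.path.join(base, "module"))
    open(os.path.join(base, "module", "__init__.py"), "w").close()

    as_module = os.path.relpath(find_module_exact(base, "Module"), base)
    as_package = os.path.relpath(find_module_exact(base, "module"), base)
    missing = find_module_exact(base, "MODULE")
```

A plain os.path.exists check would accept any casing on a case insensitive file system, which is exactly the collision described above.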

Other Stuff

Funding

Nuitka receives some, but not quite enough donations. There is no support from organizations like e.g. the PSF, and it seems I better not hold my breath for it. I will travel to EuroPython 2015, and would ask you to support me with that, it's going to be expensive.

In 2014, with donations, I bought a "Cubox i4-Pro", which is an ARM based machine with 4 cores, and 2GB RAM. Works from flash, and with the eSATA disk attached, it works nicely for continuous integration, which helps me a lot to deliver extremely high quality releases. It's pretty nice, except that when using all 4 cores, it gets too hot. So "systemd" to the rescue: I just limited the Buildbot slave's service to use 3 cores of CPU maximum and now it runs stable.

Also with donations I bought a Terabyte SSD, which I use on the desktop to speed up hosting the virtual machines, and my work in general.

And probably more important, the host of "nuitka.net" became a real machine with real hardware last year, and lots more RAM, so I can spare myself of optimizing e.g. MySQL for low memory usage. The monthly fee of that is substantial, but supported from your donations. Thanks a lot!

Collaborators

Things are coming along nicely. When I started out, I was fully aware that the project is something that I can do on my own if necessary, and that has not changed. Things are going slower than necessary though, but that's probably very typical.

But you can join and should do so now, just follow this link or become part of the mailing list and help me there with requests I make, e.g. review posts of mine, test out things, pick up small jobs, answer questions of newcomers, you know the drill probably.

Nuitka is about to make breakthrough progress. And you can be a part of it. Now.

Future

So, there are multiple things going on:

  • More "C-ish" code generation

    The next release is going to be more "C-ish" than before, and we can start to actually migrate to really "C" language. You can help out if you want to, this is fairly standard cleanups. Just pop up on the mailing list and say so.

    This prong of action is coming to a logical end. The "C-ish" project, while not planned from the outset, turns out to be a full success. Initially, I would not have started Nuitka, should I have faced the full complexity of code generation that there is now. So it was good to start with "C++", but it's a better Nuitka now.

  • More SSA usage

    The previous releases consolidated on SSA. A few missing optimizations were found, because SSA didn't realize things, which were then highlighted by code generation being too good, e.g. not using exception variables.

    We seem to have an SSA that can be fully trusted now, and while it can be substantially improved (e.g. the try/finally removes all knowledge, although it only needs to do a partial removing of knowledge for the finally block, not for afterwards at all), it will already allow for many nice things to happen.

    Once we take it to that next level, Nuitka will be able to speed up some things by much more than the factor it basically has provided for 2 years now, and it's probably going to happen before summer, or so I hope.

  • Value propagation

    Starting out with simple cases, Nuitka will forward propagate variable values, and start to eliminate variable usages entirely, where they are not needed.

    That will make many things much more compact, and faster at run time. We will then try and build "gates" for statements that they cannot pass, so we can e.g. optimize constant things outside of loops, that kind of thing.
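
As a rough sketch of the last point, moving a loop-invariant computation out of a loop would look like this at the Python level. Both function names are made up for illustration, and the transformation is only valid under the stated assumption that computing the value has no side effects and cannot raise.

```python
def scale_all(values, factor):
    result = []
    for value in values:
        # Same value in every iteration, it does not depend on "value".
        scale = factor * 2
        result.append(value * scale)
    return result


def scale_all_hoisted(values, factor):
    # Computed once, assuming "factor * 2" has no side effects.
    scale = factor * 2
    result = []
    for value in values:
        result.append(value * scale)
    return result
```

A statement between the loop start and the computation that could change factor would be such a "gate" that the computation cannot pass.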

When these 3 things come to term, Nuitka will make a huge step ahead. I look forward to demoing function call in-lining, or at least avoiding the argument parsing at EuroPython 2015, making direct calls, which will be way faster than normal calls.

From then on, a boatload of work remains. Even with the infrastructure in place, there is still going to be plenty of work to optimize more and more things concretely.

Let me know, if you are willing to help. I really need that help to make things happen faster.

Nuitka Release 0.5.9

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is mostly a maintenance release, bringing out minor compatibility improvements, and some standalone improvements. Also new options to control the recursion into modules are added.

Bug Fixes

  • Compatibility: Checks for iterators were using PyIter_Check which is buggy when running outside of Python core, because it's comparing pointers we don't see. Replaced with HAS_ITERNEXT helper which compares against the pointer as extracted from a real non-iterator object.

    class Iterable:
        def __init__(self):
            self.consumed = 2
    
        def __iter__(self):
            return Iterable()
    
    iter(Iterable()) # This is supposed to raise, but didn't with Nuitka
    
  • Python3: Errors when creating class dictionaries raised by the __prepare__ dictionary (e.g. enum classes with wrong identifiers) were not immediately raised, but only by the type call. This was not observable, but might have caused issues potentially.

  • Standalone MacOS: Shared libraries and extension modules didn't have their DLL load paths updated, but only the main binary. This is not sufficient for more complex programs.

  • Standalone Linux: Shared libraries copied into the .dist folder were read-only and executing chrpath could potentially then fail. This has not been observed, but is a conclusion of the MacOS fix.

  • Standalone: When freezing standard library, the path of Nuitka and the current directory remained in the search path, which could lead to looking at the wrong files.
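
The iterator check from the first item above can be observed from Python 3 code as in this sketch. The function has_iternext is a hypothetical Python-level stand-in for the idea of the C helper, not its implementation.

```python
class Iterable:
    def __init__(self):
        self.consumed = 2

    def __iter__(self):
        # Returns an object whose type has no __next__ method, so the
        # result is not an iterator.
        return Iterable()


def has_iternext(obj):
    # An iterator must provide the "next" slot on its type.
    return hasattr(type(obj), "__next__")


try:
    iter(Iterable())
except TypeError:
    raised = True
else:
    raised = False
```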

New Optimization

  • The getattr built-in is now optimized for compile time constants if possible, even in the presence of a default argument. This is more a cleanup than actually useful yet.
  • The calling of PyCFunction from normal Python extension modules got accelerated, especially for the no or single argument cases where Nuitka now avoids building the tuple.

New Features

  • Added the option --recurse-pattern to include modules per filename, which for Python3 is the only way to not have them in a package automatically.

  • Added the option --generate-c++-only to only generate the C++ source code without starting the compiler.

    Mostly used for debugging and testing coverage. In the latter case we do not want the C++ compiler to create any binary, but only to measure what would have been used.

Organizational

  • Renamed the debug option --c++-only to --recompile-c++-only to make its purpose more clear and there now is --generate-c++-only too.

Tests

  • Added support for taking coverage of Nuitka in a test run on a given input file.
  • Added support for taking coverage for all Nuitka test runners, migrating them all to common code for searching.
  • Added uniform way of reporting skipped tests, not generally used yet.

Summary

This release marks progress towards having coverage testing. Recent releases had made it clear that not all code of Nuitka is actually used at least once in our release tests. We aim at identifying these.

Another direction was to catch cases, where Nuitka leaks exceptions or is subject to leaked exceptions, which revealed previously unnoticed errors.

Important changes have been delayed, e.g. the closure variables are meant to no longer use C++ objects to share storage, but proper PyCellObject for improved compatibility, and to approach a more "C-ish" status. There is unfinished code that does this. And the forward propagation of values is not enabled yet again either.

So this is an interim step to get the bug fixes and improvements accumulated out. Expect more actual changes in the next releases.

Nuitka Release 0.5.8

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release has mainly a focus on cleanups and compatibility improvements. It also advances standalone support, and a few optimization improvements, but it mostly is a maintenance release, attacking long standing issues.

Bug Fixes

  • Compatibility Windows MacOS: Fix importing on case insensitive systems.

    It was not always working properly, if there was both a package Something and something, by merit of having files Something/__init__.py and something.py.

  • Standalone: The search path was preferring system directories and therefore could have conflicting DLLs. Issue#144.

  • Fix, the optimization of getattr with predictable result was crashing the compilation. This was a regression, fixed in 0.5.7.1 already.

  • Compatibility: The name mangling inside classes also needs to be applied to global variables.

  • Fix, providing clang++ for CXX was mistakenly treated as a g++, and version checks were made on it.

  • Python3: Declaring __class__ global is now a SyntaxError before Python3.4.

  • Standalone Python3: Making use of module state in extension modules was not working properly.
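
The name mangling item above refers to behavior that can be seen with this small sketch. The class and variable names are made up for illustration.

```python
_Demo__value = 42


class Demo:
    def get(self):
        # Inside the class body, "__value" is mangled to "_Demo__value",
        # even though it refers to a module level variable here.
        return __value


result = Demo().get()
```

Without applying the mangling to the global variable access, the lookup would be for the unmangled name and raise a NameError.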

New Features

  • The filenames of source files as found in the __file__ attribute are now made relative in standalone mode.

    This should make it more apparent if things outside of the distribution folder are used, at the cost of tracebacks. Expect the default ability to copy the source code along in an upcoming release.

  • Added experimental standalone mode support for PyQt5. At least headless mode should be working, plug-ins (needed for anything graphical) are not yet copied and will need more work.

Cleanup

  • No longer using imp.find_module. To solve the casing issues we finally needed to make our own module finding implementation.
  • The name mangling was handled during code generation only. Moved to tree building instead.
  • More code generation cleanups. The compatible line numbers are now attached during tree building and therefore better preserved, as well as that code no longer polluting code generation as much.

Organizational

  • No more packages for openSUSE 12.1/12.2/12.3 and Fedora 17/18/19 as requested by the openSUSE Build Service.
  • Added RPM packages for Fedora 21 and CentOS 7 on openSUSE Build Service.

Tests

  • Lots of test refinements for the CPython test suites to be run continuously in Buildbot for both Windows and Linux.

Summary

This release brings about two major changes, each with the risk to break things.

One is that we finally started to have our own import logic, which carries the risk of breakage, but so far appears to have rather improved compatibility. The case issues were not fixable with standard library code.

The second one is that the __file__ attributes for standalone mode are now no longer pointing to the original install and therefore will expose missing stuff sooner. This will have to be followed up with code to scan for missing "data" files later on.

For SSA based optimization, there are cleanups in here, esp. the one removing the name mangling, allowing to remove special code for class variables. This makes the SSA tree more reliable. Hope is that the big step (forward propagation through variables) can be made in one of the next releases.

Article about Nuitka Standalone Mode

There is a really well written article about Nuitka written by Tom Sheffler.

It inspired me to finally come clean with __file__ attributes in standalone mode. Currently it points to where your source was when things were compiled. In the future (in standalone mode, for accelerated mode that continues to be good), it will point into the .dist folder, so that the SWIG workaround may become no longer necessary.

Thanks Tom for sharing your information, and good article.

Yours, Kay

Nuitka Release 0.5.7

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release brings a newly supported platform, bug fixes, and again lots of cleanups.

Bug Fixes

  • Fix, creation of dictionary and set literals with non-hashable indexes did not raise an exception.

    {[]: None} # This is now a TypeError
    

New Optimization

  • Calls to the dict built-in with only keyword arguments are now optimized to mere dictionary creations. This is new for the case of non-constant arguments only of course.

    dict(a = b, c = d)
    # equivalent to
    {"a" : b, "c" : d}
    
  • Slice del with indexable arguments is now using optimized code that avoids Python objects too. This was already done for slice look-ups.

  • Added support for bytearray built-in.

Organizational

  • Added support for OpenBSD with fiber implementation from library, as it has no context support.

Cleanups

  • Moved slicing solutions for Python3 to the re-formulation stage. So far the slice nodes were used, but only at code generation time, there was made a distinction between Python2 and Python3 for them. Now these nodes are purely Python2 and slice objects are used universally for Python3.
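
The equivalence that makes this re-formulation possible for Python3 can be sketched as follows: a slicing expression behaves like a subscript with a slice object. The variable names are only for illustration.

```python
values = list(range(10))

# A slicing expression ...
direct = values[2:8:2]
# ... is equivalent to a subscript with a slice object.
via_slice_object = values[slice(2, 8, 2)]

# The same holds for assignment and deletion.
copy = list(values)
copy[slice(0, 2)] = ["a", "b"]
del copy[slice(5, None)]
```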

Tests

  • The test runners now have common code to scan for the first file to compile, an implementation of the search mode. This will allow to introduce the ability to search for pattern matches, etc.
  • More tests are directly executable with Python3.
  • Added recurse_none mode to test comparison, making using extra options for that purpose unnecessary.

Summary

This solves long standing issues with slicing and subscript not being properly distinguished in the Nuitka code. It also contains major bug fixes for issues that were really problematic. Due to the involved nature of these fixes they are made in this new release.