Nuitka on Podcast __init__

So, if you want to get to know Nuitka or me, there is a really good interview from the popular podcast "__init__" online.

Kay Hayen on Nuitka

I think it's very good at explaining things, putting it into perspective, history, future, and generally getting to know what kind of person I am.

So, this is probably as good as it ever gets. So please share, like and go ahead and spread it in this social media craziness that you use all day.

Yours, Kay

Nuitka Release 0.5.15

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release enables SSA based optimization, the huge leap, not so much in terms of actual performance increase, but for now making the things possible that will allow it.

This has been in the making literally for years. Over and over, there was just "one more thing" needed. But now it's there.

The release includes much stuff, and there is a perspective on the open tasks in the summary, but first out to the many details.

Bug Fixes

  • Standalone: Added implicit import for reportlab package configuration dynamic import. Fixed in 0.5.14.1 already.

  • Standalone: Fix, compilation of the ctypes module could happen for some import patterns, and then prevented the distribution to contain all necessary libraries. Now it is made certain to not include compiled and frozen form both. Issue#241. Fixed in 0.5.14.1 already.

  • Fix, compilation for conditional statements where the boolean check on the condition cannot raise, could fail compilation. Issue#240. Fixed in 0.5.14.2 already.

  • Fix, the __import__ built-in was making static optimization assuming compile time constants to be strings, which in the error case they are not, which was crashing the compiler. Issue#240.

    __import__(("some.module",)) # tuples don't work
    

    This error became only apparent, because now in some cases, Nuitka forward propagates values.

  • Windows: Fix, when installing Python2 only for the user, the detection of it via registry failed as it was only searching system key. This was a github pull request. Fixed in 0.5.14.3 already.

  • Some modules have extremely complex expressions requiring too deep recursion to work on all platforms. These modules are now included entirely as bytecode fallback. Issue#240.

  • The standard library may contain broken code due to installation mistakes. We have to ignore their SyntaxError. Issue#244.

  • Fix, pickling compiled methods was failing with the wrong kind of error, because they should not implement __reduce__, but only __deepcopy__. Issue#219.

  • Fix, when running under wine, the check for scons binary was fooled by existance of /usr/bin/scons. Issue#251.

New Features

  • Added experimental support for Python3.5, coroutines don't work yet, but it works perfectly as a 3.4 replacement.
  • Added experimental Nuitka plug-in framework, and use it for the packaging of Qt plugins in standalone mode. The API is not yet stable nor polished.
  • New option --debugger that makes --run execute directly in gdb and gives a stack trace on crash.
  • New option --profile executes compiled binary and outputs measured performance with vmprof. This is work in progress and not functional yet.
  • Started work on --graph to render the SSA state into diagrams. This is work in progress and not functional yet.
  • Plug-in framework added. Not yet ready for users. Working PyQt4 and PyQt5 plug-in support. Experimental Windows multiprocessing support. Experimental PyLint warnings disable support. More to come.
  • Added support for AnaConda accelerated mode on MacOS by modifying the rpath to the Python DLL.
  • Added experimental support for multiprocessing on Windows, which needs money patching of the module to support compiled methods.

Optimization

  • The SSA analysis is now enabled by default, eliminating variables that are not shared, and can be forward propagated. This is currently limited mostly to compile time constants, but things won't remain that way.
  • Code generation for many constructs now takes into account if a specific operation can raise or not. If e.g. an attribute look-up is known to not raise, then that is now decided by the node the looked is done to, and then more often can determine this, or even directly the value.
  • Calls to C-API that we know cannot raise, no longer check, but merely assert the result.
  • For attribute look-up and other operations that might be known to not raise, we now only assert that it succeeds.
  • Built-in loop-ups cannot fail, merely assert that.
  • Creation of built-in exceptions never raises, merely assert that too.
  • More Python operation slots now have their own computations and some of these gained overloads for more compile time constant optimization.
  • When taking an iterator cannot raise, this is now detected more often.
  • The try/finally construct is now represented by duplicating the final block into all kinds of handlers (break, continue, return, or except) and optimized separately. This allows for SSA to trace values more correctly.
  • The hash built-in now has dedicated node and code generation too. This is mostly intended to represent the side effects of dictionary look-up, but gives more compact and faster code too.
  • Type type built-in cannot raise and has no side effect.
  • Speed improvement for in-place float operations for += and *=, as these will be common cases.

Tests

  • Made the construct based testing executable with Python3.
  • Removed warnings using the new PyLint warnings plug-in for the reflected test. Nuitka now uses the PyLint annotations to not warn. Also do not go into PyQt for reflected test, not needed. Many Python3 improvements for cases where there are differences to report.
  • The optimization tests no longer use 2to3 anymore, made the tests portable to all versions.
  • Checked more in-place operations for speed.

Organizational

  • Many improvements to the coverage taking. We can hope to see public data from this, some improvements were triggered from this already, but full runs of the test suite with coverage data collection are yet to be done.

Summary

The release includes many important new directions. Coverage analysis will be important to remain certain of test coverage of Nuitka itself. This is mostly done, but needs more work to complete.

Then the graphing surely will help us to debug and understand code examples. So instead of tracing, and reading stuff, we should visualize things, to more clearly see, how things evolve under optimization iteration, and where exactly one thing goes wrong. This will be improved as it proves necessary to do just that. So far, this has been rare. Expect this to become end user capable with time. If only to allow you to understand why Nuitka won't optimize code of yours, and what change of Nuitka it will need to improve.

The comparative performance benchmarking is clearly the most important thing to have for users. It deserves to be the top priority. Thanks to the PyPy tool vmprof, we may already be there on the data taking side, but the presenting and correlation part, is still open and a fair bit of work. It will be most important to empower users to make competent performance bug reports, now that Nuitka enters the phase, where these things matter.

As this is a lot of ground to cover. More than ever. We can make this compiler, but only if you help, it will arrive in your life time.

Nuitka Progress in Summer 2015

A long time has passed again without me speaking about what's going on with Nuitka, and that althought definitely a lot has happened. I would contend it's even because so much is going on.

I also am shy to make public postings about unfinished stuff it seems, but it's long overdue, so much important and great stuff has happened. We are in the middle of big things with the compiler and there is a lot of great achievement.

SSA (Single State Assignment Form)

For a long, long time already, each release of Nuitka has worked towards increasing "SSA" usage in Nuitka.

Now it's there. The current pre-release just uses it. There were many things to consider before enabling it, and always a next thing to be found that was needed. Often good changes to Nuitka, it was also annoying the hell out of me at times.

But basically now the forward propagation of variables is in place, with some limitations that are going to fall later.

So the current release, soon to be replaced, still doesn't optimize this code as well as possible:

def f():
    a = 1
    return a

But starting with the next release, the value of a is forward propagated (also in way more complex situations), and that's a serious milestone for the project.

Function Inlining

When submitting my talk to EuroPython 2015, I was putting a lot of pressure on me by promising to demo that. And I did. It was based on the SSA code that only now became completely reliable, but otherwise very few few other changes, and it just worked.

The example I used is this:

def f():
    def g(x, y):
        return x, y

    x = 2
    y = 1

    x, y = g(x, y) # can be inlined

    return x, y

So, the function g is forward propagated to a direct call, as are x and y into the return statement after making the in-line, making this:

def f():
    return 2, 1

Currently function in-lining is not yet activated by default, for this I am waiting for a release cycle to carry the load of SSA in the wild. As you probably know I usually tend to be conservative and to not make too many changes at once.

And as this works for local functions only yet, it's not too important yet either. This will generally become relevant once we have this working across modules and their globally defined functions or methods. This will be a while until Nuitka gets there.

Scalability

Having got Nuitka's memory usage under control, it turned out that there are files that can trigger Python recursion RuntimeError exception when using the ast module to build the Nuitka internal tree. People really have code with many thousands of operations to a + operation.

So, Nuitka here learned to include whole modules as bytecode when it is too complex as there is no easy way to expand the stack on Windows at least. That is kind of a limitation of CPython itself I didn't run into so far, and rather very annoying too.

The scalability of Nuitka also depends much on generated code size. With the optimization become more clever, less code is generated, and that trend will continue as more structural optimization are applied.

Compatibility

Very few things are possible here anymore. For the tests, in full compatibility mode, even more often the less good line number is used.

Also the plug-in work is leading to improved compatibility with Qt plugins of PySide and PyQt. Or another example is the multiprocessing module that on Windows is now supposed to fork compiled code too.

Python 3.5

The next release has experimental support for Python 3.5, with the notable exception that async and await, these do not yet work. It passes the existing test suite for CPython3.4 successfully. Passing here means, to pass or fail in the same way as does the uncompiled Python. Failures are of course expected, as details change, and a nice way of having coverage for exception codes.

The new @ operator is now supported. As the stable release of Python3.5 was made recently, there is now some pressure on having full support of course.

I am not sure, if you can fully appreciate the catch up game to play here. It will take a compiled coroutine to support these things properly. And that poses lots of puzzles to solve. As usual I am binding these to internal cleanups so it becomes simpler.

In the case of Python3.5, the single function body node type that is used for generators, class bodies, and function, is bound to be replaced with a base class and detailing instances, instead of one thing for them all, then with coroutines added.

Importing Enhancements

A while ago, the import logic was basically re-written with compatibility much increased. Then quite some issues were fixed. I am not sure, but some of the fixes have apparently also been regressions at times, with the need for other fixes now.

So it may have worked for you in the past, but you might have to report new found issues.

It's mainly the standalone community that encounters these issues, when just one of these imports doesn't find the correct thing, but picking the wrong one will of course have seriously bad impacts on compile time analysis too. So once we do cross module optimization, this must be rock solid.

I think we have gotten a long way there, but we still need to tackle some more fine details.

Performance

Graphs and Benchmarks

I also presented this weak point to EuroPython 2015 and my plan on how to resolve it. And low and behold, turns out the PyPy people had already developed a tool that will be usable for the task and to present to the conference.

So basically I was capable of doing kind of a prototype of comparative benchmark during EuroPython 2015 already. I will need to complete this. My plan was to get code names of functions sorted out in a better way, to more easily match the Nuitka C function names with Python functions in an automatic way. That matching is the hard part.

So that is already progressing, but I could need help with that definitely.

Nuitka really has to catch up with benchmarks generally.. The work on automated performance graphs has made more progress, and they are supposed to show up on Nuitka Speedcenter each time, master, develop, or factory git branches change.

Note

There currently is no structure to these graphs. There is no explanations or comments, and there is no trend indicators. All of which makes it basically useless to everybody except me. And even harder for me than necessary.

As a glimpse of what is possible with in-lined functions, look at this:

Lambda call construct case

But we also need to put real programs and use cases to test. This may need your help. Let me know if you want to. It takes work on taking the data, and merging them into one view, linking it with the source code ideally. That will be the tool you can just use on your own code.

Standalone

The standalone mode of Nuitka was pretty good, and continued to improve further, now largely with the help of plug-ins.

I now know that PyGTK is an issue and will need a plug-in to work. Once the plug-in interface is public, I hope for more outside contributions here.

Other Stuff

Funding

Nuitka receives the occasional donation and those make me very happy. As there is no support from organization like the PSF, I am all on my own there.

This year I traveled to Europython 2015, I needed a new desktop computer after burning the old one through with CI tests, the website has running costs, and so on. That is pretty hefty money. It would be sweet if aside of my free time it wouldn't also cost me money.

EuroPython 2015

This was a blast. Meeting people who knew Nuitka but not me was a regular occurrence. And many people well appreciate my work. It felt much different than the years before.

I was able to present Nuitka's function in-lining indeed there, and this high goal that I set myself, quite impressed people. My talk went very well, I am going to post a link separately in another post.

Also I made many new contacts, largely with the scientific community. I hope to find work with data scientists in the coming years. More amd more it looks like my day job should be closer to Nuitka and my expertise in Python.

Collaborators

Nuitka is making break through progress. And you can be a part of it. Now.

You can join and should do so now, just follow this link or become part of the mailing list and help me there with request I make, e.g. review posts of mine, test out things, pick up small jobs, answer questions of newcomers, you know the drill probably.

Future

So, there is multiple things going on:

  • Function in-lining

    For locally declared functions, it should become possible to avoid their creation, and make direct calls instead of ones that use function objects and expensive parameter handling.

  • Nested frames

    One result of in-lining will be nested frames still present for exceptions to be properly annotated, or locals giving different sets of locals and so on.

    Some cleanup of these will be needed for code generation and SSA to be able to attach variables to some sort of container, and for a function to be able to reference different sets of these.

  • Type Inference

    With SSA in place, we really can start to recognize types, and treat things that work something assigned from [] different, and with code special to these.

    That's going to be a lot of work. For float and list there are very important use cases, where the code can be much better.

  • Shape Analyisis

    My plan for types, is not to use them, but the more general shapes, things that will be more prevalent than actual type information in a program. In fact the precise knowledge will be rare, but more often, we will just have a set of operations performed on a variable, and be able to guess from there.

  • Python 3.5 new features

    The coroutines are a new type, and currently it's unclear how deep this is tied into the core of things, i.e. if a compile coroutine can be a premier citizen immediately, or if that needs more work. I hope it just takes for the code object to have the proper flag. But there could be stupid type checks, we shall see.

  • Plug-ins

    Something I wish I could have shown at EuroPython was plug-ins to Nuitka. It is recently becoming more complete, and some demo plug-ins for say Qt plugins, or multiprocessing, are starting to work. The API will need work and of course documentation. Hope is for this to expand Nuitka's reach and appeal to get more contributors.

Let me know, if you are willing to help. I really need that help to make things happen faster. Nuitka will become more and more important only.

Nuitka Release 0.5.14

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is an intermediate step towards value propagation, which is not considered ready for stable release yet. The major point is the elimination of the try/finally expressions, as they are problems to SSA. The try/finally statement change is delayed.

There are also a lot of bug fixes, and enhancements to code generation, as well as major cleanups of code base.

Bug Fixes

  • Python3: Added support assignments trailing star assignment.

    *a, b = 1, 2
    

    This raised ValueError before.

  • Python3: Properly detect illegal double star assignments.

    *a, *b = c
    
  • Python3: Properly detect the syntax error to star assign from non-tuple/list.

    *a = 1
    
  • Python3.4: Fixed a crash of the binary when copying dictionaries with split tables received as star arguments.

  • Python3: Fixed reference loss, when using raise a from b where b was an exception instance. Fixed in 0.5.13.8 already.

  • Windows: Fix, the flag --disable-windows-console was not properly handled for MinGW32 run time resulting in a crash.

  • Python2.7.10: Was not recognizing this as a 2.7.x variant and therefore not applying minor version compatibility levels properly.

  • Fix, when choosing to have frozen source references, code objects were not use the same value as __file__ did for its filename.

  • Fix, when re-executing itself to drop the site module, make sure we find the same file again, and not according to the PYTHONPATH changes coming from it. Issue#223. Fixed in 0.5.13.4 already.

  • Enhanced code generation for del variable statements, where it's clear that the value must be assigned.

  • When pressing CTRL-C, the stack traces from both Nuitka and Scons were given, we now avoid the one from Scons.

  • Fix, the dump from --xml no longer contains functions that have become unused during analysis.

  • Standalone: Creating or running programs from inside unicode paths was not working on Windows. Issue#231 Issue#229 and. Fixed in 0.5.13.7 already.

  • Namespace package support was not yet complete, importing the parent of a package was still failing. Issue#230. Fixed in 0.5.13.7 already.

  • Python2.6: Compatibility for exception check messages enhanced with newest minor releases.

  • Compatibility: The NameError in classes needs to say global name and not just name too.

  • Python3: Fixed creation of XML representation, now done without lxml as it doesn't support needed features on that version. Fixed in 0.5.13.5 already.

  • Python2: Fix, when creating code for the largest negative constant to still fit into int, that was only working in the main module. Issue#228. Fixed in 0.5.13.5 already.

  • Compatibility: The print statement raised an assertion on unicode objects that could not be encoded with ascii codec.

New Features

  • Added support for Windows 10.
  • Followed changes for Python 3.5 beta 2. Still only usable as a Python 3.4 replacement, no new features.
  • Using a self compiled Python running from the source tree is now supported.
  • Added support for AnaConda Python distribution. As it doesn't install the Python DLL, we copy it along for acceleration mode.
  • Added support for Visual Studio 2015. Issue#222. Fixed in 0.5.13.3 already.
  • Added support for self compiled Python versions running from build tree, this is intended to help debug things on Windows.

Optimization

  • Function in-lining is now present in the code, but still disabled, because it needs more changes in other areas, before we can generally do it.

  • Trivial outlines, result of re-formulations or function in-lining, are now in-lined, in case they just return an expression.

  • The re-formulation for or and and has been giving up, eliminating the use of a try/finally expression, at the cost of dedicated boolean nodes and code generation for these.

    This saves around 8% of compile time memory for Nuitka, and allows for faster and more complete optimization, and gets rid of a complicated structure for analysis.

  • When a frame is used in an exception, its locals are detached. This was done more often than necessary and even for frames that are not necessary our own ones. This will speed up some exception cases.

  • When the default arguments, or the keyword default arguments (Python3) or the annotations (Python3) were raising an exception, the function definition is now replaced with the exception, saving a code generation. This happens frequently with Python2/Python3 compatible code guarded by version checks.

  • The SSA analysis for loops now properly traces "break" statement situations and merges the post-loop situation from all of them. This significantly allows for and improves optimization of code following the loop.

  • The SSA analysis of try/finally statements has been greatly enhanced. The handler for finally is now optimized for exception raise and no exception raise individually, as well as for break, continue and return in the tried code. The SSA analysis for after the statement is now the result of merging these different cases, should they not abort.

  • The code generation for del statements is now taking advantage should there be definite knowledge of previous value. This speed them up slightly.

  • The SSA analysis of del statements now properly decided if the statement can raise or not, allowing for more optimization.

  • For list contractions, the re-formulation was enhanced using the new outline construct instead of a pseudo function, leading to better analysis and code generation.

  • Comparison chains are now re-formulated into outlines too, allowing for better analysis of them.

  • Exceptions raised in function creations, e.g. in default values, are now propagated, eliminating the function's code. This happens most often with Python2/Python3 in branches. On the other hand, function creations that cannot are also annotated now.

  • Closure variables that become unreferenced outside of the function become normal variables leading to better tracing and code generation for them.

  • Function creations cannot raise except their defaults, keyword defaults or annotations do.

  • Built-in references can now be converted to strings at compile time, e.g. when printed.

Organizational

  • Removed gitorious mirror of the git repository, they shut down.
  • Make it more clear in the documentation that Python2 is needed at compile time to create Python3 executables.

Cleanups

  • Moved more parts of code generation to their own modules, and used registry for code generation for more expression kinds.

  • Unified try/except and try/finally into a single construct that handles both through try/except/break/continue/return semantics. Finally is now solved via duplicating the handler into cases necessary.

    No longer are nodes annotated with information if they need to publish the exception or not, this is now all done with the dedicated nodes.

  • The try/finally expressions have been replaced with outline function bodies, that instead of side effect statements, are more like functions with return values, allowing for easier analysis and dedicated code generation of much lower complexity.

  • No more "tolerant" flag for release nodes, we now decide this fully based on SSA information.

  • Added helper for assertions that code flow does not reach certain positions, e.g. a function must return or raise, aborting statements do not continue and so on.

  • To keep cloning of code parts as simple as possible, the limited use of makeCloneAt has been changed to a new makeClone which produces identical copies, which is what we always do. And a generic cloning based on "details" has been added, requiring to make constructor arguments and details complete and consistent.

  • The re-formulation code helpers have been improved to be more convenient at creating nodes.

  • The old nuitka.codegen module Generator was still used for many things. These now all got moved to appropriate code generation modules, and their users got updated, also moving some code generator functions in the process.

  • The module nuitka.codegen.CodeTemplates got replaces with direct uses of the proper topic module from nuitka.codegen.templates, with some more added, and their names harmonized to be more easily recognizable.

  • Added more assertions to the generated code, to aid bug finding.

  • The autoformat now sorts pylint markups for increased consistency.

  • Releases no longer have a tolerant flag, this was not needed anymore as we use SSA.

  • Handle CTRL-C in scons code preventing per job messages that are not helpful and avoid tracebacks from scons, also remove more unused tools like rpm from out in-line copy.

Tests

  • Added the CPython3.4 test suite.

  • The CPython3.2, CPython3.3, and CPython3.4 test suite now run with Python2 giving the same errors. Previously there were a few specific errors, some with line numbers, some with different SyntaxError be raised, due to different order of checks.

    This increases the coverage of the exception raising tests somewhat.

  • Also the CPython3.x test suites now all pass with debug Python, as does the CPython 2.6 test suite with 2.6 now.

  • Added tests to cover all forms of unpacking assignments supported in Python3, to be sure there are no other errors unknown to us.

  • Started to document the reference count tests, and to make it more robust against SSA optimization. This will take some time and is work in progress.

  • Made the compile library test robust against modules that raise a syntax error, checking that Nuitka does the same.

  • Refined more tests to be directly execuable with Python3, this is an ongoing effort.

Summary

This release is clearly major. It represents a huge step forward for Nuitka as it improves nearly every aspect of code generation and analysis. Removing the try/finally expression nodes proved to be necessary in order to even have the correct SSA in their cases. Very important optimization was blocked by it.

Going forward, the try/finally statements will be removed and dead variable elimination will happen, which then will give function inlining. This is expected to happen in one of the next releases.

This release is a consolidation of 8 hotfix releases, and many refactorings needed towards the next big step, which might also break things, and for that reason is going to get its own release cycle.

Nuitka Release 0.5.13

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains the first use of SSA for value propagation and massive amounts of bug fixes and optimization. Some of the bugs that were delivered as hotfixes, were only revealed when doing the value propagation as they still could apply to real code.

Bug Fixes

  • Fix, relative imports in packages were not working with absolute imports enabled via future flags. Fixed in 0.5.12.1 already.
  • Loops were not properly degrading knowledge from inside the loop at loop exit, and therefore this could have lead missing checks and releases in code generation for cases, for del statements in the loop body. Fixed in 0.5.12.1 already.
  • The or and and re-formulation could trigger false assertions, due to early releases for compatibility. Fixed in 0.5.12.1 already.
  • Fix, optimizion of calls of constant objects (always an exception), crashed the compiler. This corrects Issue#202. Fixed in 0.5.12.2 already.
  • Standalone: Added support for site.py installations with a leading def or class statement, which is defeating our attempt to patch __file__ for it. This corrects Issue#189.
  • Compatibility: In full compatibility mode, the tracebacks of or and and expressions are now as wrong as they are in CPython. Does not apply to --improved mode.
  • Standalone: Added missing dependency on QtGui by QtWidgets for PyQt5.
  • MacOS: Improved parsing of otool output to avoid duplicate entries, which can also be entirely wrong in the case of Qt plugins at least.
  • Avoid relative paths for main program with file reference mode original, as it otherwise changes as the file moves.
  • MinGW: The created modules depended on MinGW to be in PATH for their usage. This is no longer necessary, as we now link these libraries statically for modules too.
  • Windows: For modules, the option --run to immediately load the modules had been broken for a while.
  • Standalone: Ignore Windows DLLs that were attempted to be loaded, but then failed to load. This happens e.g. when both PySide and PyQt are installed, and could cause the dreaded conflicting DLLs message. The DLL loaded in error is now ignored, which avoids this.
  • MinGW: The resource file used might be empty, in which case it doesn't get created, avoiding an error due to that.
  • MinGW: Modules can now be created again. The run time relative code uses an API that is WinXP only, and MinGW failed to find it without guidance.

Optimization

  • Make direct calls out of called function creations. Initially this applies to lambda functions only, but it's expected to become common place in coming releases. This is now 20x faster than CPython.

    # Nuitka avoids creating a function object, parsing function arguments:
    (lambda x:x)(something)
    
  • Propagate assignments from non-mutable constants forward based on SSA information. This is the first step of using SSA for real compile time optimization.

  • Specialized the creation of call nodes at creation, avoiding to have all kinds be the most flexible form (keyword and plain arguments), but instead only what kind of call they really are. This saves lots of memory, and makes the tree faster to visit.

  • Added support for optimizing the slice built-in with compile time constant arguments to constants. The re-formulation for slices in Python3 uses these a lot. And the lack of this optimization prevented a bunch of optimization in this area. For Python2 the built-in is optimized too, but not as important probably.

  • Added support for optimizing isinstance calls with compile time constant arguments. This avoids static exception raises in the exec re-formulation which tests for file type, and then optimization couldn't tell that a str is not a file instance. Now it can.

  • Lower in-place operations on immutable types to normal operations. This will allow to compile time compute these more accurately.

  • The re-formulation of loops puts the loop condition as a conditional statement with break. The not that needs to apply was only added in later optimization, leading to unnecessary compile time efforts.

  • Removed per variable trace visit from optimization, removing useless code and compile time overhead. We are going to optimize things by making decision in assignment and reference nodes based on forward looking statements using the last trace collection.

New Features

  • Added experimental support for Python 3.5, which seems to be passing the test suites just fine. The new @ matrix multiplicator operators are not yet supported though.
  • Added support for patching source on the fly. This is used to work around a (now fixed) issue with numexpr.cpuinfo making type checks with the is operation, about the only thing we cannot detect.

Organizational

  • Added repository for Ubuntu Vivid (15.04) for download. Removed Ubuntu Saucy and Ubuntu Raring package downloads, these are no longer supported by Ubuntu.
  • Added repository for Debian Stretch, after Jessie release.
  • Make it more clear in the documentation that in order to compile Python3, a Python2 is needed to execute Scons, but that the end result is a Python3 binary.
  • The PyLint checker tool now can operate on directories given on the command line, and whitelists an error that is Windows only.

Cleanups

  • Split up standalone code further, moving depends.exe handling to a separate module.
  • Reduced code complexity of scons interface.
  • Cleaned up where trace collection is being done. It was partially still done inside the collection itself instead in the owner.
  • In case of conflicting DLLs for standalone mode, these are now output with nicer formatting, that makes it easy to recognize what is going on.
  • Moved code to fetch depends.exe to dedicated module, so it's not as much in the way of standalone code.

Tests

  • Made BuiltinsTest directly executable with Python3.
  • Added construct test to demonstrate the speed up of direct lambda calls.
  • The deletion of @test for the CPython test suite is more robust now, esp. on Windows, the symbolic links are now handled.
  • Added test to cover or usage with in-place assignment.
  • Cover local relative import from . with absolute_import future flag enabled.
  • Again, more basic tests are now directly executable with Python3.

Summary

This release is major due to amount of ground covered. The reduction in memory usage of Nuitka itself (the C++ compiler will still use much memory) is very massive and an important aspect of scalability too.

Then the SSA changes are truly the first sign of major improvements to come. In their current form, without eliminating dead assignments, the full advantage is not taken yet, but the next releases will do this, and that's a major milestone to Nuitka.

The other optimization mostly stem from looking at things closer, and trying to work towards function in-lining, for which we are making a lot of progress now.

Nuitka Progress in Spring 2015

It's absolutely time to speak about what's going on with Nuitka, there have been a few releases, and big things are going to happen now. The ones I have always talked of, it's happening now.

I absolutely prefer to talk of things when they are completed, that is why I am shy to make these kinds of postings, but this time, I think it's warranted. The next couple of releases are going to be very different.

SSA (Single State Assignment Form)

For a long, long time already, each release of Nuitka has worked towards increasing "SSA" usage in Nuitka.

The component that works on this, is now called "trace collection", and does the major driving part for optimization. It collects "variable traces" and puts them together into "global" forms as well.

Based on these traces, optimizations can be made. Having SSA or not, is (to me) the difference between Nuitka as a mere compiler, and Nuitka as an optimizing compiler.

The major news is that factory versions of Nuitka now do this in serious ways, propagating values forward, and we also are close to eliminating dead assignments, some of which become dead by being having been forward propagated.

So we can now finally see that big step, jump really, happening, and Nuitka does now do some pretty good static optimization, at least locally.

Still, right now, this trival code assigns to a local variable, then reads from it to return. But not for much longer.

def f():
    a = 1
    return a

This is going to instantly give performance gains, and more importantly, will enable analysis, that leads to avoiding e.g. the creation of function objects for local functions, becoming able to in-line, etc.

This is major excitement to me. And I cannot wait to have the releases that do this.

Scalability

The focus has also been lately, to reduce Nuitka's own memory usage. It has gone down by a large factor, often by avoiding cyclic dependencies in the data structures, that the garbage collector of Python failed to deal with properly.

The scalability of Nuitka also depends much on generated code size. With the optimization become more clever, less code needs to be generated, and that will help a lot. On some platforms, MSVC most notably, it can be really slow, but it's noteworthy that Nuitka works not just with 2008 edition, but with the lastest MSVC, which appears to be better.

Compatibility

There was not a whole lot to gain in the compatibility domain anymore. Nothing important certainly. But there are import changes.

Python 3.5

The next release has changes to compile and run the Python3.4 test suite successfully. Passing here means, to pass/fail in the same way as does the uncompiled Python. Failures are of course expected, and a nice way of having coverage for exception codes.

The new @ operator is not supported yet. I will wait with that for things to stabilize. It's currently only an alpha release.

However, Nuitka has probably never been this close to supporting a new Python version at release time. And since 3.4 was such a heavy drain, and still not perfectly handled (super still works like it's 3.3 e.g.), I wanted to know what is coming a bit sooner.

Cells for Closure

We now provide a __closure__ value for compiled functions too. These are not writable in Python, so it's only a view. Having moved storage into the compiled function object, that was easy.

Importing Enhancements

The the past couple of releases, the import logic was basically re-written with compatibility much increased. The handling of file case during import, multiple occurrences in the path, and absolute import future flags for relative imports has been added.

It's mainly the standalone community that will have issues, when just one of these imports doesn't find the correct thing, but picking the wrong one will of course have seriously bad impacts on compile time analysis too. So once we do cross module optimization, this must be rock solid.

I think we have gotten there, tackling these finer details now too.

Performance

Graphs and Benchmarks

Nuitka, users don't know what to expect regarding the speed of their code after compilation through Nuitka, neither now nor after type inference (possibly hard to guess). Nuitka does a bunch of optimizations for some constructs pretty heavily, but weak at others. But how much does that affect real code?

There may well be no significant gain at all for many people, while there is a number for PyStone that suggests higher. The current and future versions possibly do speed up but the point is that you cannot tell if it is even worth for someone to try.

Nuitka really has to catch up here. The work on automated performance graphs has some made progress, and they are supposed to show up on Nuitka Speedcenter each time, master, develop or factory git branches change.

Note

There currently is no structure to these graphs. There is no explanations or comments, and there is no trend indicators. All of which makes it basically useless to everybody except me. And even harder for me than necessary.

However, as a glimpse of what will happen when we in-line functions, take a look at the case, where we already eliminate parameter parsing only, and make tremendous speedups:

Lambda call construct case

Right now (the graph gets automatic updates with each change), what you should see, is that develop branch is 20 times faster than CPython for that very specific bit of code. That is where we want to be, except that with actually in-line, this will of course be even better.

It's artifical, but once we can forward propagate local function creations, it will apply there too. The puzzle completes.

But we also need to put real programs and use cases to test. This may need your help. Let me know if you want to.

Standalone

The standalone mode of Nuitka is pretty good, and as usual it continued to improve only.

Nothing all that important going on there, except the work on a plug-in framework, which is under development, and being used to handle e.g. PyQt plug-ins, or known issues with certain packages.

The importing improvements already mentioned, have now allowed to cover many more libraries successfully than before.

Other Stuff

Debian Stable

Nuitka is now part of Debian stable, aka Jessie. Debian and Python are the two things closest to my heart in the tech field. You can imagine that being an upstream worthy of inclusion into Debian stable is an import milestone to Nuitka for me.

Funding

Nuitka receives the occasional donation and those make me very happy. As there is no support from organization like the PSF, I am all on my own there.

This year I likely will travel to Europython 2015, and would ask you to support me with that, it's going to be expensive.

EuroPython 2015

I have plans to present Nuitka's function in-lining there, real stuff, on a fully and functional compiler that works as a drop-in replacement.

Not 100% sure if I can make it by the time, but things look good. Actually so far I felt ahead of the plan, but as you know, this can easily change at any point. But Nuitka stands on very stable grounds code wise.

Collaborators

Things are coming along nicely. When I started out, I was fully aware that the project is something that I can do on my own if necessary, and that has not changed. Things are going slower than necessary though, but that's probably very typical.

But you can join and should do so now, just follow this link or become part of the mailing list and help me there with request I make, e.g. review posts of mine, test out things, pick up small jobs, answer questions of newcomers, you know the drill probably.

Nuitka is about to make break through progress. And you can be a part of it. Now.

Future

So, there is multiple things going on:

  • More SSA usage

    The next releases are going to be all about getting this done.

    Once we take it to that next level, Nuitka will be able to speed up some things by much more than the factor it basically has provided for 2 years now, and it's probably going to happen long before EuroPython 2015.

  • Function in-lining

    For locally declared functions, it should become possible to avoid their creation, and make direct calls instead of ones that use function objects and expensive parameter handling.

    The next step there of course is to not only bind the arguments to the function signature, but then also to in-line and potentially specialize the function code. It's my goal to have that at EuroPython 2015 in a form ready to show off.

When these 2 things come to term, Nuitka will have made really huge steps ahead and layed the ground for success.

From then on, a boatload of work remains. The infrastructure in place, still there is going to be plenty of work to optimize more and more things conretely, and to e.g. do type inference, and generate different codes for booleans, ints or float values.

Let me know, if you are willing to help. I really need that help to make things happen faster. Nuitka will become more and more important only.