Nuitka this week #7

Nuitka Design Philosophy

Note

I wrote this as part of a discussion recently, and I think it makes sense to share my take on Nuitka and design. This is a lot of text though, so feel free to skip forward.

The issue with Nuitka and design, for me, is mainly that the requirements for many parts were and are largely unknown to me until I actually start working on them.

My goto generators approach worked out as originally designed, and that felt really cool for once, but the whole "C type" thing was a total unknown to me, until it all magically took form.

And I know it will evolve further as I go from "bool" (complete and coming for 0.6.0) via "void" (should be complete already, but will likely be enabled only for 0.6.1) to "int"; I am not sure how long that will take.

I really think Nuitka, unlike other software that I have designed, is more of a prototype project that gradually turns more and more into the real thing.

I have literally spent years injecting proper design in steps into the optimization phase, what I call SSA, value tracing, and it is very much there now. I am probably going to spend a similar amount of time on applying type inference results to the code generation.

So I turned that from something working with code strings into something working with variable declaration objects that know their type for the goto generators, aiming at C types generally. All the while carrying the full weight of passing every compatibility test there is.

Then e.g. suddenly cleaning up module variables to no longer have their special branch, but a pseudo C type, that makes them like everything else. Great. But when I first introduced the new thing, I postponed that, because I could sooner apply its benefits to some things and get experience from it.

While doing partial solutions, the design sometimes horribly degrades, but only until some features can carry the full weight, and/or have been explored to have their final form.

Making a whole Nuitka design upfront and then executing it, would instead give a very high probability of failing in the real world. I am therefore applying the more agile approach, where I make things work first. And then continue to work while I clean it up.

For every feature I add, I actively go out and change the thing that made it hard or even made it fail. Always. I think Nuitka is largely developed by cleanups and refactoring. Goto generators were a fine example of that: injecting variable declaration objects into code generation solved many of the issues and made it easy to indicate storage (heap or object or stack) right there.

That is not to say that Nuitka didn't have the typical compiler design. Like parsing inputs, optimizing a tree internally, producing outputs. But that grand top level design only tells you the obvious things really and is stolen anyway from knowing similar projects like gcc.

There always were, of course, obvious designs for Nuitka, but those never were what anybody would consider to make a Python compiler hard. For actual compatibility with CPython, however, so many details were going to require examination, with no solutions known ahead of time.

I guess I am an extreme programmer, or agile, or whatever they call it these days. At least for Nuitka. In my professional life, I have designed software for ATC on the drawing board, then on paper, and then in code; the design just worked and got operational right after completion, which is rare, I can tell you.

But maybe that is what keeps me excited about Nuitka. How I need to go beyond my abilities and stable ground to achieve it.

But the complexity of Nuitka is so dramatically higher than anything I ever did. It is doing a complicated, i.e. detail rich work, and then it also is doing hard jobs where many things have to play together. And the wish to have something working before it is completed, if it ever is, makes things very different from projects I typically did.

So the first version of Nuitka already had a use, and when I publicly showed it first, was capable of handling most complex programs, and the desire was to evolve gradually.

I think I have described this elsewhere, but for large parts of the well or badly designed solutions of Nuitka, there are reliable ways of demonstrating that they work correctly. Far better than I have ever encountered. I believe that is the main reason I managed to get this off the ground. Having a test "oracle" is what makes Nuitka special, i.e. comparing to existing implementations.

Like a calculator can be tested comparing it to one of the many already perfect ones out there. That again makes Nuitka relatively easy despite the many details to get right, there is often an easy way to tell correct from wrong.
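The oracle idea can be sketched in a few lines of Python. This is only an illustration, not Nuitka's test runner: my_isqrt and find_mismatches are hypothetical names, and math.isqrt plays the role of the trusted existing implementation.

```python
import math


def my_isqrt(n: int) -> int:
    """Implementation under test: integer square root by binary search."""
    lo, hi = 0, n + 1  # invariant: lo * lo <= n < hi * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def find_mismatches(candidate, oracle, inputs):
    """Return every input on which candidate and oracle disagree."""
    return [x for x in inputs if candidate(x) != oracle(x)]


# The oracle (math.isqrt) decides what "correct" means.
mismatches = find_mismatches(my_isqrt, math.isqrt, range(1000))
```

An empty list means the two implementations agree on all tried inputs; any entry is a concrete input to debug with.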

So for me, Nuitka is on the design level, something that goes through many iterations, discovery, prototyping, and is actually really exciting in that.

Compilers typically are boring. But for Nuitka that is totally not the case, because Python is not made for it. Well, that's technically untrue, let's say not for optimizing compilers, not for type inference, etc.

UI rework

Following up on discussion on the mailing list, the user interface of Nuitka will become more clear with --include-* options and --[no]follow-import* options that better express what is going to happen.

Also the default for following with extension modules is now precisely what you say, as going beyond what you intend to deliver makes no sense in the normal case.

Goto Generators

Now released as 0.5.33, and there have been few regressions so far. The one that was found is fixed only in the pre-release of 0.6.0, so use that instead if you encounter a C compilation error.

Benchmarks

The performance regressions fixed for 0.6.0 impact pystone by a lot, loops were slower, so were subscripts with constant integer indexes. It is a pity these were introduced in previous releases during refactorings without noticing.

We should strive to have benchmarks with trends. Right now Nuitka speedcenter cannot do it. Focus should definitely go to this. Like I said, after the 0.6.0 release, this will be a priority, to make them more useful.

Twitter

I continue to be active there. I just put out a poll about the comment system, and, disabling Disqus comments, I will now focus on Twitter for web site comments too.

Follow @kayhayen

And let's not forget, having followers makes me happy. So do re-tweets.

Help Wanted

If you are interested, I am tagging issues help wanted and there is a bunch, and very likely at least one you can help with.

Nuitka definitely needs more people to work on it.

Egg files in PYTHONPATH

This is a relatively old issue that now got addressed. Basically, source code should be loaded from these for compilation. Nuitka now unpacks them to a cache folder so it can read source code from them, so this apparently rare use case works now, yet again improving compatibility.

Will be there for 0.6.0 release.
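As a rough sketch of the unpacking idea, assuming a zipped egg: the cache layout and the names unpack_egg, cache_root and demo_pkg below are made up for illustration and are not Nuitka's actual code.

```python
import hashlib
import os
import tempfile
import zipfile


def unpack_egg(egg_path: str, cache_root: str) -> str:
    """Extract a zipped egg once into cache_root and return that directory."""
    # Hypothetical cache key: a hash of the absolute egg path.
    digest = hashlib.sha1(os.path.abspath(egg_path).encode("utf-8")).hexdigest()
    target = os.path.join(cache_root, digest)
    if not os.path.isdir(target):
        with zipfile.ZipFile(egg_path) as egg:
            egg.extractall(target)
    return target


# Demo with a throwaway egg file.
with tempfile.TemporaryDirectory() as work_dir:
    egg_path = os.path.join(work_dir, "demo.egg")
    with zipfile.ZipFile(egg_path, "w") as egg:
        egg.writestr("demo_pkg/__init__.py", "VALUE = 42\n")

    unpacked = unpack_egg(egg_path, os.path.join(work_dir, "cache"))
    with open(os.path.join(unpacked, "demo_pkg", "__init__.py")) as source_file:
        source = source_file.read()
```

Once unpacked, the source files can be read like those of any other directory on the module search path.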

Certifi

It seems the requests module sometimes uses that. Nuitka now includes that data file starting with the 0.6.0 release.

Compatibility with pkg_resources

It seems that getting "distributions" and taking versions from there, is really a thing, and Nuitka fails pkg_resources requirement checks in standalone mode at least, and that is of course sad.

I am currently researching how to fix that, and I am not sure yet how to do it. But some forms of Python installs are apparently very affected by it. I am trying to look into its data gathering; maybe compiled modules can be registered there too. It seems to be based on file system scans of its own making, but there is always a monkey patch possible to make it better.

Plans

Still working on the 0.6.0 release, cleaning up open ends only. Release tests seem to be looking pretty good. Now is a good time to get the UI changes and such done, but they delay things, and there is a bunch of small things that are low-hanging fruit while I wait for test results.

But since it fixes so many performance things, it really ought to be out any day now.

Also the in-place operations stuff, I added it to 0.6.0 too, just because it feels very nice, and improves some operations by a lot too. Initially I had made a cut for 0.6.1 already, but that is no more.

Donations

If you want to help, but cannot spend the time, please consider donating to Nuitka, and go here:

Donate to Nuitka

Nuitka Release 0.5.33

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains a bunch of fixes, most of which were previously released as part of hotfixes, and an important new optimization.

Bug Fixes

  • Fix, nested functions with local classes using outside function closure variables were not registering their usage, which could lead to errors at C compile time. Fixed in 0.5.32.1 already.
  • Fix, usage of built-in calls in a class level could crash the compiler if a class variable was updated with its result. Fixed in 0.5.32.1 already.
  • Python 3.7: The handling of non-type base classes was not fully compatible and wrong usages were giving AttributeError instead of TypeError. Fixed in 0.5.32.2 already.
  • Python 3.5: Fix, await expressions didn't annotate their exception exit. Fixed in 0.5.32.2 already.
  • Python3: The enum module usages with __new__ in derived classes were not working, due to our automatic staticmethod decoration. Turns out, that was only needed for Python2 and can be removed, making enum work all the way. Fixed in 0.5.32.3 already.
  • Fix, recursion into __main__ was done and could lead to compiler crashes if the main module was named like that. This is now prevented. Fixed in 0.5.32.3 already.
  • Python3: The name for list contraction's frames was wrong all along and not just changed for 3.7, so drop that version check on it. Fixed in 0.5.32.3 already.
  • Fix, the hashing of code objects was creating a key that could produce more overlaps for the hash than necessary. Using a C1 on line 29 and C on line 129 was considered the same. And that is what actually happened. Fixed in 0.5.32.3 already.
  • MacOS: Various fixes for newer Xcode versions to work as well.
  • Python3: Fix, the default __annotations__ was the empty dict and could be modified, leading to severe corruption potentially. Fixed in 0.5.32.4 already.
  • Python3: An exception thrown into a generator that currently does a yield from is not to be normalized.
  • Python3: Some exception handling cases of yield from were leaking references to objects. Fixed in 0.5.32.5 already.
  • Python3: Nested namespace packages were not working unless the directory continued to exist on disk. Fixed in 0.5.32.5 already.
  • Standalone: Do not include icuuc.dll which is a system DLL. Fixed in 0.5.32.5 already.
  • Standalone: Added hidden dependency of newer version of sip. Fixed in 0.5.32.5 already.
  • Standalone: Do not copy file permissions of DLLs and extension modules as that makes deleting and modifying them only harder.
  • Python 3.5: Fixed exception handling with coroutines and asyncgen throw to not corrupt exception objects.
  • Python 3.7: Added more checks to class creations that were missing for full compatibility.
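The code object hashing fix in the list above can be illustrated with a small sketch. It assumes the flawed key was built by joining the name and the line number without a separator; naive_key and tuple_key are hypothetical names, not functions from Nuitka.

```python
def naive_key(name: str, line: int) -> str:
    # Assumed flawed scheme: plain concatenation, no separator.
    return name + str(line)


def tuple_key(name: str, line: int) -> tuple:
    # Keeping the parts separate makes the key unambiguous.
    return (name, line)


# "C1" + "29" and "C" + "129" both give "C129".
collides = naive_key("C1", 29) == naive_key("C", 129)
distinct = tuple_key("C1", 29) != tuple_key("C", 129)
```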

Organizational

  • The issue tracker on Github is now the one that should be used with Nuitka, winning due to easier issue templating and integration with pull requests.
  • Document the threading model and exception model to use for MinGW64.
  • Removed the enum plug-in which is no longer useful after the improvements to the staticmethod handling for Python3.
  • Added Python 3.7 testing for Travis.
  • Make it clear in the documentation that pyenv is not supported.

New Features

  • Added support for MiniConda Python.

Optimization

  • Using goto based generators that return from execution and resume based on heap storage. This makes tests using generators twice as fast and they no longer use a full C stack of 2MB, but only 1K instead.
  • Put all statement related code and declarations for it in a dedicated C block, making things slightly more easy for the C compiler to re-use the stack space.
  • Avoid linking against libpython in module mode on everything but Windows where it is really needed. No longer check for static Python, not needed anymore.
  • More compact function, generator, and asyncgen creation code for the normal cases, avoid qualname if identical to name for all of them.
  • Python2 class dictionaries are now indeed directly optimized, giving more compact code.
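To put the generator figures from the first item into perspective, here is a back-of-the-envelope calculation. The pool size of 1000 is only an example value, and 2MB and 1K are read as binary units.

```python
STACK_BYTES = 2 * 1024 * 1024  # full C stack per generator (2MB)
HEAP_BYTES = 1024              # heap storage per goto generator (1K)
POOL_SIZE = 1000               # example number of live generators

MIB = 1024 * 1024
stack_total_mib = POOL_SIZE * STACK_BYTES / MIB  # memory with C stacks
heap_total_mib = POOL_SIZE * HEAP_BYTES / MIB    # memory with heap storage
ratio = STACK_BYTES // HEAP_BYTES                # per-generator saving factor
```

With these numbers, the pool drops from roughly 2000 MiB to just under 1 MiB.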

Cleanups

  • Frame objects and their cache declarations are now handled by way of allocated variable descriptions, avoiding special handling for them.
  • The interface to "forget" a temporary variable has been replaced with a new method that skips a number for it. This is done to make expressions keep using the same indexes for all their child expressions, but this is more explicit.
  • Instead of passing around C variables names for temporary values, we now have full descriptions, with C type, code name, storage location, and the init value to use. This makes the information more immediately available where it is needed.
  • Variable declarations are now created when needed and stored in dedicated variable storage objects, which can then generate the code as necessary.
  • Module code generation has been enhanced to be closer to the pattern used by functions, generators, etc.
  • There is now only one spot that creates variable declarations, instead of previous code duplications.
  • Code objects are now attached to functions, generators, coroutines, and asyncgen bodies, and not anymore to the creation of these objects. This allows for simpler code generation.
  • Removed fiber implementations, no longer needed.

Tests

  • Finally the asyncgen tests can be enabled in the CPython 3.6 test suite as the corrupting crasher has been identified.
  • Cover ever more cases of spurious permission problems on Windows.

Summary

This release is huge in many ways.

First, finishing "goto generators" clears an old scalability problem of Nuitka that needed to be addressed. No more do generators/coroutines/asyncgen consume too much memory, but instead they become as lightweight as they ought to be.

Second, the use of variable declarations carrying type information all through the code generation is an important pre-condition for "C types" work to resume and become possible.

Third, the improved generator performance will be removing a lot of cases, where Nuitka wasn't as fast, as its current state not using "C types" yet, would allow.

Fourth, the fibers were a burden for the debugging and linking of Nuitka on various platforms, as they provided deprecated interfaces or not. As they are now gone, Nuitka ought to definitely work on any platform where Python works.

From here on, C types work will resume and hopefully yield good results soon.

Nuitka this week #6

Holiday

In my 2 weeks holiday, I indeed focused on a really big thing, and got more done than I had hoped for. For C types, nuitka_bool, which is a tri-state boolean with true, false and unassigned, can be used for some variables, and executes some operations without going through objects anymore.
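To make the tri-state idea concrete, here is a Python model of it. This is only an illustration: the real nuitka_bool is a C type, and the names NuitkaBool, from_python and check as well as the numeric values are assumptions made for this sketch.

```python
from enum import Enum


class NuitkaBool(Enum):
    FALSE = 0
    TRUE = 1
    UNASSIGNED = 2  # the variable has no value yet


def from_python(value) -> NuitkaBool:
    """Convert an object to the tri-state type using its truth value."""
    return NuitkaBool.TRUE if value else NuitkaBool.FALSE


def check(value: NuitkaBool) -> bool:
    """Use the value as a condition; reading an unassigned one is an error."""
    if value is NuitkaBool.UNASSIGNED:
        raise UnboundLocalError("variable referenced before assignment")
    return value is NuitkaBool.TRUE
```

The third state is what lets such a variable stand in for a Python local, which may be unbound at the time it is read.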

bool

Condition codes are no longer special. They all need a boolean value from the expression used as a condition, and there were special paths for some popular expressions for conditions, but of course not all. That is now a universal thing: conditional statements/expressions will now simply ask to provide a temp variable of type nuitka_bool and then code generation handles it.

For where it is used, code gets a lot lighter, and of course faster, although I didn't measure it yet. Going to Py_True/Py_False and comparing with it, wasn't that optimal, and it's nice this is now so much cleaner as a side effect of that C bool work.

This seems to be so good that it is actually the default in 0.6.0, and that itself is a major breakthrough. Not so much for actual performance, but for structure. Other C types are going to follow soon and will give massive performance gains.

void

And what was really good is that not only did I get bool to work almost perfectly, I also started work on the void C target type and finished that after my return from holiday last weekend, which led to a new optimization that I am putting in the 0.5.33 release that is coming soon, even before the void code generation is out.

The void C type cannot read values back, and unused values should not be used, so this gives errors for cases where that becomes obvious.

a or b

Consider this expression. The or expression is going to produce a value, which is then released, but not used otherwise. A new optimization creates a conditional statement out of it, which takes a as the condition and, if it is not true, evaluates b but ignores it.

if not a:
   b

The void evaluation of b can then do further optimization for it.
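That the two forms behave the same when the result is unused can be checked with a small sketch. The helper functions a and b here are stand-ins that merely record that they were evaluated.

```python
calls = []


def a(result):
    calls.append("a")
    return result


def b():
    calls.append("b")
    return "unused"


def as_expression(a_result):
    """Evaluate 'a or b' as a statement; the value is thrown away."""
    calls.clear()
    a(a_result) or b()
    return list(calls)


def as_conditional(a_result):
    """The rewritten form: a is the condition, b is evaluated for effect only."""
    calls.clear()
    if not a(a_result):
        b()
    return list(calls)
```

Both variants evaluate a exactly once and b only when a is false, so the side effects are identical.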

Void code generation can therefore highlight missed opportunities for this kind of optimization, and it found a couple of these. That is why I was going for it, and I feel it pays off. Code generation checking optimization here is a really nice synergy between the two.

Plus I got all the tests to work with it, and solved the missing optimizations it found very easily. And instead of allocating an object now, not assigning is often creating more obvious code. And that too allowed me to find a couple of bugs by C compiler warnings.

Obviously I will want to run a compile all the world test before making it the default, which is why this will probably become part of 0.6.1 to be the default.

module_var

Previously variable codes were making a hard distinction for module variables and make them use their own helper codes. Now this is encapsulated in a normal C type class like nuitka_bool, or the one for PyObject * variables, and integrates smoothly, and even got better. A sign things are going smooth.

Goto Generators

Still not released. I delayed it after my holiday, and due to the heap generator change, after stabilizing the C types work, I want to first finish a tests/library/compile_python_module.py resume run, which will compile all the code found in an Anaconda3 install.

Right now it's still doing that, and even found a few bugs. The heap storage can still cause issues, as can changes to cloning nodes, which happens for try nodes and their finally blocks.

This should finish these days. I looked at performance numbers and found that develop is indeed only faster, and factory due to even more optimization will be yet faster, and often noteworthy.

Benchmarks

The Speedcenter of Nuitka is what I use right now, but it's only showing the state of 3 branches compared to CPython, not much historical information. Also the organization of tests is poor. At least there are tags for what improved.

After release of Nuitka 0.6.0 I will show more numbers, and I will start to focus on making it easier to understand. Therefore no link right now, google if you are so keen. ;-)

Twitter

During the holiday sprint, and even after, I am going to Tweet a lot about what is going on for Nuitka. So follow me on twitter if you like, I will post important stuff as it happens there:

Follow @kayhayen

And let's not forget, having followers makes me happy. So do re-tweets.

Poll on Executable Names

So I put a poll up on Twitter, which is now over. But it made me implement a new scheme, due to popular consensus.

Hotfixes

Even more hotfixes. I even did 2 during my holiday; the packages, however, were built only later.

Threaded imports on 3.4 or higher of modules were not using the locking they should use. Multiprocessing on Windows with Python3 had even more problems, and the --include-package and --include-module were present, but not working.

That last one was actually very strange. I had added a new option group for them, but not added it to the parser. Result: Option works. Just does not show up in help output. Really?
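This behavior can be reproduced with Python's optparse module, under the assumption that this is the option parsing library in question. An OptionGroup shares the option tables of its parser, so its options parse fine, but the group only shows up in the help output after add_option_group is called. The names make_parser and register_group are made up for this sketch.

```python
from optparse import OptionGroup, OptionParser


def make_parser(register_group: bool) -> OptionParser:
    parser = OptionParser(prog="demo")
    group = OptionGroup(parser, "Inclusion control")
    group.add_option("--include-module", action="append", dest="include_modules")
    if register_group:
        # This is the call that is easy to forget.
        parser.add_option_group(group)
    return parser


forgotten = make_parser(register_group=False)
options, _args = forgotten.parse_args(["--include-module", "some_module"])
help_text = forgotten.format_help()
```

Here options.include_modules holds the given module name, while help_text does not mention the option at all.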

Help Wanted

If you are interested, I am tagging issues help wanted and there is a bunch, and very likely one you can help with.

Nuitka definitely needs more people to work on it.

Plans

Working down the release backlog. Things should be out. I am already working on what should become 0.6.1, while 0.5.33 is not even released yet. Not a big deal, but 0.6.0 has 2 really important fixes for performance regressions that have happened in the past. One is for loops; making those faster is probably the most important one. The other is for constant indexing, probably also very important. Both are very much measurable in pystone at least.

In the mean time, I am preparing to get int working as a target C type, so e.g. comparisons of such values could be done in pure C, or relatively pure C.

Also, I noticed that e.g. in-place operations can be way more optimized and did stuff for 0.6.1 already in this domain. That is unrelated to C type work, but kind of follows a similar route maybe. How to compare mixed types we know of, or one type only. Those kinds of things need ideas and experiments.

Having int supported should help getting some functions to C speeds, or at least much closer to it. That will have noticeable effects in many of the benchmarks. More C types will then follow one by one.

Donations

If you want to help, but cannot spend the time, please consider donating to Nuitka, and go here:

Donate to Nuitka

Nuitka this week #5

Goto Generators

Finished. Done. Finally.

Benchmarking was exciting. One program benchmark I had run in the past was twice as fast as before, showing that the new implementation is indeed much faster, which is fantastic news.

Creating generator expressions and using them both got substantially faster and that is great.

It took me a fair amount of time to debug coroutines and asyncgen based on the new goto implementation. But the result is really good, and a fair amount of old bugs have been fixed. There always had been a segfault with asyncgen test that now has been eradicated.

One major observation is that now, with only one C stack, debugging got a lot easier than before, where context switches left much of the program state not reachable.

Benchmarks

Posted this on Twitter already:

Nuitka Speedcenter Builtin sum with generator

That one construct test has been a problem child, where Nuitka was slower than CPython 2.x, and very little faster than 3.x, and now with goto generators finally has become consistently faster.

I will explain what you see there in the next issue. The short version is that there is code in which, for one run, one line is used, and in another run the other line is used, and then the "construct" is measured by taking the delta of the two. That construct performance is then compared between Python and Nuitka.

So if e.g. Nuitka is already better at looping, that won't influence the number of making that sum call with a generator expression.

The alternative line uses the generator expression, to make sure the construction time is not counted. To measure that, there is another construct test, that just creates it.

Nuitka Speedcenter Generator Expression Creation

This one shows that stable Nuitka was already faster at creating them, but that the develop version got even faster again. As creating generator objects became more lightweight, that is also news.

There are constructs for many parts of Python, to shed a light on how Nuitka fares for that particular one.
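The delta measurement described above can be approximated with timeit. This is a simplified sketch and not the Speedcenter code: construct_cost, with_construct and without_construct are hypothetical names, and both variants create the generator expression so that only the sum call differs.

```python
import timeit


def with_construct():
    gen = (x for x in range(100))
    return sum(gen)  # the line under test


def without_construct():
    gen = (x for x in range(100))
    return gen  # alternative line: uses the generator, but does not sum it


def construct_cost(measured, baseline, number=200):
    """Return the time difference in seconds between the two variants."""
    t_measured = timeit.timeit(measured, number=number)
    t_baseline = timeit.timeit(baseline, number=number)
    return t_measured - t_baseline


delta = construct_cost(with_construct, without_construct)
```

Running the same pair under CPython and as Nuitka compiled code gives two deltas that can be compared, independent of how fast the shared setup is.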

Holiday

In my 2 weeks holiday, I will try and focus on the next big thing, C types, something also started in the past, and where recent changes as part of the heap storage, should make it really a lot easier to get it finished. In fact I don't know right now, why my bool experimental work shouldn't just prove to be workable.

I am not going to post a TWN issue next week, mostly because my home servers won't be running, and the static site is rendered on one of them. Of course that would be movable, but I won't bother.

I am going to post a lot on Twitter though.

Static Compilation

There is a Github issue where I describe how pyenv on MacOS ought to be possible to use, and indeed, a brave soul has confirmed and even provided the concrete commands. All it takes now is somebody to fit this into the existing caching mechanism of Nuitka and to make sure the static library is properly patched to work with these commands.

Now is anyone of you going to create the code that will solve it for good?

Twitter

Follow me on twitter if you like, I will post important stuff as it happens there:

Follow @kayhayen

And let's not forget, having followers makes me happy. So do re-tweets.

Hotfixes

And there have been yet again more hotfixes. Some are about coroutine and asyncgen corruptions for closes of frames. Multiprocessing plugin on Windows will work in all cases now.

Noteworthy was that the "0.5.32.6" was having a git merge problem on the cherry-pick that git didn't tell me about, leading to crashes. That made it necessary to push an update right after. I was confused that I didn't get a conflict, because there was one. But I am to blame for not checking the actual diff.

Bug Tracker

The next release will make Github the official tracker for Nuitka issues. I am working down the issues on the old tracker. The web site already pointed users there for a while, and I was set on this for some time, but yesterday I focused on taking action.

Basically what won me over is the easier templating of issues and pull requests that would have been possible with Roundup, but never happened. Also the OpenID integration that bugs.python.org has, never became available to me in a ready usable form.

Issue Backlog

Finishing "goto generators" allowed for around 10 issues to be closed alone, and I went over things, and checked out some stale issues, to see if they are dealt with, or pinged authors. I spent like half a day on this, bringing down the issue count by a lot. Tedious work, but it must be done too.

Also my inbox got a fair amount of cleanup, lots of issues pile up there, and from time to time, I do this, to get things straight. I raised issues for 2 things, that I won't be doing immediately.

But actually, as issues go, there is really very little problematic stuff open right now, and nothing important really. I would almost call it issue clean.

Help Wanted

If you are interested, I am tagging issues help wanted and there is a bunch, and very likely one you can help with.

Nuitka definitely needs more people to work on it.

Plans

The goto generator work could be released, but I want to run the compile all the world test before I do so. It is running right now, but it will not complete before I leave. Also I do not want to get regression reports during my holiday, and goto generators along with heap storage mean there could be some.

I am going to work on C types now. There are a few closing down actions on what I observed doing goto generators. There are a few easy ways to get even slightly better performance, and definitely smaller code, out of generators. Not sure if I go there first, or for the C types work directly. I often like to get these kinds of observations dealt with more immediately, but I don't want to spend too much quality time on it.

Donations

As I have been asked this, yes, you can donate to Nuitka if you wish to further its development. Go here:

Donate to Nuitka

Nuitka this week #4

Goto Generators

This continues TWN #3 where I explained what it is all about.

The good news: back then, Python2 generators were largely working with the new ways. In the meantime, not only does all of the Python 2.7 test suite pass with goto generators, so does the Python 3.4 test suite, i.e. yield from is working with it too.

The way it was done is to set m_yieldfrom in generators, and then to enter a state, where the code will only be resumed, when that sub-generator that currently it is yielding from, is finished. That makes it very much like normal yield. In fact, code generation is hardly different there.
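A Python model may help to picture this. It is a hand-written state machine for a generator that does yield 1, then yield from (2, 3), then yield 4. The class and its yieldfrom attribute, which plays the role of m_yieldfrom, are assumptions for illustration; only plain iteration is covered, not send or throw.

```python
class GotoStyleGenerator:
    def __init__(self):
        self.resume_point = 0
        self.yieldfrom = None  # sub-iterator currently delegated to

    def __iter__(self):
        return self

    def __next__(self):
        # While delegating, our own code is not resumed at all.
        if self.yieldfrom is not None:
            try:
                return next(self.yieldfrom)
            except StopIteration:
                self.yieldfrom = None  # finished, continue at resume point

        if self.resume_point == 0:
            self.resume_point = 1
            return 1
        if self.resume_point == 1:
            self.resume_point = 2
            self.yieldfrom = iter((2, 3))
            return self.__next__()
        if self.resume_point == 2:
            self.resume_point = 3
            return 4
        raise StopIteration


def reference():
    yield 1
    yield from (2, 3)
    yield 4
```

Apart from the delegation check at the top, each step looks the same as for a normal yield.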

Since the whole purpose is to get rid of make/get/setcontext, the next stop is coroutines. They have async for, async with and await but at the end of the day, the implementation comes down to yield from really with only a lot of sugar applied.

Right now, I am debugging "goto coroutines". It's hard to tell when it will be finished, and then asyncgen will be waiting still.

This is easily the largest change in a long time, esp. due to the heap storage changes that I already discussed. Once this is finished, I expect to turn towards C types with relative ease.

Tox Plugin

Anthony Shaw took on Tox and Nuitka and created a plugin that allows using Nuitka. I am still wrapping my head around these things. It's only a proof of concept yet. I will give it more coverage in the future.

Twitter

Follow me on twitter if you like, I will post important stuff as it happens there:

Follow @kayhayen

Hotfixes

So there have been even more hotfixes. One addresses memory leaks found with yield from while I was adding tests. Usually, if I encounter an old issue that has a small fix, that is what I do: push out a hotfix using the git flow model. Also, nested namespace packages for Python3, the ones without an __init__.py, were not working after the original directory was removed, and that got fixed.

And right now, I have hotfixes for the close method of frames, which apparently was never updated to work properly for coroutines and asyncgen. That is going to be in the next hotfix.

Plans

So the heap storage seems pretty complete now, and goto generators are on the final stretch. As always, things feel right around the corner. But it's unclear how much longer I will have to debug. I am pretty sure the bare work of doing asyncgen is going to be low. Debugging that too then, that is the hard part.

A new release seems justified, but I kind of do not want to make it without that major new code used. Because apparently during the debugging, I tend to find issues that need hotfixes, so I will wait for the goto generator work to finish.

Nuitka this week #3

New Series Rationale

This is working out well so far. I think drawing more attention to the things that are going on can only be good. Also, explaining will always help. It also kind of motivates me a bit.

Twitter

Also as part of my communications offensive, I am using my Twitter account more regularly. I used to highlight important fixes, or occasionally releases of some importance there. I will continue to do only important stuff there, but with more regularity.

And I noticed in the past, even when I do not post, having followers makes me happy. So here you go:

Follow @kayhayen

Goto Generators

This continues TWN #2 where I promised to speak more of it, and this is the main focus of my work on Nuitka right now.

Brief summary, context switches were how this was initially implemented. The main reason being that for C++ there never was going to be a way to save and restore state in the middle of an expression that involves constructors and destructors.

Fast forward some years, and C-ish entered the picture. No objects are used anymore, and Nuitka is purely C11 now, which has the convenience of C++, but no objects. Instead, goto is used a lot already. So every time an exception occurs, a goto is done; every time a branch is done, a loop exit or continue, you get it, another goto.

But so far, all Python level variables of a frame still live on that C stack, and the context switch is done with functions that swap stacks. That is fast, but the important drawback is that it takes more memory. How deep a stack will we need? And we can use really many of them: imagine a pool of 1000 coroutines, and that quickly becomes impossible to deal with.

So, the new way of doing this basically goes like this:

def g():
    yield 1
    yield 2

This was so far becoming something along these lines:

PyObject *impl_g( NuitkaGenerator *generator )
{
     YIELD( const_int_1 );
     YIELD( const_int_2 );

     PyErr_SetException( StopIteration );
     return NULL;
}

The YIELD in there was basically doing the switching of the stacks and for the C code, it looked like a normal function call.

In the new approach, this is done:

PyObject *impl_g( NuitkaGenerator *generator )
{
     switch( generator->m_resume_point )
     {
          case 1: goto resume_1;
          case 2: goto resume_2;
     }

     generator->m_yielded = const_int_1;
     generator->m_resume_point = 1;
     return NULL;
     resume_1:

     generator->m_yielded = const_int_2;
     generator->m_resume_point = 2;
     return NULL;
     resume_2:

     PyErr_SetException( StopIteration );
     return NULL;
}

As you can see, the function has an initial dispatcher. Resume point 0 means we are starting at the top. Then every yield results in a function return with an updated resume point.
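The same dispatcher idea can be sketched in plain Python, which has no goto, so the resume point selects a branch instead of a label. This is only an illustration under that assumption. The class name GotoStyleG and its attribute are hypothetical and not Nuitka code.

```python
# Hypothetical Python model of the dispatcher shown in the C sketch.
# Each call to __next__ is one entry into the re-entrant function.
class GotoStyleG:
    def __init__(self):
        # Resume point 0 means: start at the top.
        self.resume_point = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.resume_point == 0:
            # First "yield 1": remember where to continue, then leave.
            self.resume_point = 1
            return 1
        if self.resume_point == 1:
            # Second "yield 2".
            self.resume_point = 2
            return 2
        # Past the last yield, the generator is finished.
        raise StopIteration


values = list(GotoStyleG())
```

Every yield becomes a plain return with an updated resume point, so no second C stack is needed to remember where execution stopped.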

I experimented with this actually a long time ago, and experimental code was the result that remained in Nuitka. The problem left to solve was to store the variables that would normally live on the stack, in a heap storage. That is what I am currently working on.

This leads me to "heap storage", which I will report on next week. Once that is there, goto generators can work, and will become the norm. Until then, I am refactoring a lot to make accesses to variables go through proper objects that know their storage locations and types.

Hotfixes

So there have been 2 more hotfixes. One was to make the enum and __new__ compatibility available that I talked about last week in TWN #2 <./nuitka-this-week-2.html#python3-enumerators> coupled with a few minor things.

And then another one, actually important, where Python3 __annotations__ by default was the empty dictionary, but then could be modified, corrupting the Nuitka internally used one severely.

Right now I have another fix on factory, for nested namespace packages in Python3, and that might become another hotfix soon.

As you know, I am following the git flow model, where it's easy to push out small fixes, and just those, on top of the last release. I tend to decide based on importance. However, I feel that with the important fixes in the hotfixes now, it's probably time to make a full release, to be sure everybody gets those.

Plans

Finishing heap storage is my top priority right now and I hope to complete the refactorings necessary in the coming week. I will also talk about how it enables C types work next week.

Until next week then!

Nuitka this week #2

New Series Rationale

As discussed last week in TWN #1 this is a new series that I am using to highlight things that are going on, newly found issues, hotfixes, all the things Nuitka.

Python 3.7

I made the first release with official 3.7 support, huge milestone in terms of catching up. Generic classes posed a few puzzles, and need more refinements for error handling, but good code works now.

The class creation got a bit more complex, yet again, which will make it even harder to know the exact base classes to be used. But eventually we will manage to overcome this and statically optimize that.

MSI 3.7 files for Nuitka

Building the MSI files for Nuitka ran into a 3.7.0 regression of CPython failing to build them, which I reported and which seems to be a valid bug of theirs.

So they will be missing for some longer time. Actually I wasn't so sure if they are all that useful, or working as expected for the runners, but with the -m nuitka mode of execution, that ought to be a non-issue. So it would be nice to keep them for those who use them for deployment internally.

Planned Mode

I have a change here. This is going to be a draft post until I publish it, so I might share the link, or mention it on the list, but I do not think I will wait for feedback, as there is not going to be all that much.

So I am shooting this off to the web site.

Goto Generators

This is an exciting field of work, that I have been busy with this week. I will briefly describe the issue at hand.

So generators in Python are more generally called coroutines in other places, and basically that is code shaking hands, executing and resuming one another, handing pieces of data back and forth.

In Python, the way of doing this is yield and more recently yield from as a convenient way of doing it in a loop in Python3. I still recall the days when that was a statement. Then communication was one way only. Actually, I was still privately doing Nuitka based on what was then Python 2.5, and was puzzled for Python 2.6, when I learned through Nuitka about it becoming an expression.

The way this is implemented in Python is that execution of a frame is simply suspended, and another frame's bytecode is activated. This switching is potentially very fast, as the state is already fully preserved on the stack of the virtual machine, which is owned by the frame. For Nuitka, when it still was C++, it wasn't going to be possible to interrupt execution without preserving the stack. So what I did was very similar, and I started to use makecontext/setcontext to implement what I call fibers.

Basically that is C level stack switching, but with a huge issue. Python does not grow stacks, but can need a lot of stack space below. Therefore 1MB or even 2MB per generator was allocated, to be able to make deep function calls if needed.

So using a lot of generators on 32 bits could easily hit a 2GB limit. And now with Python3.5 coroutines people use more and more of them, and hit memory issues.
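To put numbers on that, here is a back-of-the-envelope calculation. It assumes the whole 2GB were available for generator stacks, which overestimates what a real process could use. The function name is made up for this illustration.

```python
# Rough upper bound on live generators when each one owns a C stack.
MIB = 1024 ** 2
GIB = 1024 ** 3


def max_generators(address_space_bytes, stack_bytes_per_generator):
    """How many fixed-size stacks fit into the given address space."""
    return address_space_bytes // stack_bytes_per_generator


with_1mb_stacks = max_generators(2 * GIB, 1 * MIB)
with_2mb_stacks = max_generators(2 * GIB, 2 * MIB)
```

So even in the best case, one or two thousand generators alive at the same time are the ceiling on 32 bits with this approach.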

So, goto generators, now that C is possible, are an entirely new solution. With it, Nuitka will use one stack only. Generator code will become re-entrant, store values between entries on the heap, and continue execution at goto destinations dispatched by a switch according to last exit of the generator.
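What "store values between entries on the heap" means can be sketched in plain Python for a generator with a loop and a local variable. This is a model only. The class CountdownGenerator is hypothetical, and the generator it stands for is `def countdown(n): while n > 0: yield n; n -= 1`.

```python
# Hypothetical model of a re-entrant generator: the local variable "n"
# lives on the generator object (heap), not on a stack of its own.
class CountdownGenerator:
    def __init__(self, n):
        self.resume_point = 0  # 0 means: start at the top
        self.n = n             # heap storage for the local variable

    def __iter__(self):
        return self

    def __next__(self):
        if self.resume_point == 1:
            # Re-entry right after the yield: run the rest of the loop body.
            self.n -= 1
        elif self.resume_point == 2:
            # Already finished earlier.
            raise StopIteration

        if self.n > 0:
            # The yield: remember where to continue and hand out the value.
            self.resume_point = 1
            return self.n

        self.resume_point = 2
        raise StopIteration


countdown_values = list(CountdownGenerator(3))
```

Between two entries nothing of the generator remains on the C stack, so a single stack is enough for any number of generators.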

So I am now making changes to clean up the way variable declarations and accesses for the C variables are being made. More on that next week though. For now I am very excited about the many cleanups that stem from it. The code generation used to have many special bells and whistles, and they were generalized into one thing now, making for cleaner and easier to understand Nuitka code.

Python3 Enumerators

One interesting thing is that an incompatibility related to __new__ will go away now.

The automatic staticmethod that we had to hack into it, because the Python core will do it for uncompiled functions only, had to be done while declaring the class. So it was visible and causing issues with at least the Python enum module, which wants to call your __new__ manually. Because why would it not?!

But turns out, for Python3 the staticmethod is not needed anymore. So this is now only done for Python2, where it is needed, and things work smoothly with this kind of code now too. This is currently in my factory testing and will probably become part of a hotfix if it turns out good.

Hotfixes

Immediately after the release, some rarely run test, where I compiled all the code on my machine, found 2 older bugs, obscure ones arguably, that I made into a hotfix, also because the test runner was having a regression with 3.7, which prevented some package builds. So that was 0.5.32.1 release.

And then I received a bug report about await where a self test of Nuitka fails and reports an optimization error. Very nice, the new exceptions that automatically dump involved nodes as XML made it immediately clear from the report, what is going on, even without having to reproduce anything. I bundled a 3.7 improvement for error cases in class creation with it. So that was the 0.5.32.2 release.

Plans

Finishing goto generators is my top priority, but I am also going over minor issues with the 3.7 test suite, fixing test cases there, and, as with e.g. the enum issue, even fixing known issues that this now finds.

Until next week.

Nuitka Release 0.5.32

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains substantial new optimization, bug fixes, and already the full support for Python 3.7. Among the fixes, the enhanced coroutine work for compatibility with uncompiled ones is most important.

Bug Fixes

  • Fix, was optimizing write backs of attribute in-place assignments falsely.
  • Fix, generator stop future was not properly supported. It is now the default for Python 3.7 which showed some of the flaws.
  • Python3.5: The __qualname__ of coroutines and asyncgen were wrong.
  • Python3.5: Fix, for dictionary unpackings to calls, check the keys if they are string values, and raise an exception if not.
  • Python3.6: Fix, need to check assignment unpacking for too short sequences, we were giving IndexError instead of ValueError for these. Also the error messages need to consider if they should refer to "at least" in their wording.
  • Fix, outline nodes were cloned more than necessary, which would corrupt the code generation if they later got removed, leading to a crash.
  • Python3.5: Compiled coroutines awaiting uncompiled coroutines was not working properly for finishing the uncompiled ones. Also the other way around was raising a RuntimeError when trying to pass an exception to them when they were already finished. This should resolve issues with asyncio module.
  • Fix, side effects of a detected exception raise, when they had an exception detected inside of them, led to an infinite loop in optimization. They are now optimized in-place, avoiding an extra step later on.
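For the unpacking and call fixes in the list above, this is the CPython behavior that compiled code has to match, shown as a small sketch. The helper names are made up for this illustration, and only the exception types are checked, since the exact messages vary between Python versions.

```python
# Hypothetical helpers showing the exception types CPython raises.
def exception_type_of(func):
    """Call func and return the type of the exception it raised, if any."""
    try:
        func()
    except Exception as e:
        return type(e)
    return None


def too_short():
    # Too few values: ValueError, not IndexError.
    a, b, c = [1, 2]


def too_short_starred():
    # With a starred target the message refers to "at least" 2 values.
    a, b, *rest = [1]


def non_string_keyword():
    def f(**kwargs):
        return kwargs

    # Keys of a dictionary unpacked into a call must be strings.
    return f(**{1: "x"})


too_short_error = exception_type_of(too_short)
starred_error = exception_type_of(too_short_starred)
keyword_error = exception_type_of(non_string_keyword)
```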

New Features

  • Support for Python 3.7 with only some corner cases not supported yet.

Optimization

  • Delay creation of StopIteration exception in generator code for as long as possible. This gives more compact code for generators, which now pass the return values via compiled generator attribute for Python 3.3 or higher.
  • Python3: More immediate re-formulation of classes with no bases. Avoids noise during optimization.
  • Python2: For class dictionaries that are only assigned from values without side effects, they are not converted to temporary variable usages, allowing the normal SSA based optimization to work on them. This leads to constant values for class dictionaries of simple classes.
  • Explicit cleanup of nodes, variables, and local scopes that become unused, has been added, allowing for breaking of cyclic dependencies that prevented memory release.
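As background for the StopIteration item in the list above: since Python 3.3 a generator can return a value, and at the Python level that value travels on the StopIteration exception. The sketch below only shows this language behavior, not how Nuitka stores the value internally.

```python
# Python 3.3+ semantics: the return value of a generator is carried
# by the StopIteration raised when it finishes.
def g():
    yield 1
    return 42


it = g()
first = next(it)
try:
    next(it)
except StopIteration as stop:
    returned = stop.value


def outer():
    # "yield from" picks the value up without user code seeing the exception.
    result = yield from g()
    yield result


delegated = list(outer())
```

As long as nobody looks at the exception itself, as in the yield from case, creating it can be put off.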

Tests

  • Adapted 3.5 tests to work with 3.7 coroutine changes.
  • Added CPython 3.7 test suite.

Cleanups

  • Removed remaining code that was there for 3.2 support. All uses of version comparisons with 3.2 have been adapted. For us, Python3 now means 3.3, and we will not work with 3.2 at all. This removed a fair bit of complexity for some things, but not all that much.
  • Have dedicated file for import related helpers, so they are easier to find if necessary. Also do not have code for importing a name in the header file anymore, not performance relevant.
  • Disable Python warnings when running scons. These are particularly given when using a Python debug binary, which is happening when Nuitka is run with --python-debug option and the inline copy of Scons is used.
  • Have a factory function for all conditional statement nodes created. This solved a TODO and handles the creation of statement sequences for the branches as necessary.
  • Split class reformulation into two files, one for Python2 and one for Python3 variant. They share no code really, and are too confusing in a single file, for the huge code bodies.
  • Locals scopes now have a registry, where functions and classes register their locals type, and then it is created from that.
  • Have a dedicated helper function for single argument calls in static code that does not require an array of objects as an argument.

Organizational

  • There are now requirements-devel.txt and requirements.txt files aimed at usage with scons and by users, but they are not used in installation.

Summary

This release takes the important step of adding conversion of locals dictionary usages to temporary variables. It is not yet done everywhere it is possible, and the resulting temporary variables are not yet propagated in all the cases where it clearly is possible. Upcoming releases ought to achieve that most Python2 classes will come to use a direct dictionary creation.

Adding support for Python 3.7 is of course also a huge step. This also happened fairly quickly and soon after its release. The generic classes it adds were the only real major new feature. It breaking the internals for exception handling was what was holding things back initially, but past that, it was really easy.

Expect more optimization to come in the next releases, aiming at both the ability to predict Python3 metaclasses __prepare__ results, and at more optimization applied to variables after they became temporary variables.

Nuitka this week #1

New Series Rationale

I think I tend to prefer coding over communication too much. I think I need to make more transparent what I am doing. Also, things will be getting exciting continuously for a while now.

I used to do status report posts, many years ago, every 3 months or so, and that was nice for me also to get an idea of what changed, but I stopped. What did not happen, was to successfully engage other people to contribute.

This time I am getting more intense. I will aim to do roughly weekly or bi-weekly reports, where I highlight things that are going on, newly found issues, hotfixes, all the things Nuitka.

Planned Mode

I will do it in this fashion. I will write a post to the mailing list, right about Wednesday every week or so. I need to pick a day. I am working from home that day, saving me commute time. I will invest that time into this.

The writing will not be too high quality at times. Bear with me there. Then I will check feedback from the list, if any. Hope is for it to point out the things where I am not correct or missing something, or even to engage people right away.

Topics are going to be random, albeit repeating. I will try and make links to previous issues where applicable. Therefore also the TOC, which makes for link targets in the pages.

Locals Dict

When I am speaking of locals dict, I am talking of class scopes (and functions with exec statements). These started to use an actual dictionary a while ago, which was a severe setback to optimization.
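Why class scopes need dictionary semantics at all can be seen with a small CPython example. It shows language behavior only, not Nuitka internals: in a class body, locals() is the namespace that names are looked up in, so writes to it are visible to later statements.

```python
# In a class body, locals() returns the namespace the body executes in.
class C:
    locals()["x"] = 1  # a write through the dictionary ...
    y = x + 1          # ... is visible to a plain name lookup


x_value = C.x
y_value = C.y
```

Only when nothing in the body can touch the dictionary like this is it safe to replace its entries with temporary variables.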

Right now, so for this week, after a first prototype was making the replacement of local dict assignments and references for Python2, and kind of worked through my buildbots flawlessly, I immediately noticed that it would require some refactoring to not depend on the locals scopes being in only one of the trace collections. Thinking of future inlining, maybe part of a locals scope was going to be in multiple functions, and that ought not to be affected.

Therefore I made a global registry of locals scopes, and working on those, I check their variables for whether they can be forward propagated, and do this not per module, but after all the modules have been done. This is kind of a setback for the idea of module specific optimization (cachable later on) vs. program optimization, but since that is not yet a thing, it can remain this way for now.

Once I did that, I was interested to see the effect, but to my horror, I noticed that memory was not released for the locals dict nodes. It was way too involved with cyclic dependencies, which are bad. So that was problematic of course. Having the compilation keep nodes in memory for both tracing the usage as a locals dict and as temporary variables wasn't going to help scaling at all.

The solution is finalization.

Nodes need Finalization

So replaced nodes reference a parent, and then the locals scope references variables, and trace collections reference variables, which reference locals scopes, and accesses reference traces, and so on. The garbage collector can handle some of this, but it seems I was getting past that.

For a solution, I started to add a finalize method, which released the links for locals scopes, when they are fully propagated, on the next run.

Adding a finalize to all nodes, ought to make sure, memory is released soon, and might even find bugs, as nodes become unusable after they are supposedly unused. Obviously, there will currently be cases where nodes become unused, but are not finalized yet. Also, often this is more manual, because part of the node is to be released, but one child is re-used. That is messy.
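The effect of such a finalize can be sketched with a toy tree. The Node class below is hypothetical and much simpler than the Nuitka nodes. The cyclic garbage collector is disabled during the check, so that only reference counting can release memory.

```python
import gc
import weakref


# Hypothetical toy node with parent and child links forming a cycle.
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def finalize(self):
        # Break the links so reference counting can release the nodes.
        for child in self.children:
            child.finalize()
        # Deleting the attributes also makes later use of the node fail loudly.
        del self.parent
        del self.children


def build_tree():
    root = Node("root")
    Node("child", parent=root)
    return root


def is_released(finalize):
    """Report whether the tree is freed without the cyclic collector."""
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        root = build_tree()
        probe = weakref.ref(root)
        if finalize:
            root.finalize()
        del root
        return probe() is None
    finally:
        if gc_was_enabled:
            gc.enable()
        gc.collect()


released_without_finalize = is_released(finalize=False)
released_with_finalize = is_released(finalize=True)
```

Without finalize the parent and child keep each other alive until the garbage collector runs. With it, the memory goes away as soon as the last outside reference is dropped.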

Impact on Memory Usage

The result was a bit disappointing. Yes, memory usage of mercurial compilation went back down again, but mostly to what it had been. Some classes are now having their locals dict forward propagated, but the effect is not always a single dictionary creation yet. Right now, function definitions are not forward propagated at all. This is a task I want to take on before the next release, but maybe not, there are other things too. But I am assuming that will make most class dictionaries created without using any variables at all anymore, which should make it really lean.

Type Hints Question

Then, asking about type hints, I got the usual question about whether Nuitka is going to use them. And my stance is unchanged. They are just hints, not reliable. Nuitka needs to behave the same if users get them wrong. I suggested creating a decorator which makes type hints enforced. But I expect nobody will take this on. I need to make it a Github issue of Nuitka, although technically it is pure CPython work and ought to be done independently. Right now Nuitka is not there yet anyway, to take full advantage.
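A minimal sketch of both points follows. The decorator enforce_hints is hypothetical, neither part of Nuitka nor of CPython, and it only checks arguments annotated with plain classes. The first part shows that hints are not enforced at run time, the second what an enforcing decorator could look like.

```python
import functools
import inspect
import typing


def enforce_hints(func):
    """Hypothetical decorator: check arguments against simple class hints."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked, generic hints are ignored here.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    "%s must be %s, not %s"
                    % (name, expected.__name__, type(value).__name__)
                )
        return func(*args, **kwargs)

    return wrapper


def double(x: int) -> int:
    return x * 2


# The hint says int, but CPython runs this happily with a str.
unchecked_result = double("ab")

checked_double = enforce_hints(double)
checked_result = checked_double(2)

try:
    checked_double("ab")
except TypeError:
    rejected = True
else:
    rejected = False
```

Only with something like this in place would a wrong hint turn into an error in CPython too, and only then could a compiler rely on it.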

Python 3.7

Then, for Python 3.7, I have long gotten the 3.6 test suite to pass. I raised 2 bugs with CPython, one of which led to an update of a failing test. Nuitka had, with a large delay, caught up with what del __annotations__ was doing in a class. Only with the recent work for proper locals dict code generation could we enforce a name to be local, and have proper code generation that allows for it to be unset.

This was of course a bit of work. But the optimization behind it was always kind of necessary to get right. But now that I got this, think of my amazement when for 3.7 they reverted to the old behavior, where annotations then corrupt the module annotations.

The other bug is a reference counting bug, where Nuitka tests were failing with CPython 3.7, and turns out, there is a bug in the dictionary implementation of 3.7, but it only corrupts counts reported, not actual objects, so it's harmless, but means for 3.7.0 the reference count tests are disabled.

Working through the 3.7 suite, I am cherry picking commits, that e.g. allow the repr of compiled functions to contain <compiled_function ...> and the like. Nothing huge yet. There is now a subscript of type, and foremost the async syntax became way more liberal, so it is more complex for Nuitka to make out if it is a coroutine due to something happening inside a generator declared inside of it. Also cr_origin was added to coroutines, but that is mostly it.

Coroutine Compatibility

A bigger thing was that I debugged coroutines and their interaction with uncompiled and compiled coroutines awaiting one another, and turns out, there was a lot to improve.

The next release will be much better compatible with asyncio module and its futures, esp with exceptions to cancel tasks passed along. That required to clone a lot of CPython generator code, due to how ugly they mess with bytecode instruction pointers in yield from on an uncompiled coroutine, as they don't work with the send method, unlike everything else has to.

PyLint Troubles

For PyLint, the 2.0.0 release found new things, but unfortunately for 2.0.1 there are a lot of regressions that I had to report. I fixed the versions of first PyLint, and now also Astroid, so Travis cannot suddenly start to fail due to a PyLint release finding new warnings.

Currently, if you make a PR on Github, a PyLint update will break it. And also the cron job on Travis that checks master.

As somebody pointed out, I am now using requires.io <https://requires.io/github/kayhayen/Nuitka/requirements/?branch=factory> to check for Nuitka dependencies. But since 1.9.2 is still needed for Python2, that kind of is bound to give alarms for now.

TODO solving

I have a habit of doing odd tasks when I am with my notebook in some place, and don't know what to work on. So I had some 2 hours recently like this, and used them to look at TODOs and resolve them.

I did a bunch of cleanups for static code helpers. There was one in my mind about calling a function with a single argument. That fast call required a local array with one element to put the arg into. That makes using code ugly.

Issues Encountered

So the enum module of Python3 hates compiled classes and their staticmethod around __new__. Since it manually unwraps __new__ and then calls it itself, it then finds that a staticmethod object cannot be called. Its purpose is to sit in the class dictionary to give a descriptor that removes the self arg from the call.

I am contemplating submitting an upstream patch for CPython here. The hard coded check for PyFunction on the __new__ value is hard to emulate.

So I am putting the staticmethod into the dictionary passed already. But the undecorated function should be there for full compatibility.

If I were to make compiled function type that is both a staticmethod alike and a function, maybe I can work around it. But it's ugly and a burden. But it would need no change. And maybe there is more core code wanting to call __new__ manually.
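What uncompiled code looks like to a metaclass such as the one of enum can be observed directly in CPython. In the sketch below, the metaclass name Meta and the dictionary seen are made up for this illustration. The namespace handed to the metaclass still holds a plain function for __new__, and only the finished class has it wrapped in a staticmethod.

```python
import types

# Records what the metaclass finds under "__new__" in the class namespace.
seen = {}


class Meta(type):
    def __new__(mcs, name, bases, namespace):
        # This is the point where a metaclass like enum's inspects __new__.
        seen[name] = type(namespace.get("__new__"))
        return super().__new__(mcs, name, bases, namespace)


class C(metaclass=Meta):
    def __new__(cls):
        return super().__new__(cls)


namespace_had_plain_function = seen["C"] is types.FunctionType
class_dict_has_staticmethod = isinstance(C.__dict__["__new__"], staticmethod)
instance = C()
```

A class body that puts the staticmethod into the namespace itself therefore differs from uncompiled code at exactly the point where such a metaclass looks.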

Plans

I intend to make a release, probably this weekend. It might not contain full 3.7 compatibility yet, although I am aiming at that.

Then I want to turn to "goto generators", a scalability improvement of generators and coroutines that I will talk about next week then.

Until next week.

Nuitka Release 0.5.31

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release is massive in terms of fixes, but also adds a lot of refinement to code generation, and more importantly adds experimental support for Python 3.7, while enhancing support for PyQt5 in standalone mode by a lot.

Bug Fixes

  • Standalone: Added missing dependencies for PyQt5.Qt module.

  • Plugins: Added support for PyQt5.Qt module and its qml plugins.

  • Plugins: The sensible plugin list for PyQt now includes the platforms plugins on Windows too, as they are kind of mandatory.

  • Python3: Fix, for uninstalled Python versions wheels that linked against the Python3 library as opposed to Python3X, it was not found.

  • Standalone: Prefer DLLs used by main program binary over ones used by wheels.

  • Standalone: For DLLs added by Nuitka plugins, add the package directory to the search path for dependencies where they might live.

  • Fix, the vars built-in didn't annotate its exception exit.

  • Python3: Fix, the bytes and complex built-ins need to be treated as a slot too.

  • Fix, consider if del variable must be assigned, in which case no exception exit should be created. This prevented Tkinter compilation.

  • Python3.6: Added support for the following language construct:

    d = {"metaclass" : M}
    
    class C(**d):
       pass
    
  • Python3.5: Added support for cyclic imports. Now a from import with a name can really cause an import to happen, not just a module attribute lookup.

  • Fix, hasattr was never raising exceptions.

  • Fix, bytearray constant values were considered to be non-iterable.

  • Python3.6: Fix, now it is possible to del __annotations__ in a class and behave compatible. Previously in this case we were falling back to the module variable for annotations used after that which is wrong.

  • Fix, some built-in type conversions are allowed to return derived types, but Nuitka assumed the exact type, this affected bytes, int, long, unicode.

  • Standalone: Fix, the _socket module was insisted on to be found, but can be compiled in.
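Two of the fixes in the list above concern behavior that is easy to check against CPython. In Python3, hasattr only swallows AttributeError and lets other exceptions through, and bytearray values are iterable. The class and helper names below are made up for this sketch.

```python
# Hypothetical class whose attribute access fails with a non-AttributeError.
class Flaky:
    @property
    def value(self):
        raise RuntimeError("boom")


def probe(obj, name):
    """Return the hasattr result, or a marker string if it raised."""
    try:
        return hasattr(obj, name)
    except RuntimeError as e:
        return "raised: %s" % e


raising_attribute = probe(Flaky(), "value")
missing_attribute = probe(Flaky(), "missing")

# Iterating a bytearray gives its integer byte values.
byte_values = list(bytearray(b"ab"))
```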

New Features

  • Added experimental support for Python 3.7, more work will be needed though for full support. Basic tests are working, but there are at least more coroutine changes to follow.
  • Added support for building extension modules against statically linked Python. This aims at supporting manylinux containers, which are supposed to be used for creating widely usable binary wheels for Linux. Programs won't work with statically linked Python though.
  • Added options to allow ignoring the Windows cache for DLL dependencies or force an update.
  • Allow passing options from distutils to Nuitka compilation via setup options.
  • Added option to disable the DLL dependency cache on Windows as it may become wrong after installing new software.
  • Added experimental ability to provide extra options for Nuitka to setuptools.
  • Python3: Remove frame preservation and restoration of exceptions. This is not needed, but leaked over from Python2 code.

Optimization

  • Apply value tracing to local dict variables too, enhancing the optimization for class bodies and function with exec statements by a lot.
  • Better optimization for "must not have value", wasn't considering merge traces of uninitialized values, for which this is also the case.
  • Use 10% less memory at compile time due to specialized base classes for statements with a single child only allowing __slots__ usage by not having multiple inheritance for those.
  • More immediately optimize branches with known truth values, so that merges are avoided and do not prevent trace based optimization before the pass after the next one. In some cases, optimization based on traces could fail to be done if there was no next pass caused by other things.
  • Much faster handling for functions with a lot of eval and exec calls.
  • Static optimization of type with known type shapes, the value is predicted at compile time.
  • Optimize containers for all compile time constants into constant nodes. This also enables further compile time checks using them, e.g. with isinstance or in checks.
  • Standalone: Using threads when determining DLL dependencies. This will speed up the un-cached case on Windows by a fair bit.
  • Also remove unused assignments for mutable constant values.
  • Python3: Also optimize calls to bytes built-in, this was so far not done.
  • Statically optimize iteration over constant values that are not iterable into errors.
  • Removed Fortran, Java, LaTeX, PDF, etc. stuff from the inline copies of Scons for faster startup and leaner code. Also updated to 3.0.1, which makes no important difference over 3.0.0 for Nuitka however.
  • Make sure to always release temporary objects before checking for error exits. When done the other way around, more C code than necessary will be created, releasing them in both normal case and error case after the check.
  • Also remove unused assignments in case the value is a mutable constant.
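The memory item in the list above rests on a CPython restriction: two base classes that both define non-empty __slots__ cannot be combined through multiple inheritance. A minimal sketch with hypothetical class names, which are not the actual Nuitka node classes:

```python
# Hypothetical classes illustrating the __slots__ layout restriction.
class StatementBase:
    __slots__ = ("parent",)


class SingleChildMixin:
    __slots__ = ("child",)


def try_combine():
    """Return the error CPython gives for combining both bases, if any."""
    try:

        class Combined(StatementBase, SingleChildMixin):
            __slots__ = ()

    except TypeError as e:
        return str(e)
    return None


# A specialized base class with single inheritance keeps __slots__ usable.
class SingleChildStatement(StatementBase):
    __slots__ = ("child",)


combine_error = try_combine()
statement = SingleChildStatement()
statement.child = "some node"
has_instance_dict = hasattr(statement, "__dict__")
```

Without a per-instance __dict__, each node takes less memory, which is where savings of this kind come from.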

Cleanups

  • Don't store "version" numbers of variable traces for code generation, instead directly use the references to the value traces, avoiding later lookups.
  • Added dedicated module for complex built-in nodes.
  • Moved C helpers for integer and complex types to dedicated files, solving the TODOs around them.
  • Removed some Python 3.2 only code.

Organizational

  • For better bug reports, the --version output now contains also the Python version information and the binary path being used.
  • Started using specialized exceptions for some types of errors, which will output the involved data for better debugging without having to reproduce anything. This does e.g. output XML dumps of problematic nodes.
  • When encountering a problem (compiler crash) in optimization, output the source code line that is causing the issue.
  • Added support for Fedora 28 RPM builds.
  • Remove more instances of mentions of 3.2 as supported or usable.
  • Renovated the graphing code and made it more useful.

Summary

This release marks important progress, as the locals dictionary tracing is a huge step ahead in terms of correctness and proper optimization. The actual resulting dictionary is not yet optimized, but that ought to follow soon now.

The initial support of 3.7 is important. Right now it apparently works pretty well as a 3.6 replacement already, but definitely a lot more work will be needed to fully catch up.

For standalone, this accumulated a lot of improvements related to the plugin side of Nuitka. Thanks to those involved in making this better. On Windows things ought to be much faster now, due to parallel usage of dependency walker.