Nuitka this week #11

Communication vs. Coding

I continue to force myself to report more publicly, and it feels good. This time, things are in a stabilizing period, and I feel I have a consistent message.

Bear in mind that this is supposed to be quick, not too polished, and straight from the top of my head, even if it is really a lot of content. But I feel that especially the optimization parts are worth reading.

Optimization Work

So, there has been a lot of optimization work for 0.6.1, and it contains improvements on every level. I will detail the levels in a separate section.

Levels of Optimization

The first level is, of course, node level optimization. Here 0.6.1 adds many things, from better handling of closure variables, which are no longer all marked as unknown every time control flow escapes, to + and comparison operations on known built-in type shapes, which can now be statically known not to raise. The opposite (definitely raises) is prepared, but not yet used.

This allows type shapes to be known for longer. Now the shape of a+b+c can be known, whereas previously only a+b was sort of known, and that information was little used.
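To illustrate why a+b+c benefits, here is a minimal sketch, assuming a hypothetical lookup table from (left shape, right shape) to result shape with Python3 semantics. The table and function names are made up for illustration and are not Nuitka's implementation. The key point is that the result shape of a+b is fed back in as the left shape for + c.

```python
# Hypothetical shape table for "+": (left shape, right shape) -> result shape.
ADD_SHAPES = {
    ("int", "int"): "int",
    ("int", "float"): "float",
    ("float", "int"): "float",
    ("float", "float"): "float",
    ("str", "str"): "str",
    ("list", "list"): "list",
}

UNKNOWN = "unknown"


def add_shape(left, right):
    """Result shape of left + right, or UNKNOWN if the table has no entry."""
    return ADD_SHAPES.get((left, right), UNKNOWN)


def chain_add_shape(*operand_shapes):
    """Shape of a + b + c + ..., evaluated left to right as Python does."""
    result = operand_shapes[0]
    for shape in operand_shapes[1:]:
        # The shape of the partial sum becomes the left shape of the next +.
        result = add_shape(result, shape)
    return result


print(chain_add_shape("int", "int", "float"))  # float
print(chain_add_shape("int", "str", "int"))    # unknown
```

Once one step yields UNKNOWN, every later step stays unknown, which is why losing information early in an expression hurts so much.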

The next level is picking the C target type. Here, seeing more operations and understanding more variables allows Nuitka to pick the C bool or C void types over the PyObject * C type more often. For 0.6.1 I have observed that especially more indicator variables make it to that stage, generating far more efficient C code for those variables in many instances, especially with loops, as these no longer lose type shape information as badly as they did.
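A rough sketch of the decision, under loud assumptions: the C type names are hypothetical, and "void" is assumed here to mean a value that is never read and so needs no storage. It only shows why one shape-less assignment (for example, one that lost its shape inside a loop) forces the generic PyObject * type.

```python
# Hypothetical mapping from a known value shape to a C type name.
C_TYPE_FOR_SHAPE = {"bool": "nuitka_bool"}


def pick_c_target_type(assigned_shapes, value_is_used=True):
    """Choose a C type for a variable from the shapes of all assigned values."""
    if not value_is_used:
        # Assumption: a value that is never read needs no storage at all.
        return "void"
    shapes = set(assigned_shapes)
    if len(shapes) == 1:
        (shape,) = shapes
        if shape in C_TYPE_FOR_SHAPE:
            return C_TYPE_FOR_SHAPE[shape]
    # Mixed or unknown shapes fall back to a generic Python object.
    return "PyObject *"


print(pick_c_target_type(["bool", "bool"]))     # nuitka_bool
print(pick_c_target_type(["bool", "unknown"]))  # PyObject *
```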

Then, another level is when a value is treated as an object, but known to be an int. Here, many more helpers are used for + and +=, and there is a whole new set of them for comparisons, which operate faster in these cases of full or partial type knowledge.

And even if, e.g., only one type is known, this still allows Nuitka to skip a lot of tests about it, and to avoid attempting shortcuts that cannot work. For 0.6.1, + and += are pretty well covered for these, but some variants are not yet tuned to take full advantage of the type knowledge.
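Picking a helper with partial type knowledge can be sketched as a lookup that degrades each unknown side to a generic "object" variant. The helper names below are hypothetical placeholders, not Nuitka's actual C helpers; the idea is that ADD_INT_OBJECT can skip checking the left operand for int, and never attempts int shortcuts on the right unless that side is checked.

```python
# Hypothetical registry of "+" helpers keyed by the known operand shapes.
# "object" means nothing is known about that side.
ADD_HELPERS = {
    ("int", "int"): "ADD_INT_INT",
    ("int", "object"): "ADD_INT_OBJECT",
    ("object", "int"): "ADD_OBJECT_INT",
    ("object", "object"): "ADD_OBJECT_OBJECT",
}


def select_add_helper(left_shape, right_shape):
    """Pick the most specific helper, treating sides without one as 'object'."""
    candidates = (
        (left_shape, right_shape),
        (left_shape, "object"),
        ("object", right_shape),
        ("object", "object"),  # always present, so a helper is always found
    )
    return next(ADD_HELPERS[key] for key in candidates if key in ADD_HELPERS)
```

For example, select_add_helper("int", "tuple") falls back to ADD_INT_OBJECT, which still benefits from knowing the left side.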

These will also be the building blocks once the C type layer picks types like "C int or PyObject * known to be int", with indicator flags telling which values are currently valid to use. These specialized calls will still make sense then.

The most attractive level, "C int", has not been reached for 0.6.1, but for my loop example and Python3, I can say that now would be a nice time to start it, as the type shape knowledge is all there. This was totally not the case for 0.6.0, but it seems that this step will have to be postponed to another release, maybe 0.6.2, maybe even later.

Week of Bugfixing

But something that bothers me is seeing actionable items pile up on the issue tracker where I just have not taken action. So, as already announced on Twitter, I have been spending, and continue to spend, time on bug fixing. I am acting on issues that are relatively old and easy to act on, or where I have no hope anymore of anybody else doing it.

I have listed some interesting examples below. Basically, these are small things, relatively unimportant in general, yet somewhat important for some use cases.

Exec on Filehandles

When doing exec on a file handle, Nuitka was reading the source at run time, then compiling it, but forgetting about the filename. This made things like inspect.getsource() fail on functions defined there, and produced ugly tracebacks not pointing to the file. This was one of the things I had understood, but had not yet done the actual work for.
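The underlying Python behavior is easy to show in plain CPython: passing the real filename to compile() is what lets tracebacks and inspect.getsource() locate the source again. This is a minimal sketch of that principle, not Nuitka's code; the helper name exec_file is made up for illustration.

```python
import inspect
import os
import tempfile


def exec_file(path, namespace):
    """Execute a source file, keeping its filename attached to the code."""
    with open(path) as handle:
        source = handle.read()
    # The second argument becomes co_filename, which tracebacks and
    # inspect.getsource() use to find the source text again.
    code = compile(source, path, "exec")
    exec(code, namespace)
    return namespace


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "plugin.py")
    with open(path, "w") as handle:
        handle.write("def greet():\n    return 'hello'\n")
    ns = exec_file(path, {})
    # Must run while the file still exists on disk.
    greet_source = inspect.getsource(ns["greet"])

print(greet_source)
```

Compiling the same text with a placeholder name such as "<string>" instead would leave inspect.getsource() with no file to read.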

pkgutil.iter_modules

Another one seemed simply not done, but turned out to be rather complex. Supporting it requires populating sys.path_importer_cache for imported modules, and then reporting the child modules. There was no object to carry that information, so now an instance of the meta path based importer is associated with every import.

It turns out that for Python3, my simplistic approach of building the type by calling type manually does not work here, as __init__ and iter_modules never become anything but static methods. It needs a real type.

Plus, I had to disable it for now, because mixed packages, like the one we do with multiprocessing, where only part is compiled (the required part) and the rest is still pure Python loaded from disk, stopped working. It seems iter_modules will have to cover that case too.

So no luck; this is postponed until the next week of bug fixes. A bit frustrating, but such is life.
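For reference, this is roughly the protocol pkgutil.iter_modules() relies on in CPython 3: it looks up an importer for each path entry (checking sys.path_importer_cache first) and calls that importer's iter_modules(prefix) method if it has one. The sketch below uses an ordinary class for the finder; the class name, the fake path, and the package contents are all hypothetical.

```python
import pkgutil
import sys


class CompiledPackageFinder:
    """Hypothetical importer instance that knows a package's compiled children."""

    def __init__(self, children):
        # children: iterable of (module name, is_package) pairs
        self.children = list(children)

    def iter_modules(self, prefix=""):
        # pkgutil expects (name, ispkg) pairs with the prefix already applied.
        for name, is_package in self.children:
            yield prefix + name, is_package


# A made-up path entry; pkgutil.get_importer() consults this cache first.
FAKE_PATH = "/hypothetical/dist/mypkg"
sys.path_importer_cache[FAKE_PATH] = CompiledPackageFinder(
    [("core", False), ("plugins", True)]
)
try:
    found = [
        (info.name, info.ispkg)
        for info in pkgutil.iter_modules([FAKE_PATH], "mypkg.")
    ]
finally:
    del sys.path_importer_cache[FAKE_PATH]

print(found)  # [('mypkg.core', False), ('mypkg.plugins', True)]
```

A mixed package would additionally need the finder to merge in modules found on disk, which this sketch does not attempt.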

When to release

There are still some issues that I want to get to. Specifically the OpenGL plugin, which has been researched for ages; nobody stepped up, but it's rather trivial. And the Tcl/Tk plugin for Windows: people have provided sufficient instructions for a plugin that I am going to write this week.

Once I feel the issue tracker is clean, I will release. From experience, it is then going to grow a lot again.

Google Summer of Code for Nuitka

Finally somebody has stepped up, which means a lot to me. Now to the actual work!

Twitter

I continue to be very active there.

Follow @kayhayen

And let's not forget, having followers makes me happy. So do retweets.

Featuring Twitter more prominently on the website is also going to happen.

Help Wanted

If you are interested, I am tagging issues with "help wanted", and there is a bunch of them, very likely including at least one you can help with.

Nuitka definitely needs more people to work on it.

Donations

If you want to help, but cannot spend the time, please consider donating to Nuitka by going here:

Donate to Nuitka

Nuitka this week #10

Communication vs. Coding

Recently it was a bit tougher to make that decision. First, there was a lot going on privately: I was ill, then my child was ill, and ill again, and then me too, and that made it much harder for me to communicate about incomplete things.

Even now, I am torn between fixing issues for 0.6.1 and writing this, but I know that the fixing will take at least another week, so I would miss the point if I waited any longer.

Bear in mind that this is supposed to be quick, not too polished, and straight from the top of my head, even if it is really a lot of content. But I feel that especially the optimization parts are worth reading.

Hotfixes

There has been another hotfix, 0.6.0.6, and there ought to be a 0.6.0.7; at least there is a bunch of material for it on factory, but I haven't actually done it yet. I was wavering between "there will be a release anyway" and the feeling that some of the material may cause regressions, so I might really skip it.

So for most fixes, I suspect develop is going to be the way to go until next week.

Google Summer of Code for Nuitka

Nobody has stepped up, which unfortunately means it will not happen. This is your last chance to step up. I know you will feel unqualified. But I just need a backup who will help a student around obstacles in case I go missing. Contact me and I will be very happy.

Pythran and Nuitka

As suggested by @wuoulf (Wolf Vollprecht), we had a meeting on the side of the PyCon DE 2018 conference in Karlsruhe, using the C++ regulars' table as a forum for that, which was a very nice experience.

First of all, Wolf is far more knowledgeable about Anaconda and could point out very important things to me, not least that Anaconda contains its own compiler. I have since used it successfully, first to add easier installation instructions for Windows, and second to statically link with LTO on Linux amd64. Both are important to me.

As for Pythran, which compiles a restricted subset of Python and is specialized in translating NumPy API usage to C++, we showed each other details of Nuitka and Pythran, and somehow a plan formed in my mind for how Nuitka could use the Pythran tricks long term, and, mid term, how it could include a plugin that integrates with Pythran compilation.

This was a huge success.

Performance Work

Adding specialized object operations

Continuing from last week, this has seen more completion. Both + and += are more or less covered for the selected subset of types. The CPython test suites initially did not find uses for them, but with more and more optimization phase improvements, they now challenge code generation with missing ones, which I then added step by step.

Control Flow Descriptions

Shapes were added for the + and < operations so far, but they didn't really influence anything but code generation, although of course they should also impact the optimization phase.

So the query for the type shape has been enhanced to return not only a type shape saying that int+float -> float, but now also an object that describes the impact on the program's control flow. This can then say, e.g., that the operation doesn't execute arbitrary code and does not modify its input values, things used in code generation to avoid error checks, and in optimization to avoid having to mark things as unknown.
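A minimal sketch of such a query result, assuming a hypothetical description object with three flags; Nuitka's real object and its fields may well differ. Note that int + float can still raise OverflowError for huge ints, so it is marked as possibly raising here, while float + float and int + int are treated as raising nothing but MemoryError.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlFlowDescription:
    may_raise: bool
    may_execute_arbitrary_code: bool
    may_modify_inputs: bool


# Cannot raise (MemoryError aside), runs no user code, leaves inputs alone.
HARMLESS = ControlFlowDescription(False, False, False)
# May raise (e.g. TypeError or OverflowError), but runs no user code.
MAY_RAISE_ONLY = ControlFlowDescription(True, False, False)
# Nothing is known: assume full control flow escape.
ESCAPE = ControlFlowDescription(True, True, True)

# (left shape, right shape) -> (result shape, control flow description)
ADD_RESULTS = {
    ("int", "int"): ("int", HARMLESS),
    ("float", "float"): ("float", HARMLESS),
    ("int", "float"): ("float", MAY_RAISE_ONLY),  # OverflowError for huge ints
    ("list", "str"): ("unknown", MAY_RAISE_ONLY),  # always a TypeError
}


def query_add(left, right):
    return ADD_RESULTS.get((left, right), ("unknown", ESCAPE))


def needs_exception_check(description):
    return description.may_raise


def must_forget_variable_knowledge(description):
    # Only code we cannot see could change other variables behind our back.
    return description.may_execute_arbitrary_code
```

Code generation would consult needs_exception_check(), while optimization would consult must_forget_variable_knowledge(), so a raising but otherwise transparent operation no longer wipes out variable knowledge.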

Preparations for comparison operations

So optimization now also has proper type shape functions for <, and warnings when they don't know what to do for concrete types. This allows checks to actually be removed, but so far this hasn't been exposed for either + or <. Doing so eliminates the exception check for the operation itself, where previously it was done if anything in the expression could raise.

Specializing the rich comparison helper code is the next step. I haven't quite gotten to it yet, but it has been started.

Comparison Conditions

While preparing the < optimization for the loop, I noticed that not was already optimized for in to become not in, and for is to become is not, etc., but for comparisons where we know the result is of bool shape, we can now also switch not < to >= and not == to !=, of course.

And since our re-formulation of while a < b contains a statement like if not a < b: break, this is again one step closer to optimizing my example loop.
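The rewrite can be sketched with Python's own ast module (ast.unparse needs Python 3.9+). This is an illustration, not Nuitka's node tree: a set of names known to be int stands in for type shape knowledge. The sketch deliberately restricts itself to int operands, because for floats not (x < y) and x >= y differ when NaN is involved.

```python
import ast

# For int operands, "not (x OP y)" is equivalent to "x INVERSE[OP] y".
INVERSE = {
    ast.Lt: ast.GtE,
    ast.LtE: ast.Gt,
    ast.Gt: ast.LtE,
    ast.GtE: ast.Lt,
    ast.Eq: ast.NotEq,
    ast.NotEq: ast.Eq,
}


class InvertNegatedComparison(ast.NodeTransformer):
    def __init__(self, int_names):
        self.int_names = frozenset(int_names)

    def _known_int(self, node):
        if isinstance(node, ast.Constant):
            return type(node.value) is int
        return isinstance(node, ast.Name) and node.id in self.int_names

    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        compare = node.operand
        if (
            isinstance(node.op, ast.Not)
            and isinstance(compare, ast.Compare)
            and len(compare.ops) == 1
            and type(compare.ops[0]) in INVERSE
            and self._known_int(compare.left)
            and self._known_int(compare.comparators[0])
        ):
            inverted = ast.Compare(
                left=compare.left,
                ops=[INVERSE[type(compare.ops[0])]()],
                comparators=compare.comparators,
            )
            return ast.copy_location(inverted, node)
        return node


def simplify(source, int_names):
    tree = ast.parse(source, mode="eval")
    tree = InvertNegatedComparison(int_names).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(simplify("not a < b", {"a", "b"}))  # a >= b
```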

Local variable escaping

Much to my shock, I noticed that the code responsible for handling escaping control flow (i.e. when unknown code is executed) was not only doing what it was supposed to do, i.e. marking closure variables as unknown, but was more or less doing it for all local variables with Python3.

Fixing that obviously allows for a lot more optimization, and makes my tests find missing optimizations, and even bugs in existing ones that were previously hidden. It is good to have noticed this regression (it was better once), now that I am looking at concrete examples.

One noticeable sign was that more of my tests failed with warnings about missing code helpers. Another was that in my while loop with an int increment, Python3 now seems to be in good shape. For Python2, the "int or long" shape will need dedicated helpers. That is because int + int becomes either int or long there, whereas Python3 only has long, but renamed it to int.
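The intended behavior can be sketched as a function over a hypothetical variable trace (name to shape). Only variables that unknown code can reach, here assumed to be exactly the closure variables, are reset on escape; plain locals keep their shapes, which is what the bug was failing to do.

```python
UNKNOWN = "unknown"


def on_control_flow_escape(trace, closure_variables):
    """Return the variable trace after unknown code may have run.

    Only closure (cell) variables can be changed by code we cannot see,
    so only those lose their known shape; plain locals keep theirs.
    """
    return {
        name: (UNKNOWN if name in closure_variables else shape)
        for name, shape in trace.items()
    }


print(on_control_flow_escape({"i": "int", "shared": "list"}, {"shared"}))
```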

Benchmarks Missing

Speedcenter got repaired, but I need to add the loop examples I am using as test cases before the next release, so I can show what Nuitka 0.6.1 has achieved, or at least improved somewhat already.

But currently these examples only serve as input for general improvements that then take a lot of time, and don't have immediate impact on their own.

Still, it would be good to see where Nuitka stands after each one.

Static Linking

So static linking works now, provided the libpython.a is not a crappy pyenv one, but one that can actually work. I got this to work on Linux, and using the Conda CC, even LTO works with it. Interestingly, linking is then noticeably slow, and I bet ccache and the like won't help with that.

I am interested to see what this means for performance. But it will allow addressing issues where the embedded CPython run time is plain slower than the one that lives in the python binary. For acceleration, this is great news.

Conda CC

Using the Conda CC by default as a fallback in --mingw mode on Windows was easy to add. When no other gcc is found (MSVC is not tried in this mode), the right directory is added to PATH automatically, so with Anaconda, things should now be smoother. It also has its own libpython.a. I am not sure yet if it's a static link library, which would be fantastic, but unlike with standard MinGW64, we at least do not have to roll our own.

I will eventually try it with --lto, though, and see what it does. I think static linking on Windows is not supported by CPython, but I am not entirely sure of that.

Annotations Future Feature

I found a 3.7 feature that is not covered by the test suite: the annotations __future__ flag wasn't working as expected. With it, strings are to be used for __annotations__ where they show up (many are simply ignored), and that requires an unparse function, going from the parsed ast (presumably so it's still syntax checked) back to the string, but that was very hard to get at, and only with evil hackery.

For 3.8 a bug fix is promised that will give us the string immediately, but for now my hack must suffice.
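The behavior being implemented can be observed in CPython itself: with from __future__ import annotations, annotation expressions are never evaluated and are stored as source strings. The second part uses ast.unparse(), which only arrived in Python 3.9, well after this post, so it shows the kind of unparse step involved, not what Nuitka used at the time.

```python
import ast

SOURCE = '''
from __future__ import annotations

def scale(values: List[int], factor: float = 2.0) -> List[float]:
    return [v * factor for v in values]
'''

namespace = {}
exec(compile(SOURCE, "<annotations-demo>", "exec"), namespace)
# List is never looked up, so no NameError; each annotation is kept as text.
annotations = namespace["scale"].__annotations__
print(annotations)

# Turning a parsed annotation back into source text (Python 3.9+).
expr = ast.parse("List[int]", mode="eval").body
print(ast.unparse(expr))  # List[int]
```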

MSI files

Following the 3.7.1 release, there are MSI files again, as the 3.7.0 regression that prevented building them has been fixed in that release. The MSI files will also work with 3.7.0; only building them was broken.

Overall

So 0.6.1 is still in full swing in terms of optimization. I think I need to make a release soon, simply because there is too much unreleased but useful stuff already.

I might have to postpone my goal of C int performance for one example loop until the next release. No harm in that. There already are plenty of performance improvements across the board.


Nuitka this week #9

Communication vs. Coding

My new communication strategy is a full success; engagement with Nuitka is at an all-time high.

But the recent weeks highlighted more than ever why I have to force myself to do it. I do not like to talk about unfinished stuff. And right now, there is really a lot of it; almost everything is unfinished. Also, I was ill and otherwise busy, so this is now a week late.

But I am keeping it up, and will give an update, despite the feeling that it would be better to just finish a few of those things and then talk about them. That would take forever and leave you in the dark, and that is not how it is supposed to be.

Bear in mind that this is supposed to be quick, not too polished, and straight from the top of my head, even if it is really a lot of content. But I feel that especially the optimization parts are worth reading.

Hotfixes

So the 0.6.0 release was a huge success, but it definitely wasn't perfect, and hotfixes were necessary. The latest one, 0.6.0.5, was done just yesterday and actually contains a fix for an important mis-optimization, so you ought to update to it from any prior 0.6.0 release.

There are also a few remaining compatibility issues fixed for 3.7 and generally using the latest hotfix is always a good idea.

As one has to expect from a .0 release, this one also seems to have had more exposure than usual.

Google Summer of Code for Nuitka

I need more people to work on Nuitka. One way of doing this could be to participate in Google Summer of Code under the Python umbrella. To make that possible, I need you to volunteer as a mentor. So please, please, do.

I know you will feel unqualified. But I just need a backup who will help a student around obstacles in case I go missing. Contact me and I will be very happy.

Website Overhaul

I updated the website to a recent Nikola and dropped the tag cloud that I was using. It should look cleaner and better. I also integrated privacy-aware sharing links, where two clicks are necessary to share a page or article like this one on Twitter, Facebook, etc.

The download page also saw some structural updates and polishing. It should be easier to get an overview now.

Performance Work

Adding specialized object operations

The performance feedback and the work on 0.6.1 are in full swing, and there are many major points in progress. I want to briefly cover each of them now, but many will only have their full effect once everything is in place, and each one of them is critical to that.

So, with type tracing, objects have known types, and even short of using a C type, knowing e.g. that both operands are ints allows + to gain a lot by avoiding unrelated checks and code paths, even if it is still using PyObject * at the end of the day.

And even if we only know that a value is not an int, say one value is a tuple and the other is unknown, that allows removing checks for int shortcuts, as they can no longer apply. These are tiny optimizations then, but still worthwhile.

To further this, the in-place operations were looked at first, for a couple of more or less randomly selected types (list, tuple, int, long, str, unicode, bytes, and float), and they have gotten their own special object based helpers for when one or both operands are known to be of that type.

Finding missing specialized object code generation

A report has been added that tells when such an operation could have been used, but was not available. This uncovered where typical code goes unoptimized, a nice way to see what is actually happening.

So adding a list and a str would now give a warning, although of course, the optimization phase ought to catch that this statically raises and never let it get there, so this report also points out missing optimization in an earlier phase.
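Such a report can be sketched as a counter that records every operation where both operand shapes were known but no specialized helper existed. The helper set and function name are hypothetical; the real report is produced during Nuitka's code generation.

```python
from collections import Counter

# Hypothetical (operator, left shape, right shape) combinations that have a
# specialized helper.
AVAILABLE_HELPERS = {
    ("+", "int", "int"),
    ("+", "str", "str"),
    ("+", "list", "list"),
    ("+=", "list", "list"),
}

missing_report = Counter()


def generate_binary_op(operator, left_shape, right_shape):
    key = (operator, left_shape, right_shape)
    if key in AVAILABLE_HELPERS:
        return "specialized"
    if "object" not in (left_shape, right_shape):
        # Both shapes were known, yet no helper exists: report it. For a case
        # like list + str, this also hints that the optimization phase missed
        # a statically known raise.
        missing_report[key] += 1
    return "generic"
```

After a compilation, missing_report.most_common() would list the most frequent gaps first.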

Optimizing plain object operations too

With the in-place operations covered, this was then extended to mere + operations too, the ones that are not in-place. Sometimes, especially for immutable types, there was already code for that (e.g. int doesn't really do in-place); in other cases, such as list + list, code for a quicker concatenation was added.

And again, a report for where helpers are missing was added, along with basic coverage for most of the types. However, in some instances, the optimization doesn't use the full knowledge yet. But where it does, it will shave off quite a few cycles.

Lack of type knowledge

To apply these things effectively, optimization and value tracing need to know types in the first place. I have found two obstacles to that. One is branch merges. If one branch or both assign a value of the same type as the original, well, the type is unchanged. Previously the merged value became "unknown", which is treated as object for code generation and allows really nothing. But that is better on develop now, and was actually a trivial missing thing.
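The fixed merge rule can be sketched in a few lines, assuming shapes are plain strings. Each argument is the shape at the end of one incoming branch; a branch that does not assign the variable contributes the original shape. This is an illustration of the rule, not Nuitka's implementation.

```python
UNKNOWN = "unknown"


def merge_shapes(*incoming_shapes):
    """Shape of a variable where branches join."""
    distinct = set(incoming_shapes)
    if len(distinct) == 1:
        # All branches agree, so the type survives the merge.
        return distinct.pop()
    # A richer lattice could return a union such as "int_or_str" here.
    return UNKNOWN


# x = 1; if cond: x = 2  ->  both paths end with an int
print(merge_shapes("int", "int"))  # int
```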

The other area is loops. Loops set values to unknown when entering the loop body, and again when leaving it, essentially making type tracing ineffective where it is needed most to achieve actual performance. This also prevented variables assigned inside a loop from benefiting at all from the knowledge that a variable has only one type throughout the function.

It took me a while, but I figured out how to build type tracing for loops that works. It is currently still unfinished in my private repo, but passes all tests; I would just like to make it use dedicated interfaces, and clean it up.

I will most likely have that for 0.6.1 too and that should expand the cases where types are known in code generation by a fair amount.

The effect will be that C code generation actually sees types more often. Currently, e.g., a boolean variable that is assigned in a loop cannot use the C target type in code generation. Once the loop code is merged, it will take advantage there too. And only then, I think, does adding "C int" as a C type make sense at all.
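One standard way to get types through loops, sketched here under simplified assumptions, is to iterate to a fixed point: the loop head merges the shape from before the loop with the shape flowing back from the end of the body, and the process repeats until nothing changes. The shape tables are hypothetical and cover only the loop i = i + 1. This is a generic dataflow technique, not necessarily how Nuitka's loop tracing is built.

```python
UNKNOWN = "unknown"

# Python3: int + int stays int.
PY3_ADD = {("int", "int"): "int"}
# Python2: int + int may overflow into long, hence "int_or_long".
PY2_ADD = {
    ("int", "int"): "int_or_long",
    ("int_or_long", "int"): "int_or_long",
}


def merge(a, b):
    """Join two shapes at a control flow merge point."""
    if a == b:
        return a
    if {a, b} == {"int", "int_or_long"}:
        return "int_or_long"
    return UNKNOWN


def loop_variable_shape(initial_shape, increment_shape, add_table, max_rounds=10):
    """Shape of i at the loop head for: i = <initial>; while ...: i = i + <increment>"""
    head = initial_shape
    for _ in range(max_rounds):
        after_body = add_table.get((head, increment_shape), UNKNOWN)
        # The loop head sees both the pre-loop value and the back edge.
        new_head = merge(initial_shape, after_body)
        if new_head == head:
            return head
        head = new_head
    return UNKNOWN


print(loop_variable_shape("int", "int", PY3_ADD))  # int
print(loop_variable_shape("int", "int", PY2_ADD))  # int_or_long
```

Unlike resetting everything to unknown on loop entry, this keeps "int" for Python3 and the still useful "int_or_long" for Python2.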

Performance regressions vs. CPython

Then another area is performance regressions. One thing I did early on in the 0.6.1 cycle was using the "module var C target type" to get in-place operations working for module variables too. Doing string concatenations on module variables could be slower by an order of magnitude, as could other operations.

I still need to do it for closure variables too. Then Nuitka will handle at least as many of these cases perfectly as CPython does. It will also be better at them, because e.g. it doesn't have to delete from the module dictionary first, since it never takes a reference, and the same applies to the cell. It should be faster for that too.

But when in-place string operations on these are not optimized, performance looks very ugly, so 0.6.0 was still pretty bad for some users. This will hopefully be addressed in 0.6.1.

In-place unicode still being bad

Another field was in-place string addition for the already optimized case: it was still slower than CPython, and I finally found out what causes this. It is the use of libpython, where PyUnicode_Append is far slower than in the python binary that you normally use; I have seen that at least for CPython 3.5 and higher. Analysis showed that e.g. MiniConda had the issue to a much smaller extent, and was much faster anyway, but it probably just has better libpython compilation flags.

So what to do? Ultimately, this was solved by including a clone of that function, dubbed UNICODE_APPEND, that behaves the same, and can even shave off a couple of cycles by indicating the Python error status without extra checks, and by specializing it for the pure unicode += unicode case that we see most often; the same was done with UNICODE_CONCAT for mere +.

Right now, the benchmarks to show it do not exist yet. Again, that is something that typically makes me want to delay things. But as you can imagine, tracking down these hard issues and writing that much code to replace the unicode resizing is hard enough by itself.

But I hope to convince myself that this will show that, for compiled code, things are now only going to be faster.

Benchmarks Missing

In fact, speedcenter as a whole is currently broken, mostly due to Nikola changes that I am trying to work around, but it will take more time apparently and isn't finished as I write this.

Type shapes in optimization

Another optimization front is the type shapes of the + operation itself. Right now, the result shape is derived from the shape of the left argument, which is asked to take the right shape into account. These also have reports now, for cases where they are missing. So things like saying that int + float results in float are being encoded there right now.

This is a necessary step to e.g. know that int + int -> int_or_long, for effective loop variable optimization.

Without these, and again, that is a lot of code to write, there is no way to hope for wide spread type knowledge in code generation.
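The left-first derivation can be sketched as shape objects that are asked for the result of + and fall back to asking the right shape, much as Python falls back from __add__ to __radd__. All class and constant names are hypothetical, and int + int uses Python2 semantics to produce the int_or_long shape mentioned above.

```python
class Shape:
    def __init__(self, name):
        self.name = name

    def add_shape(self, right):
        # The left shape knows nothing special: let the right shape decide.
        return right.radd_shape(self)

    def radd_shape(self, left):
        return UNKNOWN

    def __repr__(self):
        return self.name


class IntShape(Shape):
    def add_shape(self, right):
        if right is INT:
            return INT_OR_LONG  # Python2: int + int may overflow into long
        return super().add_shape(right)


class FloatShape(Shape):
    def add_shape(self, right):
        if right is INT or right is FLOAT:
            return FLOAT
        return super().add_shape(right)

    def radd_shape(self, left):
        return FLOAT if left is INT else UNKNOWN


UNKNOWN = Shape("unknown")
INT_OR_LONG = Shape("int_or_long")
INT = IntShape("int")
FLOAT = FloatShape("float")

print(INT.add_shape(FLOAT))  # float, answered by FloatShape.radd_shape
```

Every combination that ends in UNKNOWN is a candidate for the "missing shape" report.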

Control flow escape

What is missing there is making it known that +, unlike now, should not in all cases lead to "control flow escape", with the consequence of discarding all knowledge and expecting a possible exception. Instead, the int type should also make known that + int on it not only gives an int_or_long result shape, but also that doing so will never raise an exception (bar MemoryError), and therefore allow more optimization to happen and less, and therefore faster, code to be generated.

Until this is done, what actually happens is that while the + result shape is known, Nuitka will still assume control flow escape.

And speaking of that, I think this puts too many variables into too unknown a state. You have to distrust all values, but not the types in this case, so that could be better, but right now it is not. Something else to look into.

Overall

So 0.6.1 is in full swing in terms of optimization. All these fronts need completion, and then I can expect to take advantage of them in a loop, and ultimately to generate code with C performance for one example loop, especially if we add a C int target type, which isn't started yet, because I think it would barely be used yet.

But we are getting there, and I wouldn't even say we are making small steps; this is all just work to be completed, nothing fundamental about it. But it may well take more than one release.

Mind you, there is not only +; there is also -, *, %, and many more operators, and all of them will require work. Granted, loop variables tend to use + more often, but any unoptimized operation will immediately lose a lot of type knowledge.

Improved Annotations

There are two kinds of variable annotations: those in classes and modules, which are actually stored in an __annotations__ variable, and all others, which are mostly just ignored.

Nuitka got the criterion wrong, doing one thing for functions and the other for everything else. As a result, annotations in generators, coroutines, and async generators ended up producing wrong, crashing, and slower code, due to updating the module __annotations__, so this fix is important too if you use those.
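The distinction can be checked directly in CPython: class-level variable annotations are stored, while annotations on local variables inside a function body, including generator bodies, are neither evaluated nor stored (PEP 526). This demo only shows the reference behavior a compiler has to match.

```python
SOURCE = '''
class Point:
    x: int = 0
    y: int = 0

def gen():
    # A local variable annotation is never evaluated or stored, so this
    # undefined name does not raise.
    total: UndefinedName = 0
    yield total
'''

ns = {}
exec(compile(SOURCE, "<annotations-scopes>", "exec"), ns)

print(ns["Point"].__annotations__)  # stored for the class
print(list(ns["gen"]()))            # [0], the body annotation had no effect
print(ns["gen"].__annotations__)    # {}, only signature annotations go here
```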

Release or not

To release or not? There is at least one bug about star imports affecting numpy that is solved in develop and wasn't back ported; I was thinking it only applied to develop, but in fact it applies to stable as well. That makes me want to release even before all these optimizations are done and polished, and I might well decide to go with that.

Maybe I only add the closure in-place stuff and polish the loop SSA stuff, and then call it a release. It will already solve a lot of performance issues that exist right now, while setting the stage for more.

Standalone Improvements

Standalone work is also improving. Using pyi files got more apt, and a few things were added that people will find useful.

But I also have a backlog of issues there. I guess I will schedule one sprint focusing on those, as I have been somewhat neglecting them recently.

Caching Examined

For the static code, I noticed that it's compiled anew for each target name, because the build directory is part of the object file's debug information. For gcc 8 there is an option that allows pointing at the original static C file location instead, and then ccache is more effective, because the object files will be the same.

That's actually pretty bad, as most of my machines are on gcc 6, and it makes me think that a libnuitka.a is really more of a requirement than ever. I might take some time to get this sorted out.

Python3 deprecation warnings

So Nuitka supports the no_warnings Python flag, and for a long time I have been annoyed at how it was not working for Python3 in some cases. The code was manually setting filters, but these would get overridden by CPython test suites testing warnings. And a code comment claimed that there is no CPython C-API to control it, which is just plain wrong.

So I changed that, and it became possible to remove lots of ignore_stderr annotations in CPython test suites, and more importantly, I can stop adding them when running a suite with an older or newer CPython version.


Nuitka this week #8

Public / Private CI / Workflow

Note

I wrote this as part of a discussion recently, and I think it makes sense to share it here. This is a lot of text, though; feel free to skip forward.

Indeed, I have a private repo that I push to and that only my private CI picks up. Based on Buildbot, it runs many more compilations, basically around the clock on all of my computers, to find regressions from new optimization or codegen changes, and, well, UI changes too.

Public CI offerings like Travis are not aimed at allowing this many compilations. It will be a while before public cloud infrastructure is donated to Nuitka, although I see it happening some time in the future. This leaves developers with the burden of running tests on their own hardware, and never enough of them. Casual contributors will never be able to do it themselves.

My scope is running the CPython test suites on Windows and Linux. These are the adapted 26, 27, 32, 33, 34, 35, 36, and 37 suites, and to get even more errors covered, they are also run with mismatching Python versions, so a lot of exceptions are raised. Often, running the 36 tests with 37 and vice versa extends the coverage, because of the exceptions being raised.

On Windows, I compile with and without debug mode, for x86 and x64, and it's kind of getting too much. For Linux, I have 2 laptops in use, and an ARM CuBox bought from your donations; there it's working better, especially due to ccache being used everywhere, although recent investigations show room for improvement there as well.

For memory usage, I still compile Mercurial and observe the memory it uses, in addition to comparing its test suite's results against the expected outputs. It's a sad day when the Mercurial tests find changes in behavior, and luckily that has been rare. Running the Mercurial test suite gives some confidence that the compiled code is not silently corrupting the data it works with.

Caching the CPython outputs of tests to compare against is something I am going to make operational these days, trying to make things ever faster. There is no point in re-running tests with Python just to get at its output, which typically will not change at all.
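A minimal sketch of such a cache, assuming the cache key is a hash of the interpreter path plus the test source, so that editing a test or switching interpreters invalidates its entry. The function names and the on-disk layout are made up for illustration; a real setup might key on the interpreter's version string instead of its path, since an in-place upgrade would otherwise leave stale entries.

```python
import hashlib
import os
import subprocess
import sys
import tempfile


def _run_with_python(python_exe, test_path):
    result = subprocess.run([python_exe, test_path], capture_output=True, text=True)
    return result.stdout


def reference_output(test_path, python_exe, cache_dir, runner=_run_with_python):
    """Return CPython's output for a test, running it only on a cache miss."""
    with open(test_path, "rb") as handle:
        source = handle.read()
    key = hashlib.sha256(python_exe.encode() + b"\0" + source).hexdigest()
    cache_file = os.path.join(cache_dir, key + ".out")
    if os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as handle:
            return handle.read()
    output = runner(python_exe, test_path)
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_file, "w", encoding="utf-8") as handle:
        handle.write(output)
    return output


# Usage: the second call is served from the cache without running Python.
calls = []


def counting_runner(python_exe, test_path):
    calls.append(test_path)
    return _run_with_python(python_exe, test_path)


with tempfile.TemporaryDirectory() as tmp:
    test_file = os.path.join(tmp, "test_demo.py")
    with open(test_file, "w") as handle:
        handle.write("print('hello from CPython')\n")
    cache = os.path.join(tmp, "cache")
    first = reference_output(test_file, sys.executable, cache, counting_runner)
    second = reference_output(test_file, sys.executable, cache, counting_runner)
```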

For the time being, ccache.exe and clcache.exe seem to have done wonders for Windows too, but I want to investigate some more to avoid unnecessary cache misses.

Workflow

As for my workflow with Nuitka, I often let some commits settle in my private repo only, until they become trusted. Other times I will make bigger changes and put them out to factory immediately, because it would be hard to split up the changes later, so putting them out makes it easier.

I am more conservative with factory right after telling people to try something there. But I also break it on purpose, just to try something out. I really consider it a private branch for interacting with me or public CI. I do not recommend using it; it's like a permanent pull request of mine that is never going to be finished.

Then, on occasion, I sort all commits on factory and split them into things that become hotfixes, things that become the current pre-release, and other things that remain in that proving ground. That is why I typically make a hotfix and a pre-release at the same time. Git flow suggests doing that, and it's easy, so why not. As a bonus, develop is then practically stable nearly all the time, with hardly any regressions.

However, I do not normally take things as hotfixes that are on develop already; I hate the duplication of commits. Hotfixes must be small, risk free, and easy to put out; when there is any risk, the change definitely goes to develop. Nuitka stable typically covers nearly all ground already, so there is no need to panic and add missing stuff while breaking other things.

Hunting bugs with bisect

For me, git bisect is very important. My private commit history is basically a total mess and worthless, but on factory I make very nicely organized commits that I frequently amend, even for a random PyLint cleanup. This allows me, when e.g. a test suddenly says "segfault" on Windows, to easily find the change that triggers it, look at the C code difference, spot the bug introduced, then amend the commit and be done with it.

It's amazing how much time this can save. My goal is to always have a workable state which is supposed to pass all tests. Obviously, I cannot prove it for every commit, but when I know it not to be the case, I tend to rebase. At times, I have been tempted into, and followed through with, amending develop and even stable retroactively.

I do that to preserve the ability to bisect, but fortunately that kind of bug is rare, and I try not to do it.

Experimental Changes

As with recent changes, I sometimes guard changes with the isExperimental() marker, activating breaking changes only gradually. The C bool type code generation was there for months in a barely useful form until it became more polished, always guarded by a switch, until finally, for 0.6, I flipped the switch, having made the necessary fixes retroactively in commits before that switch, while it was all still in factory.

Then I remove the experimental code. I feel it's very important, even ideal, to always be able to compare outputs to a fully working solution. I am willing to postpone some cleanups to a later date as the price, but when something in my mind tells me again "This cannot possibly have ever worked"..., the answer to compare with is only a command line flag away. Plus, since both variants include any other changes that happened in the meantime, those changes don't add noise to, for example, diffs of the generated C code.

Looking at that diff, I can then tell where the unwanted effect is, fix all the things, and that way find bugs much faster.
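Here is a toy Python sketch of that workflow. It assumes a hypothetical `is_experimental()` switch and uses made-up C snippets: `CHECK_IF_TRUE` and `NUITKA_BOOL_TRUE` are for illustration only and were not checked against Nuitka's sources. The same code generator runs once with the switch off and once with it on, and the standard `difflib` module then shows only the lines the experimental change affects:

```python
import difflib

# Hypothetical experimental switches, normally filled from a command line flag.
EXPERIMENTAL_FLAGS = set()


def is_experimental(name):
    """Return True if the named experimental feature is enabled."""
    return name in EXPERIMENTAL_FLAGS


def generate_condition_code(variable):
    """Toy code generator whose output depends on an experimental switch."""
    if is_experimental("c-bool"):
        # New path: compare a C level boolean directly.
        return ["if (%s == NUITKA_BOOL_TRUE) {" % variable]
    # Stable path: go through a truth check on a Python object.
    return ["if (CHECK_IF_TRUE(%s) == 1) {" % variable]


def diff_generated_code(variable):
    """Generate code with the switch off and on, and return a unified diff."""
    EXPERIMENTAL_FLAGS.discard("c-bool")
    stable = generate_condition_code(variable)
    EXPERIMENTAL_FLAGS.add("c-bool")
    try:
        experimental = generate_condition_code(variable)
    finally:
        EXPERIMENTAL_FLAGS.discard("c-bool")
    return list(
        difflib.unified_diff(
            stable, experimental, "stable.c", "experimental.c", lineterm=""
        )
    )


for line in diff_generated_code("tmp_cond"):
    print(line)
```

Because both runs share everything except the switch, the diff contains only the lines the switch changes.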

Even better, if I decide on a cleanup as part of making a change more viable, I get to carry it out on stable ground, covered by the full test suite. I can complete that cleanup; e.g. using variable identifier objects instead of mere strings was needed to make "heap generators" more workable. But I was able to activate that one before "heap generators" were ever fully workable, complete it, and actually reap some of its benefits already.

Hardware

Obviously it takes a lot of hardware and CPU power to compile this much Python code on a regular basis. And I really wish I could add one of the new AMD Threadripper 2 CPUs to the mix. Anybody donating one to me? Yes, I know, I am only dreaming. But it would really help the cause.

Milestone Release

So 0.6 is out, and there is already a hotfix that mostly addresses use cases that didn't work for people. More people seem to have tried out 0.6.0, and as a result 0.6.0.1 is going to cover a few corner cases. So far I have not encountered a single regression in 0.6.0; instead, it fixed regressions from 0.5.33, one of which was not easy to fix.

So that went really smoothly.

UI rework

The UI still needs more work. Specifically, it really annoys me that packages do not automatically include everything below them, and have to be specified by file path instead of by name.

But I had already delayed 0.6 for some UI work, so the quirks will remain for a while. I will work on these things eventually.

Benchmarks

So I updated the website to state that PyStone is now 312% faster, replacing a number that was very old. Since then, I ran it with an updated version for Python3, and the gain is much smaller there. That is pretty sad.

I will be looking into that for the 0.6.1 release, or else update the wording to provide two numbers, because a single number seems misleading about Python3 performance with Nuitka.

Something with unicode strings and in-place operations is driving me crazy. Nuitka is apparently slower for that, and I can't pinpoint where exactly that is happening. Maybe some operations put unicode objects internally into a different state, which then makes in-place extension via realloc fail more often, but I cannot tell yet.

Inplace Operations

So more work has gone into those, adding more specializations and, especially, also applying them to module variables. CPython can do that, and actually gives itself a hard time about it; Nuitka should be doing this much more cleverly with its more static knowledge.

But I cannot tell you how much head scratching I wasted debugging that. I approached it in a totally stupid way; looking back from the final solution, it was always easy. Just not for me, apparently.

New use cases

I talked about those above. So a top level logging module of your own was working fine in accelerated mode, but in standalone mode the one from the standard library was used instead. That kind of shadowing happened because Nuitka was going from module objects to their names and back to objects, which is bad in case of duplicates. That is fixed on develop, and it is one of those risky cases that cannot be a hotfix, because it touched too much.
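Here is a minimal sketch of why the name round trip goes wrong. It uses a hypothetical `ModuleInfo` record instead of Nuitka's real module objects. Once two modules share the name `logging`, mapping from object to name and back can only ever return one of them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleInfo:
    """Hypothetical description of a module found during compilation."""

    name: str
    filename: str


def round_trip_by_name(modules, module):
    """Problematic pattern: map an object to its name and back again.

    With duplicate names, whichever module was registered last wins.
    """
    by_name = {m.name: m for m in modules}
    return by_name[module.name]


user_logging = ModuleInfo("logging", "/project/logging.py")
stdlib_logging = ModuleInfo("logging", "/usr/lib/python3/logging/__init__.py")
modules = [user_logging, stdlib_logging]

# The user's module silently turns into the standard library one.
print(round_trip_by_name(modules, user_logging).filename)
```

Passing the module objects themselves along, instead of their names, keeps both modules distinct.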

Then, pure Python3 packages need not have an __init__.py. So far that worked best for sub-packages, but with the 0.6.0.1 hotfix, it will also work when the main package you compile lacks one.
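For reference, here is the plain CPython behavior Nuitka has to match, as a small self-contained sketch. A directory without `__init__.py` on `sys.path` can still be imported as a namespace package. The package name `nspkg_demo` is made up for the example:

```python
import importlib
import os
import sys
import tempfile


def import_from_namespace_package():
    """Create a package directory without __init__.py and import from it."""
    root = tempfile.mkdtemp()
    package_dir = os.path.join(root, "nspkg_demo")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "mod.py"), "w") as f:
        f.write("VALUE = 42\n")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("nspkg_demo.mod")
    finally:
        sys.path.remove(root)
    return sys.modules["nspkg_demo"], module


package, module = import_from_namespace_package()
print(module.VALUE)                        # the sub-module imports fine
print(getattr(package, "__file__", None))  # no __init__.py behind it
```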

Tcl/Tk Standalone

So instructions have been provided on how to properly make that work in standalone mode on Windows. I have yet to live up to my promise and make Nuitka automatically include the necessary files. I hope to do it for 0.6.1 though.

Caching Examined

So I am looking at ccache on Linux right now, and found e.g. that it was reporting that gcc --version was called a lot at startup of Scons, and then g++ --version once. The latter is particularly stupid, because we are not going to use g++ normally, except if gcc is really old and does not support C11. So in case a good gcc was found, let's disable that version query and not do it.

And for the gcc version output, monkey patching Scons with a variant of the version query that caches its result removes those unnecessary forks.
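The caching idea itself can be sketched with `functools.lru_cache`, so that the process is launched only once per compiler name. This sketch deliberately does not show how it gets hooked into Scons, because that depends on Scons internals. The function name is hypothetical, and the Python interpreter stands in for gcc so that the example runs anywhere:

```python
import functools
import subprocess
import sys

launched = []  # records real process launches, for demonstration only


@functools.lru_cache(maxsize=None)
def get_version_output(compiler):
    """Run '<compiler> --version' at most once per compiler and cache it."""
    launched.append(compiler)
    result = subprocess.run(
        [compiler, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout


# The Python interpreter stands in for gcc here.
first = get_version_output(sys.executable)
second = get_version_output(sys.executable)
print(len(launched))  # 1: the second call was served from the cache
```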

So ccache is called less frequently, and these --version calls actually appear to take measurable time. It's not dramatic, but ccache was apparently taking locks, and that's worth avoiding by itself.

That said, the goal is to make ccache and clcache both report how effectively the cache was used at the end of a test suite run. That way I hope to notice whether caching is used to its full effect.

Twitter

I continue to be very active there. I put out a poll about the comment system, and as a result I am disabling Disqus comments and will focus on Twitter for web site comments too.

Follow @kayhayen

And let's not forget, having followers makes me happy. So do re-tweets.

Adding Twitter more prominently to the web site is something that is also going to happen.

Help Wanted

If you are interested, I am tagging issues with "help wanted", and there is a bunch, very likely at least one you can help with.

Nuitka definitely needs more people to work on it.

Plans

Working on the 0.6.1 release, I attacked more in-place add operations as a first goal, and am now turning to binary operations. I am trying to shape what using different helper functions for different object types looks like, and to gain performance without C types. But ultimately the same issue will arise there: what to do with mixed input types.
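Here is a sketch of what picking a helper by type shape might look like. The helper names, such as `BINARY_OPERATION_ADD_INT_INT`, are hypothetical. The sketch prefers an exact combination, then falls back to knowledge of one side only, then to the generic object helper. Mixed inputs like int plus float are exactly where the combinations multiply. The `missing` list loosely mirrors the idea of reporting missing helpers:

```python
KNOWN_SHAPES = ("INT", "STR", "FLOAT")

# Hypothetical set of helpers that have been written so far.
IMPLEMENTED = {
    "BINARY_OPERATION_ADD_OBJECT_OBJECT",
    "BINARY_OPERATION_ADD_INT_INT",
    "BINARY_OPERATION_ADD_INT_OBJECT",
    "BINARY_OPERATION_ADD_OBJECT_INT",
    "BINARY_OPERATION_ADD_STR_STR",
}


def select_add_helper(left_shape, right_shape, missing=None):
    """Pick the most specialized helper available for the known shapes.

    Unknown shapes count as OBJECT. If the exact combination has no helper,
    use one side's knowledge only, and finally the generic helper.
    """
    left = left_shape if left_shape in KNOWN_SHAPES else "OBJECT"
    right = right_shape if right_shape in KNOWN_SHAPES else "OBJECT"
    wanted = "BINARY_OPERATION_ADD_%s_%s" % (left, right)

    fallbacks = ((left, right), (left, "OBJECT"), ("OBJECT", right),
                 ("OBJECT", "OBJECT"))
    for left_use, right_use in fallbacks:
        name = "BINARY_OPERATION_ADD_%s_%s" % (left_use, right_use)
        if name in IMPLEMENTED:
            if name != wanted and missing is not None:
                missing.append(wanted)  # a specialization we could add
            return name
    raise LookupError("generic helper is missing")


print(select_add_helper("INT", "INT"))    # BINARY_OPERATION_ADD_INT_INT
print(select_add_helper("INT", "FLOAT"))  # BINARY_OPERATION_ADD_INT_OBJECT
```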

My desire is for in-place operations to fully catch up with CPython, as these can easily lose a lot of performance. Closure variables and their cells are another target to pick on. I feel they ought to be next now that module ones are working, because their solution ought to be very similar. Then showing that Nuitka is faster in all cases, regardless of the target storage (local, closure, or module), would be a goal for the 0.6.1 release.

This feels not too far away, but we will see. I am considering next weekend for release.

Donations

If you want to help, but cannot spend the time, please consider donating to Nuitka, and go here:

Donate to Nuitka

Nuitka Release 0.6.0

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release adds massive improvements for optimization and a couple of bug fixes.

It also marks reaching the milestone of doing actual type inference, even if only very limited.

And with the new version numbers, lots of UI changes go along. The options to control recursion into modules have all been renamed, some now have different defaults, and finally the output filenames have changed.

Bug Fixes

  • Python3.5: Fix, the awaiting flag was not removed for exceptions thrown into a coroutine, so next time it appeared to be awaiting instead of finished.
  • Python3: Classes in generators that were using built-in functions crashed the compilation with C errors.
  • Some regressions for XML outputs from previous changes were fixed.
  • Fix, hasattr was not raising an exception if used with non-string attributes.
  • For really large compilations, the MSVC linker could choke on the input file due to line length limits, which is now fixed for the inline copy of Scons.
  • Standalone: Follow changed hidden dependency of PyQt5 to PyQt5.sip for newer versions.
  • Standalone: Include the certificate file used by the requests module in some cases as a data file.

New Optimization

  • Enabled C target type nuitka_bool for variables that are stored with boolean shape only, and generate C code for those.
  • Using C target type nuitka_bool many more expressions are now handled better in conditions.
  • Enhanced is and is not to be C source type aware, so they can be much faster for them.
  • Use C target type for bool built-in giving more efficient code for some source values.
  • Annotate the not result to have boolean type shape, allowing for more compile time optimization with it.
  • Restored previously lost optimization of loop break handling StopIteration which makes loops much faster again.
  • Restore lost optimization of subscripts with constant integer values making them faster again.
  • Optimize in-place operations for cases where left, right, or both sides have known type shapes for some values. Initially only a few variants were added, but there is more to come.
  • When adjacent parts of an f-string become known string constants, join them at compile time.
  • When there is only one remaining part in an f-string, use that directly as the result.
  • Optimize empty f-strings directly into empty strings constant during the tree building phase.
  • Added specialized attribute check for use in re-formulations that doesn't expose exceptions.
  • Remove locals sync operation in scopes without local variables, e.g. classes or modules, making exec and the like slightly leaner there.
  • Remove try nodes that did only re-raise exceptions.
  • The del of variables is now driven fully by C types and generates more compatible code.
  • Removed useless double exception exits annotated for expressions of conditions and added code that allows conditions to adapt themselves to the target shape bool during optimization.
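The three f-string rules above amount to a small folding pass. Here is a sketch over a simplified list of parts. It assumes that string constants are plain `str` values and that every other part is an already formatted value, represented by a hypothetical `Expr` class; Nuitka's real nodes look different:

```python
from typing import List, Union


class Expr:
    """Stand-in for a non-constant, already formatted f-string part."""

    def __init__(self, source):
        self.source = source

    def __repr__(self):
        return "Expr(%r)" % self.source


Part = Union[str, Expr]


def fold_fstring(parts: List[Part]) -> Union[Part, List[Part]]:
    """Apply the three f-string simplifications to a list of parts."""
    folded: List[Part] = []
    for part in parts:
        if isinstance(part, str) and folded and isinstance(folded[-1], str):
            folded[-1] += part  # join adjacent string constants
        else:
            folded.append(part)
    # Empty string constants contribute nothing to the result.
    folded = [p for p in folded if not (isinstance(p, str) and p == "")]
    if not folded:
        return ""  # empty f-string: just the empty string constant
    if len(folded) == 1:
        return folded[0]  # one part left: use it directly as the result
    return folded


print(fold_fstring(["Hello, ", "world", Expr("name"), "!"]))
# ['Hello, world', Expr('name'), '!']
```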

New Features

  • Added support for using .egg files in PYTHONPATH, one of the rarer uses, where Nuitka wasn't yet compatible.
  • Output binaries in standalone mode with platform suffix, on non-Windows that means no suffix. In accelerated mode on non-Windows, use .bin as a suffix to avoid collision with files that have no suffix.
  • Windows: It's now possible to use clang-cl.exe for CC with Nuitka as a third compiler on Windows, but it requires an existing MSVC install to be used for resource compilation and linking.
  • Windows: Added support for using ccache.exe and clcache.exe, so that object files can now be cached for re-compilation.
  • For debug mode, report missing in-place helpers. These kinds of reports are to become more universal and are aimed at recognizing missed optimization chances in Nuitka. This feature is still in its infancy. Subsequent releases will add more like these.
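The binary naming rule can be written down as a tiny function. The name `output_binary_name` is hypothetical. It covers only the suffix and not where the file ends up, and it assumes that accelerated mode on Windows also uses the platform suffix:

```python
def output_binary_name(main_name, standalone, windows):
    """Sketch of the binary naming rule (hypothetical helper)."""
    if windows:
        return main_name + ".exe"  # platform suffix on Windows
    if standalone:
        return main_name           # platform suffix elsewhere: none
    return main_name + ".bin"      # accelerated mode: avoid suffix-less clash


print(output_binary_name("hello", standalone=False, windows=False))  # hello.bin
```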

Organizational

  • Disabled comments on the web site, we are going to use Twitter instead, once the site is migrated to an updated Nikola.
  • The static C code is now formatted with clang-format to make it easier for contributors to understand.
  • Moved the construct runner to a top level binary and use it from there, with future changes coming that should make it generally useful outside of Nuitka.
  • Enhanced the issue template to tell people how to get the develop version of Nuitka to try it out.
  • Added documentation on how to use the object caching on Windows to the User Manual.
  • Removed the included GUI, originally intended for debugging, but XML outputs are more powerful anyway, and it had been in disrepair for a long time.
  • Removed long deprecated options, e.g. --exe, which has long been the default and is no longer accepted.
  • Renamed options to include plugin files to --include-plugin-directory and --include-plugin-files for more clarity.
  • Renamed options for recursion control to e.g. --follow-imports to better express what they actually do.
  • Removed --python-version support for switching the version during compilation. This has only worked for very specific circumstances and has been deprecated for a while.
  • Removed --code-gen-no-statement-lines support for not having line numbers updated at run time. This has long been hidden and probably would never gain all that much, while causing a lot of incompatibility.

Cleanups

  • Moved command line arguments to a dedicated module, as adding checks was becoming too difficult.
  • Moved rich comparison helpers to a dedicated C file.
  • Dedicated binary and unary node bases for clearer distinction and more efficient memory usage of unary nodes. Unary operations also no longer have to deal with in-place operations.
  • Major cleanup of variable accesses, split up into multiple phases and all including module variables being performed through C types, with no special cases anymore.
  • Partial cleanups of C type classes with code duplications, there is much more to resolve though.
  • Windows: The way exec was performed is discouraged in the subprocess documentation, so use a variant that cannot block instead.
  • Code providing information about built-in names and values was using not very portable constructs, and is now written in a way that PyPy would also like.

Tests

  • Avoid using 2to3 for basic operators test, removing test of some Python2 only stuff, that is covered elsewhere.
  • Added ability to cache output of CPython when comparing to it. This is to allow CI tests to not execute the same code over and over, just to get the same value to compare with. This is not enabled yet.
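Here is a sketch of such an output cache under stated assumptions: the key is a hash of the interpreter path and the script source, outputs are stored as files, and the exit code is not cached. None of this is taken from Nuitka's test runner, and a real cache would also need to key on the exact interpreter version:

```python
import hashlib
import os
import subprocess
import sys
import tempfile


def run_with_cache(python, script_path, cache_dir):
    """Run a script with the given interpreter, caching its combined output.

    Returns (output_bytes, was_cached).
    """
    with open(script_path, "rb") as f:
        source = f.read()
    key = hashlib.sha256(python.encode("utf-8") + b"\0" + source).hexdigest()
    cache_file = os.path.join(cache_dir, key + ".out")

    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return f.read(), True  # cache hit, no process started

    output = subprocess.run(
        [python, script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=False,
    ).stdout
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_file, "wb") as f:
        f.write(output)
    return output, False


workdir = tempfile.mkdtemp()
script = os.path.join(workdir, "hello.py")
with open(script, "w") as f:
    f.write("print('hello')\n")

cache = os.path.join(workdir, "cache")
print(run_with_cache(sys.executable, script, cache))  # runs CPython
print(run_with_cache(sys.executable, script, cache))  # served from cache
```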

Summary

This release marks a point from which on performance improvements are likely in every coming release. The C target types are a major milestone. More C target types are in the works, e.g. void is coming for expressions whose values are computed but not used; that is scheduled for the next release.

Although optimization will also need to be adapted to take full advantage of it, progress should be quick from here. There is a lot of ground to cover, with more C types to come, all of them needing specialized helpers. But as soon as e.g. int and str are covered, many more programs are going to benefit from this.

Nuitka this week #7

Nuitka Design Philosophy

Note

I wrote this as part of a discussion recently, and I think it makes sense to share my take on Nuitka and design. This is a lot of text though, so feel free to skip forward.

The issue with Nuitka and design mainly for me is that the requirements for many parts were and are largely unknown to me, until I actually start to do it.

My goto generators approach worked out as originally designed, and that felt really cool for once, but the whole "C type" thing was a total unknown to me, until it all magically took form.

Rather, I know it will evolve further as I go from "bool" (complete and coming for 0.6.0) via "void" (should be complete already, but enabling will likely happen only for 0.6.1) to "int", and I am not sure how long that will take.

I really think Nuitka, unlike other software that I have designed, is more of a prototype project that gradually turns more and more into the real thing.

I have literally spent years injecting proper design in steps into the optimization phase, what I call SSA, value tracing, and it is very much there now. I am probably going to spend similar amounts of time applying type inference results to the code generation.

So I turned code generation from something working with code strings into something working with variable declaration objects that know their type, for the goto generators, aiming at C types generally. All the while carrying the full weight of passing every compatibility test there is.

Then, e.g., module variables were suddenly cleaned up to no longer have their special branch, but a pseudo C type that makes them like everything else. Great. But when I first introduced the new thing, I postponed that, because I could apply its benefits to some things sooner and gain experience from it.

While doing partial solutions, the design sometimes horribly degrades, but only until some features can carry the full weight, and/or have been explored to have their final form.

Making a whole Nuitka design upfront and then executing it, would instead give a very high probability of failing in the real world. I am therefore applying the more agile approach, where I make things work first. And then continue to work while I clean it up.

For every feature I add, I actively go out and change the thing that made it hard, or made it fail. Always. I think Nuitka is largely developed by cleanups and refactoring. Goto generators were a fine example of that: injecting variable declaration objects into code generation solved many of the issues, and made it easy to indicate the storage (heap, object, or stack) right there.

That is not to say that Nuitka didn't have the typical compiler design of parsing inputs, optimizing a tree internally, and producing outputs. But that grand top level design only tells you the obvious things really, and is stolen anyway from knowing similar projects like gcc.

There were of course always obvious designs for Nuitka, but those were never what anybody would consider to make a Python compiler hard. But for actual compatibility with CPython, so many details were going to require examination, with no solutions known ahead of time.

I guess I am an extreme programmer, or agile, or whatever they call it these days. At least for Nuitka. In my professional life, I have designed software for ATC on the drawing board, then on paper, and then in code; the design just worked, and became operational right after completion, which is rare, I can tell you.

But maybe that is what keeps me excited about Nuitka: how I need to go beyond my abilities and stable ground to achieve it.

But the complexity of Nuitka is dramatically higher than anything I ever did. It does complicated, i.e. detail rich, work, and it also does hard jobs where many things have to play together. And the wish to have something working before it is completed, if it ever is, makes things very different from the projects I typically did.

So the first version of Nuitka already had a use. When I first showed it publicly, it was capable of handling most complex programs, and the desire was to evolve it gradually.

I think I have described this elsewhere, but for large parts of Nuitka's solutions, whether well or badly designed, there are reliable ways of demonstrating that they work correctly, far better than I have ever encountered elsewhere. I believe that is the main reason I managed to get this off the ground. Having a test "oracle", i.e. comparing to existing implementations, is what makes Nuitka special.

It's like a calculator, which can be tested by comparing it to one of the many already perfect ones out there. That again makes Nuitka relatively easy despite the many details to get right: there is often an easy way to tell correct from wrong.

So for me, Nuitka on the design level is something that goes through many iterations, discovery, and prototyping, and is actually really exciting in that.

Compilers typically are boring. But for Nuitka that is totally not the case, because Python is not made for it. Well, that's technically untrue; let's say not for optimizing compilers, not for type inference, etc.
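The oracle idea in its simplest form is to run the same program twice and compare the outputs and exit codes. In Nuitka's case the oracle would be CPython and the candidate would be the compiled program. In this sketch, both commands just use the running interpreter, so that it runs anywhere:

```python
import difflib
import subprocess
import sys


def run(command):
    """Run a command, returning its exit code and output lines."""
    process = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return process.returncode, process.stdout.splitlines()


def compare_with_oracle(oracle_command, candidate_command):
    """Return differences between two runs; an empty list means they agree."""
    oracle_code, oracle_lines = run(oracle_command)
    candidate_code, candidate_lines = run(candidate_command)
    differences = list(
        difflib.unified_diff(
            oracle_lines, candidate_lines, "oracle", "candidate", lineterm=""
        )
    )
    if oracle_code != candidate_code:
        differences.append(
            "exit code: oracle %d, candidate %d" % (oracle_code, candidate_code)
        )
    return differences


print(compare_with_oracle([sys.executable, "-c", "print(6 * 7)"],
                          [sys.executable, "-c", "print(42)"]))  # []
```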

UI rework

Following up on discussion on the mailing list, the user interface of Nuitka will become more clear with --include-* options and --[no]follow-import* options that better express what is going to happen.

Also the default for following with extension modules is now precisely what you say, as going beyond what you intend to deliver makes no sense in the normal case.

Goto Generators

Now released as 0.5.33, and there have been few regressions so far. The fix for the one found is only in the 0.6.0 pre-release, so use that instead if you encounter a C compilation error.

Benchmarks

The performance regressions fixed for 0.6.0 impact pystone by a lot, loops were slower, so were subscripts with constant integer indexes. It is a pity these were introduced in previous releases during refactorings without noticing.

We should strive to have benchmarks with trends. Right now the Nuitka speedcenter cannot do that, and focus should definitely go there. As I said, after the 0.6.0 release, making them more useful will be a priority.

Twitter

I continue to be active there. I just put out a poll about the comment system, and with Disqus comments disabled, I will focus on Twitter for web site comments too.

Follow @kayhayen

And let's not forget, having followers makes me happy. So do re-tweets.

Help Wanted

If you are interested, I am tagging issues with "help wanted", and there is a bunch, very likely at least one you can help with.

Nuitka definitely needs more people to work on it.

Egg files in PYTHONPATH

This is a relatively old issue that has now been addressed. Basically, source code for compilation should be loaded from these. Nuitka now unpacks them to a cache folder so it can read source code from them, so this apparently rare use case works now, yet again improving compatibility.

It will be in the 0.6.0 release.
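Here is a sketch of the unpacking step. It relies on zipped eggs being ordinary zip files and uses a hypothetical cache layout keyed by the egg's content hash. Nuitka's actual cache location and naming are not described here:

```python
import hashlib
import os
import tempfile
import zipfile


def unpack_egg(egg_path, cache_root):
    """Extract a zipped .egg into a cache folder, so its source can be read.

    The folder name includes a content hash, so a changed egg gets a fresh
    copy while an unchanged one is only extracted once.
    """
    with open(egg_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:16]
    name = "%s-%s" % (os.path.basename(egg_path), digest)
    target = os.path.join(cache_root, name)
    if not os.path.isdir(target):
        with zipfile.ZipFile(egg_path) as archive:
            archive.extractall(target)
    return target


def demo():
    """Build a tiny egg, unpack it, and read the package source back."""
    root = tempfile.mkdtemp()
    egg = os.path.join(root, "demo-1.0-py3.7.egg")
    with zipfile.ZipFile(egg, "w") as archive:
        archive.writestr("demo/__init__.py", "ANSWER = 42\n")
    cache_root = os.path.join(root, "cache")
    unpacked = unpack_egg(egg, cache_root)
    with open(os.path.join(unpacked, "demo", "__init__.py")) as f:
        return egg, cache_root, unpacked, f.read()


print(demo()[3])
```

A more careful version would extract to a temporary folder and rename it into place, so that an interrupted run cannot leave a half-extracted egg behind.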

Certifi

It seems the requests module sometimes uses that. Nuitka now includes that data file, starting with the 0.6.0 release.

Compatibility with pkg_resources

It seems that getting "distributions" and taking versions from there is really a thing, and Nuitka fails pkg_resources requirement checks, in standalone mode at least, and that is of course sad.

I am currently researching how to fix that, and am not sure yet how to do it. But some forms of Python installs are apparently very affected by it. I am looking into its data gathering; maybe compiled modules can be registered there too. It seems to be based on file system scans of its own making, but a monkey patch is always possible to make it better.

Plans

Still working on the 0.6.0 release, cleaning up open ends only. Release tests seem to be pretty good looking. Now is a good time for the UI changes and such, even if they delay things, and there is a bunch of small, low hanging fruit to pick while I wait for test results.

But since it fixes so many performance things, it really ought to be out any day now.

As for the in-place operations work, I added it to 0.6.0 too, just because it feels very nice and improves some operations by a lot. Initially I had already cut it for 0.6.1, but that is no longer the case.

Donations

If you want to help, but cannot spend the time, please consider donating to Nuitka, and go here:

Donate to Nuitka

Nuitka Release 0.5.33

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler. Please see the page "What is Nuitka?" for an overview.

This release contains a bunch of fixes, most of which were previously released as part of hotfixes, and important new optimization for generators.

Bug Fixes

  • Fix, nested functions with local classes using outside function closure variables were not registering their usage, which could lead to errors at C compile time. Fixed in 0.5.32.1 already.
  • Fix, usage of built-in calls in a class level could crash the compiler if a class variable was updated with its result. Fixed in 0.5.32.1 already.
  • Python 3.7: The handling of non-type bases classes was not fully compatible and wrong usages were giving AttributeError instead of TypeError. Fixed in 0.5.32.2 already.
  • Python 3.5: Fix, await expressions didn't annotate their exception exit. Fixed in 0.5.32.2 already.
  • Python3: The enum module usages with __new__ in derived classes were not working, due to our automatic staticmethod decoration. Turns out, that was only needed for Python2 and can be removed, making enum work all the way. Fixed in 0.5.32.3 already.
  • Fix, recursion into __main__ was done and could lead to compiler crashes if the main module was named like that. This is now prevented. Fixed in 0.5.32.3 already.
  • Python3: The name for list contraction's frames was wrong all along and not just changed for 3.7, so drop that version check on it. Fixed in 0.5.32.3 already.
  • Fix, the hashing of code objects was creating a key that could produce more overlaps for the hash than necessary. Using a C1 on line 29 and C on line 129 was considered the same. And that is what actually happened. Fixed in 0.5.32.3 already.
  • MacOS: Various fixes for newer Xcode versions to work as well. Fixed in 0.5.32.4 already.
  • Python3: Fix, the default __annotations__ was the empty dict and could be modified, leading to severe corruption potentially. Fixed in 0.5.32.4 already.
  • Python3: When an exception is thrown into a generator that is currently doing a yield from, it is not to be normalized.
  • Python3: Some exception handling cases of yield from were leaking references to objects. Fixed in 0.5.32.5 already.
  • Python3: Nested namespace packages were not working unless the directory continued to exist on disk. Fixed in 0.5.32.5 already.
  • Standalone: Do not include icuuc.dll which is a system DLL. Fixed in 0.5.32.5 already.
  • Standalone: Added hidden dependency of newer version of sip. Fixed in 0.5.32.5 already.
  • Standalone: Do not copy file permissions of DLLs and extension modules as that makes deleting and modifying them only harder. Fixed in 0.5.32.6 already.
  • Windows: The multiprocessing plugin was not always properly patching the run time for all module loads, made it more robust. Fixed in 0.5.32.6 already.
  • Standalone: Do not preserve permissions of copied DLLs, which can cause issues with read-only files on Windows when later trying to overwrite or remove files.
  • Python3.4: Make sure to disconnect finished generators from their frames to avoid potential data corruption. Fixed in 0.5.32.6 already.
  • Python3.5: Make sure to disconnect finished coroutines from their frames to avoid potential data corruption. Fixed in 0.5.32.6 already.
  • Python3.6: Make sure to disconnect finished asyncgen from their frames to avoid potential data corruption. Fixed in 0.5.32.6 already.
  • Python3.5: Explicit frame closes of frames owned by coroutines could corrupt data. Fixed in 0.5.32.7 already.
  • Python3.6: Explicit frame closes of frames owned by asyncgen could corrupt data. Fixed in 0.5.32.7 already.
  • Python 3.4: Fix threaded imports by properly handling _initializing in compiled modules' spec attributes. Before, it could happen that another thread attempted to use an unfinished module. Fixed in 0.5.32.8 already.
  • Fix, the options --include-module and --include-package were present but not visible in the help output. Fixed in 0.5.32.8 already.
  • Windows: The multiprocessing plugin failed to properly pass compiled functions. Fixed in 0.5.32.8 already.
  • Python3: Fix, optimization for in-place operations on mapping values is not allowed and had to be disabled. Fixed in 0.5.32.8 already.
  • Python 3.5: Fixed exception handling with coroutines and asyncgen throw to not corrupt exception objects.
  • Python 3.7: Added more checks to class creations that were missing for full compatibility.
  • Python3: Smarter hashing of unicode values avoids increased memory usage from cached converted forms in debug mode.
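The code object key collision can be reproduced in a simplified form. The real key fields are not spelled out in these notes, so this sketch assumes the key was built by plainly concatenating a name and a line number. Then "C1" on line 29 and "C" on line 129 both produce "C129", while keeping the parts in a tuple avoids that:

```python
def naive_key(name, line):
    """Buggy key: plain concatenation lets different inputs collide."""
    return "%s%d" % (name, line)


def safe_key(name, line):
    """Fixed key: a tuple keeps the fields separate."""
    return (name, line)


print(naive_key("C1", 29) == naive_key("C", 129))  # True: both are "C129"
print(safe_key("C1", 29) == safe_key("C", 129))    # False
```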

Organizational

  • The issue tracker on Github is now the one that should be used with Nuitka, winning due to easier issue templating and integration with pull requests.
  • Document the threading model and exception model to use for MinGW64.
  • Removed the enum plug-in which is no longer useful after the improvements to the staticmethod handling for Python3.
  • Added Python 3.7 testing for Travis.
  • Make it clear in the documentation that pyenv is not supported.
  • The version output includes more information now, OS and architecture, so issue reports should contain that now.
  • On PyPI we didn't yet indicate Python 3.7 as supported, which it of course is.

New Features

  • Added support for MiniConda Python.

Optimization

  • Using goto based generators that return from execution and resume based on heap storage. This makes tests using generators twice as fast and they no longer use a full C stack of 2MB, but only 1K instead.
  • Conditional a if cond else b, a and b, and a or b expressions whose result value is unused are now transformed into conditional statements, allowing further optimizations to be applied to the right and left side expressions as well.
  • Replace unused function creations with side effects from their default values with just those, removing more unused code.
  • Put all statement related code and declarations for it in a dedicated C block, making it slightly easier for the C compiler to re-use the stack space.
  • Avoid linking against libpython in module mode on everything but Windows where it is really needed. No longer check for static Python, not needed anymore.
  • More compact function, generator, and asyncgen creation code for the normal cases, avoid qualname if identical to name for all of them.
  • Python2 class dictionaries are now indeed directly optimized, giving more compact code.
  • Module exception exits and thus its frames have become optional allowing to avoid some code for some special modules.
  • Uncompiled generator integration was backported to 3.4 as well, improving compatibility and speed there as well.
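The principle of goto generators can be sketched in Python, even though Nuitka does it in C. Instead of suspending a stack, the generator returns to its caller and records where to resume, with all local state kept in the object. In C, resuming would be a jump to a label; here a stored resume point stands in for it. The class below is a hand-written illustration, not generated code:

```python
class CountdownGenerator:
    """Hand-written equivalent of the countdown() generator below.

    All state lives in the object (the "heap"), not on a C stack. Each
    __next__ call dispatches on the stored resume point, much like a
    switch statement jumping to a goto label.
    """

    def __init__(self, n):
        self.n = n
        self.resume_point = 0  # 0: not started, 1: after yield, 2: finished

    def __iter__(self):
        return self

    def __next__(self):
        if self.resume_point == 1:
            self.n -= 1             # code following "yield n"
        elif self.resume_point == 2:
            raise StopIteration     # finished generators stay finished
        if self.n > 0:              # loop condition
            self.resume_point = 1   # remember where to continue
            return self.n           # "yield n": return to the caller
        self.resume_point = 2
        raise StopIteration


def countdown(n):
    """Ordinary generator, for comparison."""
    while n > 0:
        yield n
        n -= 1


print(list(CountdownGenerator(3)), list(countdown(3)))  # [3, 2, 1] [3, 2, 1]
```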

Cleanups

  • Frame objects and their cache declarations are now handled by way of allocated variable descriptions, avoiding special handling for them.
  • The interface to "forget" a temporary variable has been replaced with a new method that skips a number for it. This is done to make expressions keep using the same indexes for all their child expressions, but this is more explicit.
  • Instead of passing around C variables names for temporary values, we now have full descriptions, with C type, code name, storage location, and the init value to use. This makes the information more immediately available where it is needed.
  • Variable declarations are now created when needed and stored in dedicated variable storage objects, which can then generate the code as necessary.
  • Module code generation has been enhanced to be closer to the pattern used by functions, generators, etc.
  • There is now only one spot that creates variable declaration, instead of previous code duplications.
  • Code objects are now attached to functions, generators, coroutines, and asyncgen bodies, and not anymore to the creation of these objects. This allows for simpler code generation.
  • Removed fiber implementations, no more needed.
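Here is a sketch of such a variable description. It uses hypothetical field and method names, covers just stack and heap storage, and the C strings it produces are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableDeclaration:
    """Hypothetical description of a C variable used in generated code."""

    c_type: str                      # e.g. "PyObject *" or "nuitka_bool"
    code_name: str                   # C identifier
    storage: str                     # "stack" or "heap" in this sketch
    init_value: Optional[str] = None

    def __post_init__(self):
        if self.storage not in ("stack", "heap"):
            raise ValueError("unsupported storage: %r" % (self.storage,))

    def declaration_code(self):
        """C declaration, placed in a function body or a heap struct."""
        if self.storage == "stack" and self.init_value is not None:
            return "%s %s = %s;" % (self.c_type, self.code_name, self.init_value)
        return "%s %s;" % (self.c_type, self.code_name)

    def access_code(self, heap_name="heap"):
        """How generated code refers to the variable."""
        if self.storage == "heap":
            return "%s->%s" % (heap_name, self.code_name)
        return self.code_name

    def init_code(self, heap_name="heap"):
        """Separate initialization, only needed for heap storage."""
        if self.storage != "heap" or self.init_value is None:
            return ""
        return "%s = %s;" % (self.access_code(heap_name), self.init_value)


flag = VariableDeclaration("nuitka_bool", "tmp_cond", "heap", "NUITKA_BOOL_UNASSIGNED")
print(flag.declaration_code())  # nuitka_bool tmp_cond;
print(flag.init_code())         # heap->tmp_cond = NUITKA_BOOL_UNASSIGNED;
```

Keeping the storage decision in one object means code that merely uses a variable never needs to know whether it lives on the stack or on the heap.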

Tests

  • Finally the asyncgen tests can be enabled in the CPython 3.6 test suite as the corrupting crasher has been identified.
  • Cover ever more cases of spurious permission problems on Windows.
  • Added the ability to specify specific modules a comparison test should recurse to, making some CPython tests follow into modules where actual test code lives.

Summary

This release is huge in many ways.

First, finishing "goto generators" clears an old scalability problem of Nuitka that needed to be addressed. No more do generators/coroutines/asyncgen consume too much memory, but instead they become as lightweight as they ought to be.

Second, the use of variable declarations carying type information all through the code generation, is an important pre-condition for "C types" work to resume and become possible, what will be 0.6.0 and the next release.

Third, the improved generator performance removes a lot of cases where Nuitka wasn't as fast as its current state, without "C types" yet, should allow. It is now consistently faster than CPython for everything related to generators.

Fourth, the fibers were a burden for the debugging and linking of Nuitka on various platforms, as those platforms provided them only through deprecated interfaces, or not at all. Now that they are gone, Nuitka definitely ought to work on any platform where Python works.

From here on, the C types work can take over and produce the results we are waiting for in the next major release cycle, which is about to start.

Also, the number of fixes for this release has been incredibly high. Lots of old bugs, esp. for coroutines and asyncgen, have been fixed, so this is not only faster, but way more correct. Mainly due to the easier debugging and the simpler interface to the context code, bugs were far easier to avoid and/or find.

Nuitka this week #6

Holiday

During my 2-week holiday, I indeed focused on a really big thing, and got more done than I had hoped for. For C types, nuitka_bool, which is a tri-state boolean with true, false, and unassigned, can now be used for some variables, and some operations execute without going through objects anymore.
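A tri-state boolean can be modelled in a few lines of Python. This is only a sketch under assumptions: the member names are made up, and it assumes the third state is what lets generated code detect a read of a variable that was never assigned, which Python reports as UnboundLocalError.

```python
from enum import Enum


class NuitkaBool(Enum):
    """Python model of a tri-state C boolean (member names are assumptions)."""

    FALSE = 0
    TRUE = 1
    UNASSIGNED = 2


def from_truth(value: bool) -> NuitkaBool:
    return NuitkaBool.TRUE if value else NuitkaBool.FALSE


def read_bool(state: NuitkaBool, variable_name: str) -> bool:
    # The third state lets a read before assignment be detected.
    if state is NuitkaBool.UNASSIGNED:
        raise UnboundLocalError(
            f"local variable '{variable_name}' referenced before assignment"
        )
    return state is NuitkaBool.TRUE


def bool_not(state: NuitkaBool, variable_name: str) -> NuitkaBool:
    # "not" works on the tri-state value directly; in C this is plain integer
    # logic instead of going through Py_True/Py_False objects.
    return from_truth(not read_bool(state, variable_name))
```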

bool

Condition codes are no longer special. They all need a boolean value from the expression used as a condition, and there were special paths for some popular condition expressions, but of course not for all. That is now universal: conditional statements and expressions simply ask for a temporary variable of type nuitka_bool to be provided, and code generation handles the rest.

Where it is used, the code gets a lot lighter, and of course faster, although I haven't measured it yet. Going through Py_True/Py_False and comparing against them wasn't that optimal, and it's nice that this is now so much cleaner as a side effect of the C bool work.
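The universal condition handling can be pictured as a code generator where the if statement only provides a nuitka_bool temporary and each condition expression fills it in its own way. This Python sketch emits C text; the emitter functions, the temporary name, the error_exit label, and the NUITKA_BOOL_* constant names are assumptions, while PyObject_IsTrue is the regular CPython C API call that returns -1 on error.

```python
from typing import Callable, List

# An emitter appends C lines that store a truth value into the named temp.
ConditionEmitter = Callable[[str, List[str]], None]


def object_truth(c_object: str) -> ConditionEmitter:
    """Condition from an arbitrary PyObject * (the general case)."""

    def emit(target: str, lines: List[str]) -> None:
        lines.append(f"int {target}_res = PyObject_IsTrue({c_object});")
        lines.append(f"if ({target}_res == -1) goto error_exit;")
        lines.append(f"{target} = {target}_res ? NUITKA_BOOL_TRUE : NUITKA_BOOL_FALSE;")

    return emit


def known_bool(c_bool_variable: str) -> ConditionEmitter:
    """Condition from a variable that already is a nuitka_bool (the fast case)."""

    def emit(target: str, lines: List[str]) -> None:
        lines.append(f"{target} = {c_bool_variable};")

    return emit


def generate_if(condition: ConditionEmitter, yes_code: str, no_code: str) -> str:
    # The statement never inspects what kind of expression it got: it only
    # provides a nuitka_bool temporary and lets the expression fill it in.
    target = "tmp_condition_result"
    lines = [f"nuitka_bool {target};"]
    condition(target, lines)
    lines += [
        f"if ({target} == NUITKA_BOOL_TRUE) {{",
        f"    {yes_code}",
        "} else {",
        f"    {no_code}",
        "}",
    ]
    return "\n".join(lines)


print(generate_if(known_bool("var_flag"), "a();", "b();"))
```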

This seems to be so good that it's actually used by default in 0.6.0, and that itself is a major breakthrough, not so much for actual performance, but for structure. Other C types are going to follow soon and will give massive performance gains.

void

And what was really good is that not only did I get bool to work almost perfectly, I also started work on the void C target type and finished it after my return from holiday last weekend. That led to a new optimization that I am putting into the upcoming 0.5.33 release, even before the void code generation is out.

The void C type cannot read values back, and values that go unused should not be produced in the first place, so it gives errors in cases where that becomes obvious.
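One way to picture the void target is a code generation target that happily evaluates an expression, but refuses to read anything back. The classes and the exception below are hypothetical Python stand-ins, not Nuitka's API, and reference counting is deliberately ignored.

```python
class CodeGenerationError(Exception):
    """Raised when generated code would need a value that was discarded."""


class ObjectTarget:
    """Hypothetical target that keeps the value, like a PyObject * temporary."""

    def __init__(self, name: str) -> None:
        self.name = name

    def emit_assign(self, c_expression: str) -> str:
        return f"{self.name} = {c_expression};"

    def emit_read(self) -> str:
        return self.name


class VoidTarget:
    """Hypothetical void target: the expression runs only for its side effects."""

    def emit_assign(self, c_expression: str) -> str:
        # Evaluate and discard; a real implementation would also have to
        # release any new reference, which this sketch ignores.
        return f"(void){c_expression};"

    def emit_read(self) -> str:
        # Some code wants a value that was declared unused, which points to
        # a bug or a missed optimization.
        raise CodeGenerationError("cannot read a value back from a void target")
```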

a or b

Consider this expression, used as a statement. The or expression produces a value, which is then released, but not used otherwise. A new optimization creates a conditional statement out of it, which takes a as the condition and, if it is not true, evaluates b but ignores its value.

if not a:
   b

The void evaluation of b can then be optimized further.
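The transformation is safe because both forms evaluate a exactly once, truth-test it once, and only evaluate b when a is false. A small Python check, with hypothetical helper names, records what happens in both forms:

```python
from typing import Callable, List


class Traced:
    """Operand that records when it is evaluated and when it is truth-tested."""

    def __init__(self, name: str, truth: bool, log: List[str]) -> None:
        self.name, self.truth, self.log = name, truth, log

    def __call__(self) -> "Traced":
        self.log.append(f"eval {self.name}")
        return self

    def __bool__(self) -> bool:
        self.log.append(f"bool {self.name}")
        return self.truth


def or_expression_statement(a: Traced, b: Traced) -> None:
    a() or b()  # value produced, then discarded


def conditional_statement(a: Traced, b: Traced) -> None:
    if not a():
        b()  # evaluated for side effects only, result ignored


def trace(form: Callable[[Traced, Traced], None], a_truth: bool) -> List[str]:
    log: List[str] = []
    form(Traced("a", a_truth, log), Traced("b", True, log))
    return log


for a_truth in (True, False):
    print(a_truth, trace(or_expression_statement, a_truth), trace(conditional_statement, a_truth))
```

Note that b is never truth-tested in either form, which is exactly why its value can be evaluated in void context.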

Void code generation can therefore highlight missed opportunities for this kind of optimization, and it found a couple of these. That is why I was going for it, and I feel it pays off. Code generation checking the optimization here is a really nice synergy between the two.

Plus, I got all the tests to work with it, and easily solved the missing optimizations it found. And instead of allocating an object, not assigning at all now often creates more obvious code, which also allowed me to find a couple of bugs through C compiler warnings.

Obviously, I will want to run a "compile all the world" test before making it the default, which is why it will probably only become the default in 0.6.1.

module_var

Previously, the variable code made a hard distinction for module variables and had them use their own helper code. Now this is encapsulated in a normal C type class, like nuitka_bool or the one for PyObject * variables, which integrates smoothly and even got better. A sign things are going smoothly.
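The benefit of a common C type interface is that callers stop caring where a variable lives. In this hypothetical Python sketch, a module variable is just another C type class next to PyObject * and nuitka_bool. The class names, the moduledict variable, and the NUITKA_BOOL_* constants are assumptions; PyDict_GetItemString and PyDict_SetItemString are real CPython C API functions, but error checks and reference counting are left out.

```python
class CType:
    """Hypothetical common interface every C target type implements."""

    def emit_assign(self, name: str, value_expr: str) -> str:
        raise NotImplementedError

    def emit_read(self, name: str) -> str:
        raise NotImplementedError


class CTypePyObjectPtr(CType):
    def emit_assign(self, name: str, value_expr: str) -> str:
        return f"{name} = {value_expr};"

    def emit_read(self, name: str) -> str:
        return name


class CTypeNuitkaBool(CType):
    def emit_assign(self, name: str, value_expr: str) -> str:
        return f"{name} = ({value_expr}) ? NUITKA_BOOL_TRUE : NUITKA_BOOL_FALSE;"

    def emit_read(self, name: str) -> str:
        return f"({name} == NUITKA_BOOL_TRUE)"


class CTypeModuleVariable(CType):
    # Module variables live in the module dict, not in a C local.
    def emit_assign(self, name: str, value_expr: str) -> str:
        return f'PyDict_SetItemString(moduledict, "{name}", {value_expr});'

    def emit_read(self, name: str) -> str:
        return f'PyDict_GetItemString(moduledict, "{name}")'


def emit_copy(source_type: CType, source: str, target_type: CType, target: str) -> str:
    # Callers no longer special-case module variables; they just ask the types.
    return target_type.emit_assign(target, source_type.emit_read(source))


print(emit_copy(CTypeModuleVariable(), "x", CTypePyObjectPtr(), "tmp_1"))
```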

Goto Generators

Still not released. I delayed it until after my holiday, and due to the heap generator change, I first want to finish a tests/library/compile_python_module.py resume run after stabilizing the C types work; that run compiles all the code found in an Anaconda3 installation.

Right now it's still doing that, and it has even found a few bugs. The heap storage can still cause issues, as can changes to node cloning, which happens for try nodes and their finally blocks.

This should finish in the coming days. I looked at performance numbers and found that develop is indeed only faster, and factory, due to even more optimization, will be faster yet, often noticeably so.

Benchmarks

The Nuitka Speedcenter is what I use right now, but it only shows the state of 3 branches compared to CPython, without much historical information. Also, the organization of the tests is poor. At least there are tags for what improved.

After the release of Nuitka 0.6.0, I will show more numbers, and I will start to focus on making them easier to understand. Therefore no link right now; google it if you are so keen. ;-)

Twitter

During the holiday sprint, and even after, I am going to tweet a lot about what is going on with Nuitka. So follow me on Twitter if you like; I will post important stuff there as it happens:

Follow @kayhayen

And let's not forget, having followers makes me happy. So do retweets.

Poll on Executable Names

So I put up a poll on Twitter, which is now over. But it made me implement a new scheme, due to popular consensus.

Hotfixes

Even more hotfixes. I even did 2 during my holiday, although the packages were only built later.

Threaded imports of modules on Python 3.4 or higher were not using the locking they should use. Multiprocessing on Windows with Python3 had even more problems, and the --include-package and --include-module options were present, but not working.

That last one was actually very strange. I had added a new option group for them, but had not added the group to the parser. Result: the options work, they just do not show up in the help output. Really?
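The help output part is easy to reproduce with the standard library's optparse module, assuming an optparse-based command line like the one Nuitka used at the time. An OptionGroup shares the option lookup tables of the parser it was created with, so its options parse fine, but format_help() only renders groups that were passed to add_option_group(). The group title is made up:

```python
from optparse import OptionGroup, OptionParser

parser = OptionParser(prog="demo")

# Hypothetical group title; the options mirror the ones mentioned above.
group = OptionGroup(parser, "Recursion control")
group.add_option("--include-package", action="append", dest="include_packages")
group.add_option("--include-module", action="append", dest="include_modules")
# The bug: this call was missing, so the group never reached the help output.
# parser.add_option_group(group)

options, positional = parser.parse_args(["--include-package=foo", "--include-module=bar"])
print(options.include_packages, options.include_modules)  # ['foo'] ['bar']
print("--include-package" in parser.format_help())        # False
```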

Help Wanted

If you are interested, I am tagging issues with "help wanted", and there are a bunch, very likely including one you can help with.

Nuitka definitely needs more people to work on it.

Plans

I am working down the release backlog. Things should be out. I am already working on what should become 0.6.1, but 0.5.33 is not even released yet. Not a big deal, but 0.6.0 has 2 really important fixes for performance regressions that happened in the past. One is for loops; making those faster is probably the most important one. The other is for constant indexing, probably also very important. Both are very much measurable in pystone, at least.

In the meantime, I am preparing to get int working as a target C type, so that e.g. comparisons of such values can be done in pure C, or relatively pure C.

Also, I noticed that e.g. in-place operations can be optimized way more, and I already did some work for 0.6.1 in this domain. That is unrelated to the C type work, but maybe follows a similar route: how to compare mixed types we know of, or cases where only one type is known. That kind of thing needs ideas and experiments.
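A rough way to think about partial type knowledge is that each operand whose type shape is known removes a run-time check from the helper used. This Python sketch picks a helper name for an in-place add; the naming scheme and function names are hypothetical, not Nuitka's actual helpers:

```python
from typing import List, Optional


def select_inplace_add_helper(left_shape: Optional[str], right_shape: Optional[str]) -> str:
    """Return a hypothetical helper name for `left += right`.

    None means nothing is known about that operand at compile time, so the
    generic OBJECT variant has to be used for that side.
    """
    left = left_shape.upper() if left_shape else "OBJECT"
    right = right_shape.upper() if right_shape else "OBJECT"
    return f"INPLACE_ADD_{left}_{right}"


def remaining_type_checks(left_shape: Optional[str], right_shape: Optional[str]) -> List[str]:
    """List the operands whose type the helper still has to check at run time."""
    # Every known side lets the helper skip a check, and with it any shortcut
    # attempts that cannot work for that type.
    return [side for side, shape in (("left", left_shape), ("right", right_shape)) if shape is None]


print(select_inplace_add_helper("int", None), remaining_type_checks("int", None))
```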

Having int supported should help get some functions to C speeds, or at least much closer to them. That will have noticeable effects in many of the benchmarks. More C types will then follow one by one.

Donations

If you want to help, but cannot spend the time, please consider donating to Nuitka, and go here:

Donate to Nuitka

Nuitka this week #5

Goto Generators

Finished. Done. Finally.

Benchmarking was exciting. One program benchmark I had run in the past was twice as fast as before, showing that the new implementation is indeed much faster, which is fantastic news.

Creating generator expressions and using them both got substantially faster, and that is great.

It took me a fair amount of time to debug coroutines and asyncgen based on the new goto implementation. But the result is really good, and a fair number of old bugs have been fixed. There had always been a segfault with an asyncgen test that has now been eradicated.

One major observation is that now, with only one C stack, debugging got a lot easier than before, when context switches left much of the program state unreachable.

Benchmarks

I posted this one on Twitter already:

Nuitka Speedcenter Builtin sum with generator

That construct test has been a problem child, where Nuitka was slower than CPython 2.x and only very slightly faster than 3.x; now, with goto generators, it has finally become consistently faster.

I will explain what you see there in the next issue. The short version is that there is code where one run uses one line and another run uses the other line, and the "construct" is measured by taking the delta of the two. That construct performance is then compared between Python and Nuitka.

So if e.g. Nuitka is already better at looping, that won't influence the number for making that sum call with a generator expression.

The alternative line also creates the generator expression, to make sure its construction time is not counted. To measure that, there is another construct test that just creates it.
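The delta idea can be sketched with Python's timeit module. This is only an illustration, assuming that plain CPython timing with timeit is an acceptable stand-in for what the Speedcenter actually runs; the setup, the two lines, and the iteration count are made up:

```python
import timeit

# Hypothetical setup and lines; both lines create the generator expression,
# so its construction cost cancels out in the delta.
SETUP = "data = list(range(1000))"
BASELINE_LINE = "gen = (x for x in data)"
CONSTRUCT_LINE = "gen = (x for x in data); total = sum(gen)"


def construct_cost(number: int = 2000) -> float:
    """Estimate the per-run cost of sum() over a generator expression."""
    baseline = timeit.timeit(BASELINE_LINE, setup=SETUP, number=number)
    construct = timeit.timeit(CONSTRUCT_LINE, setup=SETUP, number=number)
    # Loop overhead and generator creation appear in both timings, so the
    # delta leaves only the sum() call consuming the generator.
    return (construct - baseline) / number


if __name__ == "__main__":
    print(f"{construct_cost() * 1e6:.2f} microseconds per sum() call")
```

Running the same two lines under CPython and under a compiled build then gives two construct numbers that can be compared directly.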

Nuitka Speedcenter Generator Expression Creation

This one shows that stable Nuitka was already faster at creating them, but that the develop version got even faster again. That creating generator objects became more lightweight is also news.

There are constructs for many parts of Python, to shed light on how Nuitka fares for each particular one.

Holiday

During my 2-week holiday, I will try to focus on the next big thing, C types, something I also started in the past, and where recent changes made as part of the heap storage work should make it a lot easier to finish. In fact, I don't see right now why my experimental bool work shouldn't just prove to be workable.

I am not going to post a TWN issue next week, mostly because my home servers won't be running, and the static site is rendered on one of them. Of course that would be movable, but I won't bother.

I am going to post a lot on Twitter though.

Static Compilation

There is a GitHub issue where I describe how using pyenv on macOS ought to be possible, and indeed, a brave soul has confirmed it and even provided the concrete commands. All it takes now is for somebody to fit this into the existing caching mechanism of Nuitka and to make sure the static library is properly patched to work with these commands.

Now, is one of you going to create the code that will solve it for good?

Twitter

Follow me on Twitter if you like; I will post important stuff there as it happens:

Follow @kayhayen

And let's not forget, having followers makes me happy. So do retweets.

Hotfixes

And yet again there have been more hotfixes. Some are about coroutine and asyncgen corruptions when closing frames. The multiprocessing plugin on Windows will work in all cases now.

Noteworthy was that "0.5.32.6" had a git merge problem on the cherry-pick that git didn't tell me about, leading to crashes. That made it necessary to push an update right after. I was confused that I didn't get a conflict, because there was one. But I am to blame for not checking the actual diff.

Bug Tracker

The next release will make GitHub the official tracker for Nuitka issues. I am working down the issues on the old tracker. The web site has already pointed users there for a while, and I had been set on this for some time, but yesterday I focused on taking action.

Basically, what won me over is the easy templating of issues and pull requests, which would have been possible with Roundup too, but never happened. Also, the OpenID integration that bugs.python.org has never became available to me in a readily usable form.

Issue Backlog

Finishing "goto generators" alone allowed for around 10 issues to be closed, and I went over things and checked out some stale issues, to see if they are dealt with, or pinged their authors. I spent like half a day on this, bringing down the issue count by a lot. Tedious work, but it must be done too.

Also, my inbox got a fair amount of cleanup; lots of issues pile up there, and from time to time I do this to get things straight. I raised issues for 2 things that I won't be doing immediately.

But actually, as issues go, there is really very little problematic stuff open right now, and nothing really important. I would almost call it issue clean.

Help Wanted

If you are interested, I am tagging issues with "help wanted", and there are a bunch, very likely including one you can help with.

Nuitka definitely needs more people to work on it.

Plans

The goto generator work could be released, but I want to run the "compile all the world" test before I do so. It is running right now, but it will not complete before I leave. Also, I do not want to get regression reports during my holiday, and goto generators along with heap storage mean there could be some.

I am going to work on C types now. There are a few closing-down actions for what I observed while doing goto generators: a few easy ways to get slightly better performance, and definitely smaller code, out of generators. I am not sure if I go there first, or to the C types work directly. I often like to get these kinds of observations dealt with more immediately, but I don't want to spend too much quality time on it.

Donations

As I have been asked this, yes, you can donate to Nuitka if you wish to further its development. Go here:

Donate to Nuitka

Nuitka this week #4

Goto Generators

This continues TWN #3, where I explained what it is all about.

The good news is that, while at the time Python2 generators were largely working the new way, in the meantime not only has all of the Python 2.7 test suite passed with goto generators, but so has the Python 3.4 test suite, i.e. yield from is working with it too.

The way it was done is to set m_yieldfrom in generators, and then to enter a state where the code will only be resumed when the sub-generator it is currently yielding from is finished. That makes it very much like a normal yield. In fact, code generation is hardly different there.
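To illustrate the idea, here is a hand-written Python state machine for a small generator using yield from. Only m_yieldfrom comes from the text above; the class, resume_point, and the overall structure are assumptions, since Nuitka does this in generated C code with goto labels. It models next() only, whereas real yield from also forwards send() and throw().

```python
from typing import Any, Iterable, Iterator, Optional


class GotoGenerator:
    """Hand-written state machine equivalent to:

        def gen(sub):
            yield 1
            yield from sub
            yield 4
    """

    def __init__(self, sub: Iterable[Any]) -> None:
        self.sub = iter(sub)
        self.resume_point = 0  # models the "goto" label to continue at
        self.m_yieldfrom: Optional[Iterator[Any]] = None  # set while delegating

    def __iter__(self) -> "GotoGenerator":
        return self

    def __next__(self) -> Any:
        # While delegating, only the sub-generator is resumed, not the body.
        if self.m_yieldfrom is not None:
            try:
                return next(self.m_yieldfrom)
            except StopIteration:
                self.m_yieldfrom = None  # sub-generator finished: continue body
        if self.resume_point == 0:
            self.resume_point = 1
            return 1
        if self.resume_point == 1:
            self.resume_point = 2
            self.m_yieldfrom = self.sub
            return next(self)  # start delegating right away
        if self.resume_point == 2:
            self.resume_point = 3
            return 4
        raise StopIteration


print(list(GotoGenerator([2, 3])))  # [1, 2, 3, 4]
```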

Since the whole purpose is to get rid of make/get/setcontext, the next stop is coroutines. They have async for, async with, and await, but at the end of the day, the implementation really comes down to yield from, only with a lot of sugar applied.
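The point that await is basically yield from with sugar can be seen in plain Python: an awaitable's __await__ may be an ordinary generator, and awaiting it delegates to that generator until it returns. The Suspend class and the run() driver below are hypothetical, standing in for what an event loop would do:

```python
from typing import Any, Generator, List, Tuple


class Suspend:
    """Hypothetical awaitable whose __await__ is a plain generator."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def __await__(self) -> Generator[Any, Any, Any]:
        received = yield self.value  # suspends the whole coroutine, like yield
        return received              # becomes the value of the await expression


async def coro() -> str:
    first = await Suspend("a")   # await delegates to __await__, like yield from
    second = await Suspend("b")
    return first + second


def run(coroutine: Any, replies: List[Any]) -> Tuple[List[Any], Any]:
    """Drive a coroutine by hand, roughly the way an event loop would."""
    yielded = [coroutine.send(None)]
    try:
        for reply in replies:
            yielded.append(coroutine.send(reply))
    except StopIteration as stop:
        return yielded, stop.value
    raise RuntimeError("coroutine did not finish")


print(run(coro(), ["x", "y"]))  # (['a', 'b'], 'xy')
```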

Right now, I am debugging "goto coroutines". It's hard to tell when that will be finished, and after that, asyncgen will still be waiting.

This is easily the largest change in a long time, esp. due to the heap storage changes that I already discussed. Once this is finished, I expect to turn towards C types with relative ease.

Tox Plugin

Anthony Shaw took on Tox and Nuitka and created a plugin that allows using Nuitka with Tox. I am still wrapping my head around these things. It's only a proof of concept so far. I will give it more coverage in the future.

Twitter

Follow me on Twitter if you like; I will post important stuff there as it happens:

Follow @kayhayen

Hotfixes

So there have been even more hotfixes. One addresses memory leaks found with yield from while I was adding tests. Usually, if I encounter an old issue that has a small fix, that is what I do: push out a hotfix using the git flow model. Also, nested namespace packages for Python3, i.e. the ones without an __init__.py, were not working after the original directory was removed, and that got fixed.

And right now, I have hotfixes for the close method of frames, which apparently was never updated to work properly for coroutines and asyncgen. That is going to be in the next hotfix.

Plans

So the heap storage seems pretty complete now, and goto generators are on the final stretch. As always, things feel right around the corner, but it's unclear how much longer I will have to debug. I am pretty sure the bare work of doing asyncgen is going to be small. Debugging it too, that is the hard part.

A new release seems justified, but I kind of do not want to make it without that major new code in use. Apparently, during debugging I tend to find issues that need hotfixes, so I will wait for the goto generator work to finish.