02 October 2014
Nuitka shaping up
Not much has happened publicly to Nuitka, so it’s time to make a kind of status post, about the exciting news there is. For a long, long time already, each release of Nuitka has worked towards enabling “SSA” usage in Nuitka. There is a component called “constraint collection”, which is tasked with driving the optimization, and collecting variable traces. Based on these traces, optimizations could be made. Having SSA or not, is (to me) the difference between Nuitka as a compiler, and Nuitka as an optimizing compiler. The news is, SSA is shaping up, and will be used in the next release. Not yet to drive variable based optimization (reserved for a release after it), but to aid the code generation to avoid useless checks. Previously, under the title “C-ish”, Nuitka moved away from C++ based code generation to less C++ based code generated, and more C-ish code. This trend continues, and has lead to removing even more code cleanups. The more important change is from the SSA derived knowledge. Now Nuitka knows that a variable must be assigned, cannot be assigned, may be assigned, based on its SSA traces. Lets check out an example: Nevermind, that obviously the variable This is code as generated now, with current 0.5.5pre5: There are some things, wrong with it still. For one, As you can see, the assignment to So, that is the major change to code generation. The SSA information is now used in it, and doing so, has found a bunch of issues, in how it is built, in e.g. nested branches, that kind of stuff. The removal of local variables as C++ classes, and them managed as temporary variables, is going to happen in a future release, reducing code complexity further. Were The scalability of Nuitka hinges much of generated code size. With it being less stupid, the generated code is now not only faster, but definitely smaller, and with more optimization, it will only become more practical. A recent change in CPython 2.7.8+ which is supposed to become 2.7.9 one day, highlighted an issue with The above two are supposed to be identical. So far this was rectified at run time of CPython, but apparently the parser is now tasked with it, so Nuitka now sees However, as it stands, Nuitka handles I have been working on an enhanced re-formulation (it needs to be tracked if the value was Anyway, consider Incidentally, that Currently, Nuitka is delaying the building of the dictionary (absent So that area now hinges on code generation to learn different local variable codes for classes, centered around the notion of using the locals dictionary immediately. The next release is no longer warning you if you use Python3.4, as many of the remaining problems have been sorted out. Many small things were found, and in some cases these highlighted general Python3 problems. Nuitka for Python3 is not yet all that much in the focus in terms of performance, but correctness will have become much better, with most prominently, exception context being now correct most often. The main focus of Nuitka is Python2, but to Nuitka the incompatibility of Python3 is largely not all that much an issue. The re-formulations to lower level operations for just about everything means that for the largest part there is not much trouble in supporting a mostly only slightly different version of Python. The gain is mostly in that new tests are added in new releases, and these sometimes find things that affect Nuitka in all versions, or at least some others. And this could be a mere reference leak. Consider this: So, that is working with Python2, but comes from a Python3 test. Python2 is supposed to unwrap the tuple and take the first argument and raise that. It didn’t do that so far. Granted, obscure feature, but still an incompatibility. For Python3, a Turned out, that also, in that case, a reference leak occurs, in that the wrong exception was not released, and therefore memory leaked. Should that happen a lot during a programs live, it will potentially become an issue, as it keeps frames on the traceback also alive. So this lead to a compatibility fix and a reference leak fix. And it was found by the Python3.4 suite, checking that exception objects are properly released, and that the proper kind of exception is raised in the no longer supported case. I had been working on automated performance graphs, and they are supposed to show up on Nuitka Speedcenter already, but currently it’s broken and outdated. Sad state of affairs. Reasons include that I found it too ugly to publish unless updated to latest Nikola, for which I didn’t take the time. I intend to fix it, potentially before the release though. Consider the following code: If With SSA in place, and “C-ish” code generation complete, this will be solved, but I am not going to solve this before. The standalone mode of Nuitka is pretty good, and in the pre-release it was again improved. For instance, virtualenv and standalone should work now, and more modules are supported. However, there are known issues with This has as usual only so much priority from me. I am working on this on some occasions, as kind of interesting puzzles to solve. Most of the time, it just works though, with The standalone compilation exhibits scalability problems of Nuitka the most, and while it has been getting better, the recent and future improvements will lead to smaller code, which in turn means not only smaller executables, but also faster compilation. Again, Nuitka doesn’t receive enough donations. There is no support from organizations like e.g. the PSF, which recently backed several projects by doubling donations given to them. I remember talking to a PSF board member during Europython 2013 about this, and the reaction was fully in line with the Europython 2012 feedback towards me from the dictator. They wouldn’t help Nuitka in any way before it is successful. I have never officially applied for help with funding though with them. I am going to choose to take pride in that, I suppose. My quest to find collaborators to Nuitka is largely failing. Aside from the standalone mode, there have been too little contributions. Hope is that it will change in the future, once the significant speed gains arrive. And it might be my fault for not asking for help more, and to arrange myself with that state of things. Not being endorsed by the Python establishment is clearly limiting the visibility of the project. Anyway, things are coming along nicely. When I started out, I was fully aware that the project is something that I can do on my own if necessary, and that has not changed. Things are going slower than necessary though, but that’s probably very typical. But you can join now, just follow this link or become part of the mailing list (since closed) and help me there with request I make, e.g. review posts of mine, test out things, pick up small jobs, answer questions of newcomers, you know the drill probably. So, there is multiple things going on: More “C-ish” code generation The next release is going to be more “C-ish” than before, generating less complex code than before, and removes the previous optimizations, which were a lot of code, to e.g. detect parameter variables without This prong of action will have to continue, as it unblocks further changes that lead to more compatibility and correctness. More SSA usage The next release did and will find bugs in the SSA tracing of Nuitka. It is on purpose only using it, to add We better know that SSA is flawless in its tracking, before we use it to make optimizations, which then have no chance to assert anything at all anymore. Once we take it to that next level, Nuitka will be able to speed up some things by more than the factor it basically has provided for 2 years now, and it’s probably going to happen this year. More compatibility The new That means faster When these 3 things come to term, Nuitka will be a huge, huge step ahead towards being truly a static optimizing compiler (so far it is mostly only peep hole optimization, and byte code avoidance). I still think of this as happening this year.SSA (Single State Assignment Form)
Improved Code Generation
def f():
a = 1
return a
a
can be removed, and this could be transformed to statically return 1
. That is the next step (and easy if SSA is working properly), now we are looking at what changed now.tmp_assign_source_1 = const_int_pos_1;
assert( var_a.object == NULL );
var_a.object = INCREASE_REFCOUNT( tmp_assign_source_1 );
tmp_return_value = var_a.object;
Py_INCREF( tmp_return_value );
goto function_return_exit;
var_a
is still a C++ object, which we directly access. But the good thing is, we can assert that it starts out uninitialized, before we overwrite it. The stable release as of now, 0.5.4, generates code like this:tmp_assign_source_1 = const_int_pos_1;
if (var_a.object == NULL)
{
var_a.object = INCREASE_REFCOUNT( tmp_assign_source_1 );
}
else
{
PyObject *old = var_a.object;
var_a.object = INCREASE_REFCOUNT( tmp_assign_source_1 );
Py_DECREF( old );
}
static PyFrameObject *cache_frame_function = NULL;
MAKE_OR_REUSE_FRAME( cache_frame_function, codeobj_4e03e5698a52dd694c5c263550d71551, module___main__ );
PyFrameObject *frame_function = cache_frame_function;
// Push the new frame as the currently active one.
pushFrameStack( frame_function );
// Mark the frame object as in use, ref count 1 will be up for reuse.
Py_INCREF( frame_function );
assert( Py_REFCNT( frame_function ) == 2 ); // Frame stack
// Framed code:
tmp_return_value = var_a.object;
if ( tmp_return_value == NULL )
{
exception_type = INCREASE_REFCOUNT( PyExc_UnboundLocalError );
exception_value = UNSTREAM_STRING( &constant_bin[ 0 ], 47, 0 );
exception_tb = NULL;
frame_function->f_lineno = 4;
goto frame_exception_exit_1;
}
Py_INCREF( tmp_return_value );
goto frame_return_exit_1;
var_a.object
was checking if it were NULL
, and if were not (which we now statically know), would release the old value. Next up, before returning, the value of var_a.object
needed to be checked, if it were NULL
, in which case, we would need to create a Python exception, and in order to do so, we need to create a frame object, that even if cached, consumes time, and code size.a
a temporary variable, already, the Py_INCREF
which implies a later Py_DECREF
on the constant 1
could be totally avoided.Scalability
Compatibility
Python2 exec statements
exec
statements in Nuitka. These were considered to be fully compatible, but apparently are not totally.def f():
exec a in b, c
exec (a, b, c)
exec a in b, c
for both lines. Which is good.exec
in locals()
the same as exec
in None
for plain functions (OK to classes and modules), which is totally a bug.None
, and then the sync back to locals from the provided dictionary ought to be done. But the change breaks execfile
in classes, which was implemented piggy-backing on exec
, and now requires locals to be a dictionary, and immediately written to.exec
as well working already. The non-working cases are really corner cases, obviously nobody came across so far.Python3 classes
execfile
issue will be solved as soon as a bug is fixed, that was exposed by new abilities of Python3 metaclasses. They were first observed in Python3.4 enum classes.class MyEnum(enum):
red = 1
blue = 2
red = 3 # error
execfile
built-in), and that is not allowed, in fact, immediate writes to the mapping giving by __prepare__
of the metaclass will be required, in which case, the enum
class can raise an error for the second assignment to red
.Python3.4
try:
raise (TypeError, ValueError)
except TypeError:
pass
TypeError
should be raised complaining that tuple
is not derived from BaseException
.Performance
Graphs and Benchmarks
Incremental Assignments
a += "bbb"
a
is a str
, and if (and only if), it’s the only reference being held, then CPython, reuses the object, instead of creating a new object and copying a
over. Well, Nuitka doesn’t do this. This is despite the problem being known for quite some time.Standalone
win32com
and a few other packages, which need to be debugged. Mostly these are modules doing nasty things that make Nuitka not automatically detect imports.wxpython
being the most notable exception. I am going to work on that though.wxpython
is a major offender there, due to its many constants, global variables, etc. in the bindings, while Qt, PySide, and GTK are apparently already good.Other Stuff
Funding
Collaborators
Future
del
statements.assert
statements to things it now no longer does. These will trigger in tests or cause crashes, which then can be fixed.exec
code makes the dictionary synchronization explicit, and e.g. now it is optimized away to even check for its need, if we are in a module or a class, or if it can be known.exec
, but more importantly, a better understood exec
, with improved ability to do SSA
traces for them. Being able to in-line them, or to know the limit of their impact, as it will help to know more invariants for that code.