So, the 0.6.3 release (btw. on Windows, be sure to use the 0.6.3.1
hotfix), which was made as a consolidation effort to get the good work
of mostly other people out, didn't contain much optimization work
for the core, as that is still my thing.
However, this has changed a lot. An idea came to my mind for how to do the
massive amounts of specialized helpers needed beyond + and +=, with which
I had started for prior releases. And that is to use Jinja2 based templates
for C, to generate the code.
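To illustrate the idea of generating many specialized C helpers from one template, here is a minimal sketch. It uses Python's standard string.Template as a stand-in for Jinja2 so that it runs without extra packages. The helper naming scheme, the template text, and the type list are all hypothetical and not what Nuitka actually ships.

```python
from string import Template

# Hypothetical C template. The real templates are Jinja2 based and far richer;
# this only shows the mechanism of stamping out one helper per type pair.
HELPER_TEMPLATE = Template("""\
static PyObject *HELPER_${op_name}_${left_type}_${right_type}(PyObject *operand1, PyObject *operand2) {
    /* Specialized code for ${left_type} ${symbol} ${right_type} would go here. */
    return NULL;
}
""")

# Assumed operations and type combinations, chosen for illustration only.
OPERATIONS = {"ADD": "+", "MULT": "*", "SUB": "-"}
TYPE_PAIRS = [("INT", "INT"), ("FLOAT", "FLOAT"), ("OBJECT", "OBJECT")]


def generate_helpers():
    """Return C source text with one helper per operation and type pair."""
    chunks = []
    for op_name, symbol in OPERATIONS.items():
        for left_type, right_type in TYPE_PAIRS:
            chunks.append(
                HELPER_TEMPLATE.substitute(
                    op_name=op_name,
                    symbol=symbol,
                    left_type=left_type,
                    right_type=right_type,
                )
            )
    return "\n".join(chunks)


if __name__ == "__main__":
    print(generate_helpers())
```

Adding a new operation or type pair then means extending a table instead of writing another helper by hand, which is the appeal of the approach.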
This was an important idea. It took a while, but soon the manual code for +
was replaced with generated code, fixing a few bugs along the way, and from
there, the generation was expanded to cover * as well.
Currently, support for the 3 (!) different kinds of division (TrueDiv,
FloorDiv, as well as the Python2 default division, dubbed OldDiv in
Nuitka) was added, along with - too.
The reason + and * were done first is that they have special treatment
for sequences, using sq_concat and sq_repeat, where the other operations
will be more straightforward, e.g. nb_subtract (-) has a lot of types
supporting it and that makes those the easy cases.
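The difference between the sequence-aware operators and the purely numeric ones can be observed from plain Python. The sketch below is only a demonstration of the behavior; apply_or_none is a hypothetical helper, and the slot names in the comments refer to the CPython type slots that back each operator.

```python
import operator


def apply_or_none(op, left, right):
    """Return op(left, right), or None if the operand types do not support it."""
    try:
        return op(left, right)
    except TypeError:
        return None


# Sequences: + concatenates (sq_concat) and * repeats (sq_repeat).
concatenated = apply_or_none(operator.add, [1, 2], [3])
repeated = apply_or_none(operator.mul, [1, 2], 2)

# Repetition also works with the sequence on the right-hand side.
repeated_reversed = apply_or_none(operator.mul, 2, [1, 2])

# Subtraction has no sequence counterpart, only the number slot (nb_subtract).
subtracted_lists = apply_or_none(operator.sub, [1, 2], [3])
subtracted_ints = apply_or_none(operator.sub, 5, 3)
```

So helpers for + and * have to consider both the number and the sequence slots, while a helper for - only has one kind of slot to look at.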
I am saving a deeper explanation of the 3 things we will need for next
time. Basically we need optimization of these things at compile time,
and that is getting there, and code to use in the backend, and that is
getting there too, and a third thing, which is to use optimization
knowledge to apply the special code as much as possible, and that is not
yet fully there.
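The third point, using type knowledge to pick the special code, could be pictured roughly as follows. This is a sketch under assumptions only: the helper names, the lookup table, and select_helper are hypothetical, and the actual mechanism may look quite different.

```python
# Hypothetical table of specialized helpers assumed to exist in the backend.
SPECIALIZED_HELPERS = {
    ("ADD", "int", "int"): "HELPER_ADD_INT_INT",
    ("ADD", "float", "float"): "HELPER_ADD_FLOAT_FLOAT",
    ("SUB", "int", "int"): "HELPER_SUB_INT_INT",
}


def select_helper(operation, left_type=None, right_type=None):
    """Pick a specialized helper if both types are known, else the generic one.

    A type of None means the optimization phase could not determine it.
    """
    key = (operation, left_type, right_type)
    generic = "HELPER_%s_OBJECT_OBJECT" % operation
    return SPECIALIZED_HELPERS.get(key, generic)
```

The more the compile time optimization knows about the operand types, the more often the lookup hits a specialized entry instead of falling back to the generic helper.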
This is going to excite Windows users. After Orsiris de Jong had done a
replacement for dependency walker that is faster, this had remained in
an experimental status, just due to lack of time.
Recently however, I felt there is more time, after GSoC student selection
has happened, and that I could finally work a bit on open issues like
this. And when I wrote a dedicated tool to analyse dependencies with
either technology to compare the results, I found that dependency walker
finds a lot more things.
That was a letdown, but it turns out that everything extra it finds is
stuff that should be on the white list anyway. In fact, it's all core
Windows things, and from the System32 folder. That made me question why
we take anything from there (except maybe PythonXY.dll) at all, and after
changing that, the performance improved dramatically.
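The filtering rule described here can be sketched in a few lines. This is a minimal illustration under assumptions: the System32 location, the pattern used to recognize the Python DLL, and the function names are all hypothetical, and only files located directly in that folder are considered.

```python
import ntpath
import re

# Assumed default location; a real tool would ask Windows for it.
SYSTEM_DIR = r"C:\Windows\System32"

# Assumed naming pattern for the Python runtime DLL, e.g. python37.dll.
PYTHON_DLL = re.compile(r"python\d+\.dll$", re.IGNORECASE)


def keep_dependency(dll_path, system_dir=SYSTEM_DIR):
    """Return True if the DLL should be included with the program."""
    folder = ntpath.normcase(ntpath.dirname(dll_path))
    name = ntpath.basename(dll_path)
    if folder != ntpath.normcase(system_dir):
        # Not a core Windows location, so keep it.
        return True
    # Inside the system folder, only the Python DLL is an exception.
    return PYTHON_DLL.match(name) is not None


def filter_dependencies(dll_paths, system_dir=SYSTEM_DIR):
    """Drop everything that lives in the system folder, except the Python DLL."""
    return [path for path in dll_paths if keep_dependency(path, system_dir)]
```

Skipping a whole folder this way also means there is no need to recurse into the dependencies of those system DLLs, which is plausibly where much of the time went.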
The dependency walker now finishes a file in milliseconds. Actually,
pefile is now the slow one (surely it ought to be compiled), and takes
some seconds for a file. That is amazing, and has led me to remove the
parallel usage, and since pefile allows for perfect caching, and is
Free Software, we will probably keep it.
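One way such caching could work is to key the analysis result on the content of the analysed file, so an unchanged DLL is never analysed twice. The sketch below assumes exactly that; the function names and the JSON cache file are hypothetical, and the analyse callback stands in for whatever actually extracts the dependencies.

```python
import hashlib
import json
import os


def file_digest(path):
    """Return a SHA-256 hex digest of the file content, used as the cache key."""
    with open(path, "rb") as binary_file:
        return hashlib.sha256(binary_file.read()).hexdigest()


def cached_dependencies(path, analyse, cache_file):
    """Return the dependencies of path, calling analyse(path) only on a cache miss.

    analyse must return a JSON serializable value, e.g. a list of DLL names.
    """
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file) as cache_input:
            cache = json.load(cache_input)

    key = file_digest(path)
    if key not in cache:
        cache[key] = analyse(path)
        with open(cache_file, "w") as cache_output:
            json.dump(cache, cache_output)

    return cache[key]
```

Because the key depends only on the file content, a renamed or copied DLL still hits the cache, and a changed one is analysed again.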
This will address a widespread complaint of many Windows users of the
standalone mode. This step is now relatively unnoticeable.
Currently I need to finish off some remaining problems with it, before
putting it out in the wild. Getting this into a release will solve many