05 February 2023
Nuitka Release 1.4
This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler, “download now”.
This release contains a large amount of performance work, where specifically Python versions 3.7 or higher see regressions in relative performance to CPython fixed. Many cases of macros turned to functions have been found and resolved. For 3.10 specifically we take advantage of new opportunities for optimization. And generally avoiding DLL calls will benefit execution times on platform where the Python DLL is used, most prominently Windows.
Then this also adds new features, specifically custom reports. Also tools to aid with adding Nuitka Package Configuration input data, to list DLLs and data files.
With multidist we see a brand new ability to combine several programs into one, that will become very useful for packaging multiple binaries without the overhead of multiple distributions. Standalone: Added implicit dependencies for Fix, the generated metadata nodes for distribution queries had an error in their generated children handling that could cause crashes at compile time. Fixed in 1.3.2 already. Standalone: Added implicit dependencies for Windows: Fix, our shortcut to find DLLs by analyzing loaded DLLs stumbled in a case of a DLL loaded into the compiling Python that had no filename associated, while strange, we need to handle this as well. Fixed in 1.3.3 already. Standalone: Also need to workaround more decorator tricks for Scons: Fix, was not updating Standalone: Added support for Standalone: Added support for newer Standalone: Fix, some extension modules set their Python3.10+: Fix, was not supporting Windows: Identify Windows ARM architecture Python properly. We do not yet support it, but we should report it properly and some package configurations are already taking it already into account. Fixed in 1.3.5 already. Fix, the Nuitka meta path based loader, needs to expose a Fix, very old versions of the Fix, value escapes for attribute and subscript assignments sources were not properly annotated. This could cause incorrect code execution. Fixed in 1.3.6 already. Fix, “pure” functions, which are currently only our complex call helper functions, were not visited in all cases. This lead to a crash in code generation after modules using them got demoted to bytecode. After use from cache, this didn’t happen again. Fixed in 1.3.6 already. Standalone: Added more implicit dependencies of crypto packages. Fixed in 1.3.6 already. Standalone: Added implicit dependencies of Fix, was falsely encoding Standalone: Fix, Python3.8+: Fix, need to follow change for extension module handling, otherwise some uses of Standalone: Added missing implicit dependencies of Python3.9+: Fix, resource reader files was not fully compatible and needed to register with Fix, the version check for Standalone: Added DLLs needed for Fix, imports of resolved names should be modified for runtime too. Where Nuitka recognizes aliases, as e.g. the Onefile: Fix, do not send duplicate CTRL-C to child process. Our test only send it to the bootstrap process, rather than the process group, as it normally is working, therefore misleading us into sending it to the child even if not needed. Onefile: When not using cached mode, on Windows the temporary folder used sometimes failed to delete after the executable stopped with CTRL-C. This is due to races in releasing of locks and process termination and AV tools, so we now retry for some time, to make sure it is always deleted. Standalone: Fix, was not ignoring Standalone: Added missing implicit dependency of Standalone: The main binary name on non-Windows didn’t have a suffix macOS: Workaround bug in Standalone: Added DLLs of Fix, for package metadata as from Standalone: Added support for UI: Added new option to listing package data files. This is for use with analyzing standalone issues. And will output all files that are data files for a given package name. UI: Added new option to listing package DLL files. This is also for use with analyzing standalone issues. Reports: The usages of modules, successful or not, are now included in the compilation report. Checking out which ones are Multidist: You can now experimentally create binaries with multiple entry points. At runtime one of multiple Note Using it with only one replaces the previous use of the positional argument given and is not using multidist at all. Note Multidist is compatible with onefile, standalone, and mere acceleration. It cannot be used for module mode obviously. For deployment this can solve duplication. Note For wheels, we will probably change those with multiple entry points to compiling multidist executables, so we do avoid Python script entry points there. But this has not yet been done. Onefile: Kill non-cooperating child processes on CTRL-C after a grace period, that can be controlled at compile time with Plugins: Add support for extra global search paths to mimic Reports: Include used distributions of compiled packages and their versions. Reports: Added ability to generate custom reports with Anti-Bloat: Added support for checking python flags. There are Not all packages will work in these modes, but often can be enhanced to work with relatively little patching. This allows to limit these patches to only where they are necessary. Anti-Bloat: Avoid using Anti-Bloat: Avoid IPython and Numba in Anti-Bloat: Avoid Numba in Anti-bloat: Removed For guided deep copy This also speeds up copies of non-empty tuples by avoiding that size checking branch in construction with Python 3.10 or higher. For node construction, avoid keyword argument style calls of the base class, where there is only a single argument. They don’t really help readability, but cost compile time. Determine guard mode of frames dynamically and avoid frame preservation checks where they are not needed. For Python2 this is necessary, but not for Python3, so make the function avoid finding the parent frame for that version entirely, which should speed up compilation as well. By not hard coding frame guard mode at creation time, and instead determine it at compile time, after optimization, so this now allows to use the “once” mode more often. This affects contractions and also classes on the module level right now. They do not need a cached frame, since their code is only executed once. By avoiding that useless code, the C compiler also has a slightly better scalability, since the classes are all created in one function that then has less code. The bytecode cache is now checking if the used modules or attempted to be used modules are available or not in just the same way. Previously it was very dependent on the file system to contain the same things, which was not giving cache hits even after only creating a new folder near a binary, since that affected importable modules. With the new check it should be much more directly hitting even across different virtual environments, but with same code. Generate base classes or mixins for all kinds of expression, statements and statement sequences. The previous code had a dedicated variant for single child, to allow faster operation in a common case, but still a lot of Another shortcoming was that for some nodes, some values are optional, where for others, they are not. Some values are a The new code now enforces children types during creation and updated, it rejects unexpected Also when abstract executing the tree, rather than generically visiting all children, this now just unrolls this, and there are even some modes added, where a node can indicate properties, e.g. Also the way These changes brought about much faster compilation, however the big elephant in the room will still be merging value traces, and scalability problems remain there. Attribute node generation for method specs like Determining the used modules of a module requires a tree visit operations, that then asked for node types and used different APIs. This has been unified to be able to call a virtual method instead, which saves some compile time. After scanning for a module, we then determined the module kind even after we previously knew it during the scan. Also, this was checking Already most helper code of Nuitka was included from Cache the plugin decisions about recursion for a module name. When a module is imported multiple times plugins were each asked again and again, which is not a good thing to do. Avoid usage of Python3.8+: Avoid usage of Replace Python3.7+: Avoid the use of Avoid using Faster exception match checks and sub type checks. This solves a Faster subtype checks. These are common in binary operations on non-identical types, but also needed for the exception checks, and object creation through class type calls. With our own Faster Python3 On Python 3.9 or higher we can get small int values directly from the interpreter, and with 3.11 they are accessible as global values. Also we no longer de-duplicate small int values through our cache, since there is no use in this, saving a bunch of startup time. And we can create the values with our own API replacement, that will work during startup already and save API calls as these can be relatively slow. And esp. for the small values, this benefits from not having to create them. Faster Python3 On Python 3.10 or higher, we can create these values ourselves without an API call, avoiding its overhead. Also we no longer de-duplicate small bytes values through our cache, because that is already done by the API and our replacement, so this was just wasting time. Faster On Python 3.10 or higher, we can create these values ourselves without an API call, avoiding its overhead. These are important for Python3, because Faster For common cases, this avoids API calls. We mostly have this such that Faster exception normalization. For the common case, we have our own variant of Faster object releases For Python3.8 or higher when our code released objects, it was doing that with an API call, due to a macro change in Python headers. We revert that and do it still on our own which avoids the performance penalty. Enable Python threading during extension module DLL loading We now release the GIL for Python3.8 or higher when loading the DLL, following a change in that version. Faster variable handling in trace collection. The code was doing checks for variable types, to decide what to do e.g. when control flow escapes for a variable. However, this is faster if solved with a virtual method in those variable classes, shifting the responsibility to inside there. For call codes the need to check the return value was not perfectly annotated in all cases. This is now driven by the expression rather than passed, and will result in better code generated in some corner cases. Release: Make clear we require Debugging: When plugins evaluate Anti-Bloat: Added a mnemonic and more clear message for the case of unwanted imports being encountered. Also do not warn about IPython itself using IPython packages, that must of course be considered normal. Now it also lists the module that does the unwanted usage immediately. Previously this was not as clear. UI: More clear output for not yet supported Python version. Make it more clear in the message, what is the highest supported version, and what version is Nuitka and what is Python in this. UI: Make sure data files have normalized paths. Specifically on Windows, otherwise a mix of slashes could appear. Part of 1.3.6 release already. UI: Make it clear that disabling the console harms your debugging when we suggest the UI: The ordering of scons Quality: Use proper temporary filename during autoformat, so as to avoid flicker in Visual Code, e.g. search results. User Manual: Was still using old option name for Standalone: Do not include data files scanned with Onefile: Added check for compression ability before starting the compilation to inform the user immediately. Release: Mark macOS as supported in PyPI categories. This is of course true for a long time already. Release: Mark Android as supported in PyPI categories as well. With some extra work, it can be used. User Manual: Added section pointing to and explaining compilation reports. This has become extremely useful even if still somewhat work in progress. User Manual: Added table with included custom reports, at this time only the license reports, which is very rough shape and needs contributors for good looks and content. Plugins: Moved parts of the Made the user query code a dedicated function, so it can be reused and more consistent across its uses in Nuitka. With a default that is proposed to a user, and a default that applies if used non-interactively. We will switch all prompts to using this. Code generation for module, class and function frames is now unified, removing duplication while also becoming more flexible. For generators this work has been started, but is not yet completed. Nodes exposing used modules now implement the same virtual method providing a list of them. Make sure to pass Do not enable deprecated plugins, the warnings about them break tests. Ignore Qt binding warnings in tests, some are less supported than The focus of this release was first a major restructuring of how children are handled in the node tree. The generated code opens up the possibility of many more scalability improvements in the coming releases. The pure iteration speed for the node tree will make compile times for the Python part even shorter in coming releases. Scalability will be a continuous focus for some releases. Then the avoiding of API calls is a huge benefit for many platforms that are otherwise at a disadvantage. This is also only started. We will aim at getting more complex programs to do next to none of these, so far only some tests are working after program start without them, which is of course big progress. We will progress there with future releases as well. Catching up on problems that previous migrations have not discovered is also a huge step forward to restoring the performance supremacy, that was not there anymore in extreme cases. The Yaml package configuration work is showing its fruits. More people have been able to contribute changes for Some part of the Python 3.11 work have positively influenced things, e.g. with the frame cleanup. THe focus of the next release cycle shall be to add support for it. Right now, generator frames need a cleanup to be finished, to also become better and working with 3.11 at the same time. Where possible, work to support 3.11 was also conducted as a cleanup action, or reduction of the technical debts. All in all, it is fair to say that this release is a big leap forward in all kinds of ways.Bug Fixes
dependency_injector
package. Fixed in 1.3.1 already.passlib.apache
package. Fixed in 1.3.2 already.networkx
. Fixed in 1.3.3 already.PATH
environment variable anymore, which could lead to externally provided compilers and internal winlibs gcc clashing on Windows, but should be a general problem. Fixed in 1.3.4 already.cefpython3
package. Fixed in 1.3.4 already.webview
package versions. Fixed in 1.3.4 already.__file__
to None
during multi phase imports, which we then didn’t update anymore, however that is necessary. Fixed in 1.3.4 already.match
cases where an alternative had no condition associated. Fixed in 1.3.5 already.__module__
attribute because there is code out there, that identifies standard loaders through looking at this value, but crashes without it. Fixed in 1.3.5 already.importlib_metadata
backport were using themselves to load their __version__
attribute. Added a workaround for it, since in Nuitka it doesn’t work until after loading the module.pygments.styles
module. Fixed in 1.3.6 already.Ellipsis
too soon during tree building. It is not quite like True
and False
. Fixed in 1.3.6 already.numpy
on macOS didn’t work inside an application bundle anymore. Fixed in 1.3.7 already.os.add_dll_directory
fail to work. Fixed in 1.3.8 already.sqlalchemy
. Fixed in 1.3.8 already.importlib.resources.as_file
to work well with it. Fixed in 1.3.8 already.cv2
was not working with the opencv-python-headless
variant. Package name and distribution name is not a 1:1 mapping for all things. Fixed in 1.3.8 already.tls_client
package.requests
module does, it only adding a dependency on the resolved name, but not requests
itself. The import however was still done at runtime on requests
which then didn’t work. This was only visible if only these aliases to other modules were used..dylib
when scanning for data files unlike all other DLL suffixes.mplcairo
..bin
unlike in accelerated mode. However, this didn’t work well for packages which have binaries colliding with the package name. Therefore now the suffix is added in this case too.platform_utils.paths
. It is guessing the wrong path for included data files with Nuitka.sound_lib
, selecting by OS and architecture.importlib.metadata.metadata
for use at runtime we need to use both package name and distribution name to create it, or else it failed to work. Packages like opencv-python-headless
can now with this too.tkinterweb
on Windows. Other platforms will need work to be done later.New Features
python -m nuitka --list-package-data=tkinterweb
python -m nuitka --list-package-dlls=tkinterweb
not-found
might help recognition of issues.__main__
will be executed. The option to use is multiple --main=some_main.py
arguments. If then the binary name is changed, on execution you get a different variant being executed.--onefile-child-grace-time
the hard way. This avoids hangs of processes that fail to properly shutdown.sys.path
manipulations in the Yaml configuration with new global-sys-path
import hack.--report-template
where the user can provide a Jinja2 template to make his own reports.no_asserts
, no_docstrings
and no_annotations
now. These can be used to limit rules to be only applied when these optional modes are active.Optimization
sparse
and through that Numba in the scipy
package, reducing its distribution footprint. Part of 1.3.3 already.trimesh
package. Part of 1.3.3 already.shap
package. Part of 1.3.8 already.xgboost
docstring dependencies, such that --python-flag=no_docstrings
can be used with this package.frozenset
and empty tuple
need no copieshasattr/getattr/setattr
on dynamic attribute names were done. This was making the tree traversal during optimization slower than necessary.tuple
actually, while most are nodes only. However, dealing with this generically was also slower than necessary.None
values for non-optional children, and it provides generated code to do this in the fastest way possible, although surely some more improvements will come here.auto_compute_handling = "final,no_raise"
will tell the code generator that this expression never raises in the computation, and is final, i.e. doesn’t have any code to evaluate, because it cannot be optimized any further.checkers
previously worked, for every node creation, for every child update, a dictionary lookup had to be done. This is now hard coded for the few nodes that actually want to convert values on the fly and we might make a difference in the future for optional checkers, such that these are only run in debug mode.dict.update
, etc. now provide type shapes. From these type shapes, mixins for the result value type are picked automatically. Previously these shapes were added manually. In some cases, they were even missing. In a few cases, where the type is dependent on the Python version, we do not currently do this though, so this needs more work, but expanding the coverage got easier in this way.os.path.isdir
which was making it relatively slow and wasting 5% compile time on the IO being done. The check got enhanced and most often replaced with using the knowledge from the original import scan eliminating this time..c
files, but compiled generators and compiled cells codes were not yet done like this, making life unnecessarily harder for the compiler and linker. This should also allow more optimization for some codes.PyObject_RichCompareBool
API, as we have our own comparison functions that are faster and faster to call without crossing of DLL barrier.PyIndex_Check
which has become an API in 3.8, and was as a result not inlined anymore with a DLL barrier was to be crossed, making all kinds of multiplication and subscript/index operations slower.PyNumber_Index
API with our own code. As of 3.10 it enforces a conversion to long
that for Nuitka is not a good thing to do in all places. But also due to DLL barrier it was potentially slow to call, and is used a lot, and we can drop the checks that are useless for Nuitka.PyImport_GetModule
for looking up imported modules from sys.modules
, rather look it up from interpreter internals, also this was using subscript functions, when this is always a dictionary.PyImport_GetModuleDict
and instead have our own API to get this quicker.TODO
about inlining the API function used, so we can be faster in a relatively common operation. For every exception handler, we had to do one API call there.PyType_IsSubType
replacement these faster to use and avoid the API call.int
value startup initialization.bytes
value startup initialization.slice
object values with Python 3.10 or highera[x:y]
in the general case has to use a[slice(x,y)]
on that version, making this somewhat relevant to performance in some cases.str
built-in with API callsprint
style tests do not have this as API calls where we strive to remove all API calls for given programs.PyErr_NormalizeException
that will avoid the API call. It may still call the PyObject_IsSubclass
API, for which we only have started replacement work, but this is already a step ahead in the right direction.Organizational
wheel
and setuptools
to install by adding a pyproject.toml
that addresses a warning of pip
. Part of 1.3.6 release already.when
conditions that raise, output which it was exactly. Part of 1.3.3 already.--disable-console
for GUI packages. Otherwise using that, they just deprive themselves of ways to get error information.ccache
report was not enforced. Part of 1.3.7 release already.--onefile-tempdir-spec
that has since been made not OS specific, with even the OS specific name being removed.site-packages
or __pycache__
folders. This should make it easier to use --include-data-file=./**.qml:.
when you have a virtualenv living in the same folder.Cleanups
pywebview
plugin that pertain to the DLLs and data files to package configuration.tuple
values rather than list
values from the tree building stage and node optimization creating new nodes. This allows us to drop conversions previously done inside of nodes.Tests
PySide6
or commercial PySide2
.Summary
anti-bloat
or missing dependencies than ever before.