20 January 2023
Nuitka Package Configuration Kickoff
The term “kickoff” refers to a series of posts about the Nuitka package configuration. The details are here on a dedicate page on the web site only. Nuitka Package Configuration.
This documentation is still very rough and bare of examples, but the goal is to make it more complete as package of this series of posts. When we have an instructive example, we will make a post.
This is an area of Nuitka, where help will be very easy and a wide variety of people will have the skills and desire to help, but the lack of documentation, makes it hard or impossible to channel the common knowledge.
Problem Package
Each post will feature one package that caused a particular problem. In
this case, we are talking about the package tkinterweb
.
Problems are typically encountered in standalone mode only. Missing data files and DLLs are usually only and issue there, but this one actually also had a problem with accelerated mode.
Initial Symptom
The compiled program gave the following error:
ModuleNotFoundError: The files required to run TkinterWeb could not be found. This usually occurs when bundling TkinterWeb into an app without forcing the application maker to include all nessessary files. See https://github.com/Andereoo/TkinterWeb/blob/main/tkinterweb/docs/FAQ.md for more information.
Error: Traceback (most recent call last):
File "C:\...\200~1.0\tkinterweb\__init__.py", line 32, in <module tkinterweb>
ModuleNotFoundError: No module named 'bindings'
Note
The traceback has been redacted, removing user specific part and
replacing it with ...
.
The strange looking part of the filename 200~1.0
in the traceback is
because Nuitka on Windows is convincing the program that it runs in a
short path. So the errors you get, are not pointing to the name of the
binary inside the dist folder, but to a shorter path, that is however
exactly the same folder. You wouldn’t believe how many parts of Python
still don’t have long filenames handled properly. Actually Tcl and
TkInter are among them.
Step 1 - data files
So, whenever a Python module fails to import, this can be caused by missing data files. In Nuitka land, these are not extension modules, DLLs, bytecode, etc. but everything else, and it has been seen that these can cause issues. So what to do here. This is easy, Nuitka has recently gained a command dedicated to this operation, making use of its internal code to list data files for a package.
Here we go
python3.10 -m nuitka --list-package-data=tkinterweb
Gives output:
Nuitka-Tools:INFO: Checking package directory 'C:\Python310\lib\site-packages\tkinterweb' ..
C:\Python310\lib\site-packages\tkinterweb
C:\Python310\lib\site-packages\tkinterweb\tkhtml\combobox-2.3.tm
C:\Python310\lib\site-packages\tkinterweb\tkhtml\pkgIndex.tcl
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Darwin\64-bit\pkgIndex.tcl
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Linux\32-bit\pkgIndex.tcl
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Linux\64-bit\pkgIndex.tcl
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Windows\32-bit\pkgIndex.tcl
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Windows\64-bit\pkgIndex.tcl
So there are data files, great. Lets just include them, and retry. There
appears to be need for that anyway. Now we could use
--include-package-data=tkinterweb
for quickly trying it out, but
since we are in here to make it generally useful, we will start
modification of the Yaml file right away.
- module-name: 'tkinterweb'
data-files:
dirs:
- 'tkhtml'
So, we retry, and nothing changes. Not enough, but on the other hand, it was obviously necessary anyway.
Step 2 - DLL files
The other big trouble maker, and easy to check nowadays, is missing DLLs. Extension modules can depend on DLLs. When they are linked against, this not usually and issue, and Nuitka and its plugins typically resolve all of those perfectly, but often DLLs are loading manually. Maybe this is what is going on here.
Also recently, Nuitka gained a command to check for DLLs.
Here we go
python3.10 bin/nuitka --list-package-dlls=tkinterweb
Gives output:
Nuitka-Tools:INFO: Checking package directory 'C:\Python310\lib\site-packages\tkinterweb' ..
C:\Python310\lib\site-packages\tkinterweb
C:\Python310\lib\site-packages\tkinterweb\tkhtml
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Darwin
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Darwin\64-bit
tkhtml\Darwin\64-bit\Tkhtml30.dylib
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Linux
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Linux\32-bit
tkhtml\Linux\32-bit\Tkhtml30.so
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Linux\64-bit
tkhtml\Linux\64-bit\Tkhtml30.so
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Windows
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Windows\32-bit
tkhtml\Windows\32-bit\Tkhtml30.dll
C:\Python310\lib\site-packages\tkinterweb\tkhtml\Windows\64-bit
tkhtml\Windows\64-bit\Tkhtml30.dll
Note
The output could be not containing folders that have no DLLs themselves, but well, such is life, we are going to improve that another time.
What this tells us, that in fact there are DLLs, and from the looks of it, there is no automatic anything in this. This also appears to be one of those PyPI packages that contain binaries for everything. Rather than building a wheel per architecture this contains some things, on all platforms. For a Python installation that is cool, but surely we do not want to deploy both the 32 and 64 bit DLLs where the compiled binary is only one of these. Do not even think of different OS, like including Linux DLLs on Windows.
So, luckily this is easy to handle. We can select for OS and architecture on Windows for a while already.
- module-name: 'tkinterweb'
dlls:
- from_filenames:
relative_path: 'tkhtml/Windows/32-bit'
prefixes:
- 'Tkhtml'
when: 'win32 and arch_x86'
- from_filenames:
relative_path: 'tkhtml/Windows/64-bit'
prefixes:
- 'Tkhtml'
when: 'win32 and arch_amd64'
Note
Showing this here without the data files section for clarity, obviously the DLLs just get added, and Nuitka prevents you from having two blocks referencing the same module.
So, including DLLs is fairly easy. If the package directory is not where
the DLL lives, you specify relative_path
which is otherwise
optional. This also influences where it is put in the distribution
folder. Then when specifying the DLL, we do only give the prefix of the
DLL. Choosing here to leave out the 30
part of Tkhtml30.dll
just
because it’s probably going to make our life easier down the road,
should they update that version number, it would still automatically
work.
Obviously for other platforms than Windows, the DLLs are not included
now, but lets see if this works. And actually at the time of writing,
this is a first. As you can see, on macOS (recognized from “Darwin”)
only the x86_64
will work, and maybe we should check that out. For
Linux and 32 bit, this shows what an old package this is.
So far, outside of Windows, we do not provide tags for arches.
Note
This is probably going to change now. At least on macOS this seems
very much needed. Maybe also time to cleanup amd64
vs x86_64
which kind of is an inconsistency the technical community has.
Anyway, so more branches will be needed. There is no else
in Nuitka
package configuration. All from_filenames
blocks are applied if the
when
matches.
And actually for data files are similar thing should be done, however,
for the time being --noinclude-data-file
can be your friend there.
You can manually exclude them.
But low and behold, the DLLs are included. The data files are. Typically that is enough, but it still does not work.
Step 3 - Check the compilation report
So after following the easy steps to take, and still not working. We can
check the compilation report. You should always compile with
--report=compilation-report.xml
which produces a very human readable
compilation report, where you can check things easily.
It lists included DLLs and data files, and often also why it is included, and as of recently it learned to output also modules that were used by a module, and modules that were attempted to be used, but not found.
Note
This newly tracked information about failed attempts to use a module are the basis of largely enhanced bytecode caching (demoted e.g. because too large or standard library) in latest Nuitka.
Nuitka will tell use here about the issue from its perspective. So a module is not found at runtime, but what happened at compile time. Only the report can tell.
Lets quote the compilation report snippets.
<module name="__main__" kind="PythonMainModule" reason="Root module">
<optimization-time pass="1" time="0.25" />
<optimization-time pass="2" time="0.01" />
<module_usages>
<module_usage name="tkinterweb" finding="absolute" line="1" />
<module_usage name="tkinter" finding="absolute" line="4" />
<module_usage name="Tkinter" finding="not-found" line="6" />
</module_usages>
</module>
This is the main module. Even without giving you the source code, you can see that the example code does import tkinterweb and tkinter. And due to this being probably very old code, the Python2/Python3 module name difference is present, so it imports the Python3 name successfully, but not the Python2 name.
How do we know this is a bug or not? The reality is, we do by context knowledge, there is not a single best way to decide if an import that is not found represents an issue in the compilation or not. But this looks good. I am showing it to you for educational purpose mostly.
Now lets find the module that raised the ModuleNotFoundError
exception.
<module name="tkinterweb" kind="CompiledPythonPackage" reason="Instructed by user to follow to all non-standard library modules.">
<plugin-influence name="dll-files" influence="condition-used" condition="win32 and arch_x86" tags_used="win32,arch_x86" result="false" />
<plugin-influence name="dll-files" influence="condition-used" condition="win32 and arch_amd64" tags_used="win32,arch_amd64" result="true" />
<optimization-time pass="1" time="0.07" />
<optimization-time pass="2" time="0.03" />
<module_usages>
<module_usage name="os" finding="absolute" line="1" />
<module_usage name="sys" finding="built-in" line="1" />
<module_usage name="sys" finding="built-in" line="27" />
<module_usage name="os" finding="absolute" line="27" />
<module_usage name="ntpath" finding="absolute" line="28" />
<module_usage name="bindings" finding="not-found" line="31" />
<module_usage name="htmlwidgets" finding="not-found" line="32" />
<module_usage name="utilities" finding="not-found" line="33" />
<module_usage name="traceback" finding="absolute" line="35" />
<module_usage name="sys" finding="built-in" line="41" />
<module_usage name="tkinter" finding="absolute" line="44" />
<module_usage name="tkinter" finding="absolute" line="45" />
<module_usage name="tkinter.messagebox" finding="relative" line="45" />
<module_usage name="Tkinter" finding="not-found" line="47" />
<module_usage name="tkMessageBox" finding="not-found" line="48" />
<module_usage name="tkinter" finding="absolute" line="67" />
<module_usage name="Tkinter" finding="not-found" line="69" />
</module_usages>
</module>
At the top, you can see the plugin-influence
. This is where the
plugin records that it influenced. It records what conditions were
checked, and the result. Actually further down, we got this.
<included_dll name="Tkhtml30.dll" dest_path="tkinterweb\tkhtml\Windows\64-bit\Tkhtml30.dll" source_path="C:\Python310_64\lib\site-packages\tkinterweb\tkhtml/Windows/64-bit\Tkhtml30.dll" package="tkinterweb" ignored="no" reason="Yaml config of 'tkinterweb'" />
But that is now why we are here. You can also see the imports being done. They are given with line numbers and the one we care about is this snippet.
<module_usage name="bindings" finding="not-found" line="31" />
So Nuitka didn’t find it at compile time. And a quick check with Python on the prompt would reveal that this name is not importable. So now we switch to the source code of the trouble making module. There is no tool for that yet, typically just do this manually:
>>> import tkinterweb
>>> tkinterweb
<module 'tkinterweb' from 'C:\\Python310\\lib\\site-packages\\tkinterweb\\__init__.py'>
This is a clickable link in my Visual Code terminal, and after I click it and go to the line, what we see is:
import sys, os
sys.path.append(os.path.dirname(os.path.realpath(__file__)))
try:
from bindings import TkinterWeb
from htmlwidgets import HtmlFrame, HtmlLabel
from utilities import Notebook
except (ImportError, ModuleNotFoundError):
# Useless code goes here.
...
What strikes immediately is that Visual Code agrees, and displays the
imported names a color used for modules that it couldn’t resolve. And
actually the first like on top there is revealing rather rare code. This
package is extending the global import path with its package contents.
In this way, what would be tkinterweb.bindings
is available as
bindings
after the module has been imported at runtime.
Expanding the PYTHONPATH
is therefore our next step. Since I am
using bash, I can prefix the call to Nuitka with
PYTHONPATH='C:\\Python310_64\\lib\\site-packages\\tkinterweb'
and
low and behold, it works with this. Compilation takes longer and
includes more modules, and the initial message is gone.
So, how to resolve this. Nuitka has gained a feature dedicated to this.
It will be nice if this was automatically resolved at compile time,
which is well could, note has been taken that there is value in tracking
expanding sys.path
at compile time.
There is another section called import-hacks
and it too recently
gained a new feature dedicated to this.
- module-name: 'tkinterweb'
import-hacks:
- global-sys-path:
# This package forces itself into "sys.path" and expects absolute
# imports to be available.
- ''
We can here provide a list of relative paths, that are added when a
package is imported to the search path of Nuitka. With this we can drop
the PYTHONPATH
which while being a nice workaround, required using
absolute paths of the install, never easy to handle.
With this it now works fully automatically. One issue remains. The
compiled program does not need the sys.path
trick at runtime. And
for isolation purposes, sys.path
ought to be empty, so what we do we
do with this here?
Step 4 - Cleanup the code
In order to get rid of that code, we can use the anti-bloat
mechanism. It is very powerful and can do all sorts of things, but today
we got a simple task for it.
This is the troubling line.
sys.path.append(os.path.dirname(os.path.realpath(__file__)))
There are many ways to change this, it’s always good be at less invasive
as possible, so we do not want to append. We could prefix that line with
if False:
, but that typically only works well for single liners.
What we can do rather generally is something like this:
sys.path.append(os.path.dirname(os.path.realpath(__file__)))
# -> we want this instead:
(os.path.dirname(os.path.realpath(__file__)))
Notice that just not calling will be good enough and extremely likely robust against all kinds of formatting changes, multiple lines, etc. and probably also very applicable should be encounter similar ones.
So this can be expressed with the following yaml snippet.
- module-name: 'tkinterweb'
anti-bloat:
- description: 'remove "sys.path" hack'
replacements_plain:
'sys.path.append': ''
And to know what effect it had and to see the wonders if anti-bloat in
general, you can use --show-source-changes
and output the diffs done
on module source changes.
--- original
+++ modified
@@ -25,7 +25,7 @@
"""
import sys, os
-sys.path.append(os.path.dirname(os.path.realpath(__file__)))
+(os.path.dirname(os.path.realpath(__file__)))
try:
from bindings import TkinterWeb
So, now this is perfect. Just need to add more OS specific branches, maybe also for the data files include more selectively, then this is perfect.
Final remarks
I am hoping you will find this very helpful information and will join
the effort to make packaging for Python work out of the box. Adding
support for tkinterweb
was a little more complex than your typical
package. The OS specific DLLs in different places are relatively
unusual, although it has been seen before and will be gain.
This is a simpler example, that is way less complex, with all defaults just working.
- module-name: 'lightgbm.libpath'
dlls:
- from_filenames:
prefixes:
- 'lib_lightgbm'
Please review the guidelines for contributing, and esp. make sure to
install the commit hook as described, or run
bin/autoformat-nuitka-source --yaml
at least, so the CI will not
complain about formatting and we will have consistent files.
The last hot fixes of 1.3 already have user provided packaging enhancements that add dependencies and anti-bloat. We might discuss those in the next installment.