Protect Data Files

Your program might be using Qt and QML, or other kinds of data file loaded. With Nuitka Commercial, you can produce an executable that contains them and then also covers it under the regular constants data protection automatically.

Due to constant data protection, the contents of the files will be inaccessible to anything but the program, and this is very much what people want.

Your program continues to use standard Python mechanisms from open, pkgutil, pkg_resources, importlib.resources or importlib_resources and generally all similar packages, and these load the data as a file or a stream from within the binary without ever hitting the disk.

All of this is happening with your original code base, i.e. you are not making any modification, that is incompatible with Python. This is one of the major goals. Your code will still run directly in Python during development, and during deployment, the files are embedded by merely adding a Nuitka commercial option.

Examples

We are going to look at the various APIs in detail, and show part of what is working. Generally embedding aims at requiring no modification of your program. Read through all the examples, as each one is going to tell you something new.

Qt resources

This means, e.g. rather than this program

PySideQmlPolarchartTest.py
qmlpolarchart/
    Example-Icon.svg
    main.qml
    View1.qml
    View2.qml
    View3.qml

you get a file layout just simply like this (DLLs and PyQt extensions not shown) that doesn’t contain your QML files, not the icon file.

PySideQmlPolarchartTest.exe

The code can do simply this, and refer to the icon file relative to where the code lives which will work with standard Python, standard Nuitka, and for embedding with Nuitka commercial.

icon_filename = os.path.join(
    os.path.dirname(__file__), "qmlpolarchart/Example-Icon.svg"
)
app.setWindowIcon(QIcon(icon_filename))

For Nuitka commercial, the options to include the data file and to then embed it are as follows:

python -m nuitka ... --include-data-dir=qmlpolarchart=qmlpolarchart \
                     --embed-data-files-qt-resource-pattern=qmlpolarchart/**

The general rule is to first include your data files with the standard Nuitka means, in this case, we use a data directory. Generally e.g. --include-package-data can be a better choice too, as you avoid file paths entirely. After that, embedding options work on target paths. And with the /** part of the patterns (could also use qmlpolarchart) we ask it to include all these files as embedded. This being a pattern allows you to include only a subset

In standard Nuitka, you end up with something like this. And note, that onefile mode is merely a self extracting archive, i.e. these files will be on the customer disk for inspection:

PySideQmlPolarchartTest.exe
qmlpolarchart/
    Example-Icon.png
    Example-Icon.svg
    main.qml
    View1.qml
    View2.qml
    View3.qml

Note

For extension modules and DLLs used, this doesn’t affect things. This is about merely about data files. This is where --onefile comes in and hides these from the view and need for deployment, but it cannot replace Nuitka commercial and its inclusion of data files inside the main binary.

Using open, os.listdir, os.path.isfile, etc.

As you can see from below code, the traditional Python file handling just works, and we took large efforts to get standard open to work seamlessly. It is not the easiest way to use data files, but a very common usage in third party packages and the idea of Nuitka commercial is to make work, what worked before, see below for the more modern approaches.

# Standard open works with binary and text mode, encoding, etc.
print(
   "Contents of data file 1 (as binary):",
   open(
      os.path.join(os.path.dirname(__file__), "data_file1.txt"),
      "rb"
   ).read(),
)
print(
   "Contents of data file 1 (as text):",
   open(
      os.path.join(os.path.dirname(__file__), "data_file1.txt"),
      "r"
   ).read(),
)

# Works also for functions that cannot not be statically optimized.
def dynamicFunction(filename):
   print(
      "Contents of dynamic data file (as binary):",
      repr(open(filename, "rb").read()),
   )

   print(
      "Contents of dynamic data file (as text):",
      repr(open(filename, "r").read()),
   )

dynamicFunction(os.path.join(os.path.dirname(__file__), "data_file3.txt"))

print("Using with and open data file 4 gives (as binary):")

# The "with" statement works too.
with open(
   os.path.join(os.path.dirname(__file__), "data_file4.txt"), "rb"
) as binary_stream:
   print("Line 1:", repr(binary_stream.readline()))
   print("Line 2:", repr(binary_stream.readline()))

with open(
   os.path.join(os.path.dirname(__file__), "data_file4.txt"), "r"
) as text_stream:
   print("Line 1:", repr(text_stream.readline()))
   print("Line 2:", repr(text_stream.readline()))

print("Checking with 'os.path.isdir' for contained and not contained directories:")

# The "os.path.isdir" works, "os.path.isfile", and "os.path.exists" too.
print(
   "isdir on templates:",
    os.path.isdir(os.path.join(os.path.dirname(__file__), "templates")))
)
print(
   "isfile on data_file4.txt:",
   os.path.isfile(os.path.join(os.path.dirname(__file__), "data_file4.txt")),
)
print(
   "exists on 'templates' dir:",
   os.path.exists(os.path.join(os.path.dirname(__file__), "templates")),
)
print(
   "exists on 'data_file4.txt':",
   os.path.exists(os.path.join(os.path.dirname(__file__), "data_file4.txt")),
)

# Listing the embedded data files with os.listdir works too.
print(
   "os.listdir of package dir gives:",
   os.listdir(os.path.join(os.path.dirname(__file__))),
)

Use of pkgutil

The pkgutil.get_data was the first way for standard library of accessing data file content from packages in a fashion that does not require to use __file__. However, it’s limited in getting the data.

import pkgutil
print(
   "Contents of data file 1:",
   repr(pkgutil.get_data(__package__, "data_file1.txt")
)

Use of pkg_resources

The pkg_resources was one of the first to offer ways of accessing data files from packages in a fashion that does not require to use __file__. It has since become less popular and was never part of standard Python, but extremely wide spread and is still in use.

import pkg_resources

# This shows using pkg_resources to get a string.
print(
   "Contents of data file 1:",
   repr(pkg_resources.resource_string(__package__, "data_file1.txt")),
)

# This shows using pkg_resources to get a stream, which then can be used
# like a file. Here we read it right away, but it could be passed elsewhere
# of course.

print(
   "Stream of data file 1 gives:",
   repr(pkg_resources.resource_stream(__package__, "data_file1.txt").read()),
)

Use of importlib.resources and importlib_resources (backport)

The most recent addition to standard library is also available as a backport and both are supported by Nuitka commercial file embedding.

# Could use importlib_resources as well for older Python versions.
import importlib.resources

# Can read as str or bytes:
print(
   "Contents of data file 1 (as binary):",
   repr(importlib.resources.read_binary(__package__, "data_file1.txt")),
)
print(
   "Contents of data file 1 (as text):",
   repr(importlib.resources.read_text(__package__, "data_file1.txt")),
)

# Can get binary or text streams:
print(
   "Stream of data file 1 gives (as binary):",
   repr(importlib.resources.open_binary(__package__, "data_file1.txt").read()),
)

print(
   "Stream of data file 1 gives (as text):",
   repr(importlib.resources.open_text(__package__, "data_file1.txt").read()),
)

Using Jinja2

Jinja does a few special things with files, and Nuitka commercial however supports it as well, without changes. We iterate here over various approaches seen, that all work. File system loaders and package loaders are all fine.

# Old school string paths
loader = jinja2.FileSystemLoader(os.path.dirname(__file__))
env = jinja2.Environment(
   loader=loader,
)
template = env.get_template("templates/data_file5.txt")
print("Template loaded via old school file system strings:", template)

# New school "pathlib" paths work too.
from pathlib import Path
loader = jinja2.FileSystemLoader(
    Path(os.path.dirname(os.path.abspath(__file__))) / "templates"
)
env = jinja2.Environment(
    loader=loader,
)
template = env.get_template("templates/data_file5.txt")
print("Template loaded via pathlib use paths gave template:", template)

# Package loaders work of course too.
loader = jinja2.PackageLoader("data_inclusion", "templates")
env = jinja2.Environment(
   loader=loader,
)
template = env.get_template("templates/data_file5.txt")
print("Template loaded via package loader:", template)

Go back to Nuitka commercial overview to learn about more features or to subscribe to Nuitka commercial.