Python and PyInstaller — Introduction and Troubleshooting
Remove the hassle of Python interpreters by compiling your code
Compiling Python code here means turning your Python script, a .py file, into a traditional executable .exe file. Why would you want to do this? There are two primary reasons why developers would want to compile their code:
- Distributing your workflow to non-technical users
- Obfuscating your code
Your code can be more useful if you can easily put it in front of a wide range of people, including non-technical users who may never have opened a command prompt or terminal. Or perhaps you want to distribute your workflow without handing over readable source code. Compiling makes your code harder to read, but it is not real protection: PyInstaller bundles Python bytecode, and there are tools that can extract that bytecode and decompile it back into fairly readable Python.
Introduction to PyInstaller
In this tutorial we will look at compiling Python code using PyInstaller. PyInstaller is a specially designed compiler for Python code that works on Windows, macOS, and Linux. It is not a cross-compiler, though: you build a Windows executable on Windows, a macOS executable on macOS, and so on. This tutorial uses the Windows operating system. There are some small differences on other systems which unfortunately I cannot cover because I do not own a macOS system. We will look at problems that are often encountered when compiling and how to troubleshoot them. I will also go over a few basic steps and considerations if you are thinking about compiling your Python code.
PyInstaller is a Python library that can be installed using Python’s package manager, pip. You can install it by typing the command below:
pip install pyinstaller
During the pip install process, pip also puts a pyinstaller command in your environment’s Scripts folder, so as long as that folder is on your PATH (it is in an activated virtual environment) you can call it by typing pyinstaller into the command prompt. PyInstaller is designed to work in a command prompt or terminal environment. You compile your Python code by typing the command below in your terminal:
pyinstaller [python_file]
Personally, I like to modify it slightly by adding the --onefile option. This bundles everything into one giant executable file instead of a folder where your compiled executable is mixed in with a lot of supporting files.
pyinstaller [python_file] --onefile
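If you end up rebuilding often, PyInstaller can also be started from a Python script: its documentation describes a PyInstaller.__main__.run() function that takes the same arguments you would type after pyinstaller. Below is a minimal sketch, assuming PyInstaller is installed in the active environment; build_args and build are hypothetical helper names, not part of PyInstaller.

```python
import sys


def build_args(script, onefile=True):
    """Return the argument list you would otherwise type after `pyinstaller`."""
    args = [script]
    if onefile:
        # Bundle everything into a single executable
        args.append("--onefile")
    return args


def build(script, onefile=True):
    """Run PyInstaller in-process with the same arguments as the command line."""
    try:
        import PyInstaller.__main__
    except ImportError:
        sys.exit("PyInstaller is not installed; run: pip install pyinstaller")
    PyInstaller.__main__.run(build_args(script, onefile))
```

Calling build("my_script.py") (a placeholder filename) is then equivalent to typing pyinstaller my_script.py --onefile. PyInstaller also accepts a .spec file in place of the script; in that case pass onefile=False, because options such as --onefile belong in the spec file itself.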
PyInstaller Spec File
For more advanced users there is the spec file. This file is created when you run the PyInstaller command on a Python script, and it tells PyInstaller how to compile your script and where to look for relevant data. You can also create a spec file before compiling so you can set the parameters beforehand. The command below generates just a spec file in the current working directory of the command prompt:
pyi-makespec [python_file]
The spec file may look different depending on your operating system, but here is what it looks like on a Windows system:
The Analysis section, which starts on line 5, is where you configure how PyInstaller collects your program. Even though PyInstaller tries to find all the files your script requires, some binaries or data files are not explicitly linked and will not be bundled. In that case you must locate the required files yourself. You can then point your compiled program at them by modifying the environment with os.environ (covered in the runtime hooks section below), so that it knows where to look for these additional required files.
- Line 5: The filename of the Python script being compiled (the first argument to Analysis)
- Line 6 (pathex): The directory of the Python script being compiled
- Line 7–8 (binaries and datas): Additional binary and non-binary files, each given as a (source, destination folder) tuple. Binary files are often .dll files; you can try linking them manually as mentioned above, or list them here. Additional non-binary files can include pictures, database files, or others. If you get errors about missing files, you should include them here.
- Line 9 (hiddenimports): Additional imports that may be required by the compiler. A compiled program may need imports that the plain script does not (I will go over this later). Put those additional modules here. The input is a list in which each item is the name of a module as a string. For example:
['numpy', 'rasterio', 'fiona']
- Line 10–11 (hookspath and runtime_hooks): hookspath lists extra folders of build-time hooks, while runtime_hooks lists runtime hook scripts. If a library requires specific environment variables, they can be set using a runtime hook. These are small scripts that run before your main script, effectively adding extra top-level code to it. I often use them to set environment parameters with something like os.environ['PARAM_NAME'] = 'value'. There is more info on this later in this tutorial.
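Because a spec file is ordinary Python, the hiddenimports list does not have to be typed by hand. Below is a minimal sketch using only the standard library’s pkgutil; list_submodules is a hypothetical helper, and collecting every submodule of a package usually bundles more than strictly needed, which makes the executable larger. PyInstaller itself also provides collect_submodules() in PyInstaller.utils.hooks for the same job.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the dotted names of every submodule inside an installed package."""
    package = importlib.import_module(package_name)
    # Plain modules (not packages) have no __path__, so there is nothing to walk
    if not hasattr(package, "__path__"):
        return []
    names = []
    # walk_packages recurses into subpackages (importing them to do so);
    # the prefix makes every name fully dotted, e.g. "json.decoder".
    # onerror swallows subpackages that fail to import.
    for module_info in pkgutil.walk_packages(
        package.__path__, package_name + ".", onerror=lambda name: None
    ):
        names.append(module_info.name)
    return sorted(names)
```

With the helper defined at the top of a spec file, you could write hiddenimports=list_submodules('rasterio') instead of listing each module.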
If you want PyInstaller to build according to your modified spec file, use the command below:
pyinstaller [spec_file]
Do not include the Python file in the command. If you pass the Python file instead, PyInstaller creates a brand new spec file that overwrites your custom settings.
That’s it! That’s the ideal conclusion and is most likely the case if your program is very simple. Now let’s move on to what happens when you start importing complicated Python libraries…
Planning and Testing for PyInstaller
If you are even thinking about compiling Python code, you have to plan for it and test the Python libraries that are going to be compiled. Python and its libraries are not designed to be compiled, so you will often run into problems when you try. A compiled program may need additional imports compared to your original Python code. Some libraries may fail to compile for no obvious reason. And sometimes libraries like pandas will need unique solutions.
This tutorial will only focus on the PyInstaller testing and general troubleshooting code. The main code I used in my sample script is irrelevant.
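Create a simple Python script with only the import statements your program uses and a print statement at the end. The general workflow is: compile the test script, run the executable, add whatever module it reports as missing, and repeat until every print statement runs.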
Troubleshooting the ModuleNotFoundError
In this section I will focus on the ModuleNotFoundError, a common issue I encounter when compiling. In this case, the compiled program needs additional imports for it to work. This is a rather easy problem to fix, but it can be tedious because you have to recompile your program every time you troubleshoot.
For this small example I will not be focusing on the compiled code itself, just the PyInstaller troubleshooting process. I will be working with a few geospatial data libraries which I commonly use with satellite imagery. This particular example will show that you can troubleshoot and fix the compiler issues by doing additional imports. I will use the Rasterio and Fiona libraries. This tutorial was done in a fresh virtual environment using the Python 3.6 interpreter.
So my main script uses rasterio, fiona, os, and sys libraries. Here is my Python script that just tests import statements. It is called
Let me emphasize that this test script is only written like this for the sake of this tutorial, so I can walk through the solutions more easily. Yes, there are tracebacks I could use to troubleshoot, and I’m pretty sure mixing imports and print statements like this goes against PEP 8, which recommends putting all imports at the top of the file.
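If you would rather see every failure in one run instead of stopping at the first crash, here is a variation on the same idea. It is a minimal sketch, not the script used in this tutorial: run_import_checks is a hypothetical helper, and rasterio and fiona stand in for whatever libraries your own program uses. The imports stay as literal import statements because PyInstaller finds dependencies by scanning your code for them; a module name passed as a string to importlib.import_module() may not be detected and bundled.

```python
import os  # noqa: F401  standard-library imports, as in the original test script
import sys


def run_import_checks():
    """Try each third-party import and record whether it worked."""
    results = {}
    try:
        import rasterio  # noqa: F401
        results["rasterio"] = "ok"
    except ImportError as exc:
        # exc.name is the module that could not be imported (may be None)
        results["rasterio"] = f"{type(exc).__name__}: {exc.name}"
    try:
        import fiona  # noqa: F401
        results["fiona"] = "ok"
    except ImportError as exc:
        results["fiona"] = f"{type(exc).__name__}: {exc.name}"
    return results


def main():
    for library, status in run_import_checks().items():
        print(f"{library}: {status}")
    # sys.frozen is only set when running as a PyInstaller executable
    print(f"Import test finished (frozen={getattr(sys, 'frozen', False)})")


if __name__ == "__main__":
    main()
```

Each run still reports only the first missing module per library, so the compile, run, fix loop stays the same; you just see all libraries at once.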
Moving on, there are some common libraries I am importing, such as os and sys, but the stars of the show are fiona and rasterio. The logic behind this code is that if PyInstaller manages to compile everything, the script will print that it is successful. If it fails, the print statements help point to which library is failing to compile. So let’s compile the test script with the pyinstaller command shown at the start of this tutorial.
The output of this command is a .spec file plus “dist” and “build” folders in the folder you ran the command from (normally your project folder). The files we want are in the “dist” folder. I run the .exe file from the command prompt so I can catch the print statements or any error tracebacks. If you just double-click the .exe like any other file, you cannot see the messages, because the console window closes as soon as the program crashes or finishes.
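A common workaround for the disappearing window is to catch the error, print its traceback, and wait for Enter before closing. Below is a minimal sketch, assuming your program has a main() function; run_and_pause is a hypothetical helper. PyInstaller sets sys.frozen in the compiled program, which the sketch uses so the pause only happens in the executable and not when you run the script with python.

```python
import sys
import traceback


def run_and_pause(main, pause=input):
    """Run main(), print any traceback, and keep a double-clicked console open."""
    exit_code = 0
    try:
        main()
    except Exception:
        # Show the full traceback instead of letting the window vanish
        traceback.print_exc()
        exit_code = 1
    finally:
        # Only pause inside the compiled executable
        if getattr(sys, "frozen", False):
            pause("Press Enter to close this window...")
    return exit_code
```

At the bottom of your script you would then call sys.exit(run_and_pause(main)).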
Uh-oh! Looks like we have an error: ModuleNotFoundError. This basically means we have to do some additional imports. Besides the usual Python traceback, we can see that it did not even reach our first print statement, so we can say for sure this is related to our rasterio import.
We see that it is looking for a module called rasterio._shim. Let’s add this as an additional import.
In line 2 of the testing file I have added this missing module. So let’s compile again, then run it again to see if we fixed it. If you still have the command prompt open from your previous PyInstaller compilation, you can simply run the same pyinstaller command again.
Another error! But it looks like it’s different: now it cannot find rasterio.control. So let’s add this import statement.
I think you are getting the general idea with ModuleNotFoundErrors. I will skip ahead, because Rasterio requires several of these hidden imports. But here is the same error when you try to import fiona.
After you deal with several of these ModuleNotFoundErrors, your testing script should look like the code below:
When you run the executable file all the print statements should run and it should look like this:
You can see that all the print statements have run correctly. PyInstaller also lets you specify these imports in the spec file, through the hiddenimports option (or with the --hidden-import command-line flag). However, I prefer to import them explicitly in my code so that people I share it with will not get confused if the spec file is lost.
As you can see, there are some errors related to the environment configuration of the Rasterio library. Each library may have its own nuances that are beyond the scope of this tutorial. For a quick example, the pyproj library requires the proj.db file, and the rasterio error seen in the picture above is related to the gcs.csv file located in your GDAL folder.
Let’s go over how runtime hooks can fix this environment error.
Setting Environment Parameters with Runtime Hooks
As seen in the section above, the import errors were fixed, but there is now an error related to missing files. The library is looking for these files, but PyInstaller did not configure where they are, so this has to be set manually in a separate file. In my experience, if you change the environment within your main Python script instead, PyInstaller ignores it.
In this tutorial, I copied the entire folder used for the GDAL_DATA environment parameter and pasted it into my project folder. This is not necessary; I was just testing whether the script can import correctly with it. Otherwise, you would just set the GDAL_DATA environment parameter to the existing folder. You can check which folder it currently points to by reading os.environ from Python.
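A minimal sketch of that check; show_env is a hypothetical helper name. A value of None means the variable is not set in your current environment, in which case you would need to locate the data folder inside your GDAL installation yourself.

```python
import os


def show_env(names=("GDAL_DATA", "PROJ_LIB")):
    """Return the current value of each environment variable (None if unset)."""
    return {name: os.environ.get(name) for name in names}


for name, value in show_env().items():
    print(f"{name} = {value}")
```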
I also included the proj.db file and set the PROJ_LIB parameter to the folder where the script will launch. This environment declaration has to be done in a separate Python script, which is used as a runtime hook.
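A minimal sketch of such a runtime hook, under a few assumptions: the copied GDAL data folder is named gdal_data (a hypothetical name), and both it and proj.db are bundled through the spec file’s datas option, for example datas=[('gdal_data', 'gdal_data'), ('proj.db', '.')]. In the compiled program PyInstaller sets sys.frozen, and sys._MEIPASS holds the folder the bundled files are unpacked into (a temporary folder for --onefile builds), so the hook points the environment variables there.

```python
# hook.py: runtime hook that PyInstaller runs before the main script
import os
import sys


def bundle_dir():
    """Folder holding the bundled files, or the current folder when not frozen."""
    if getattr(sys, "frozen", False):
        # sys._MEIPASS is where PyInstaller unpacked the app;
        # fall back to the executable's folder if it is missing.
        return getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
    return os.getcwd()


def configure_environment(base=None):
    """Point GDAL and PROJ at data files shipped with the program."""
    base = bundle_dir() if base is None else base
    # "gdal_data" is a hypothetical name for the copied GDAL_DATA folder
    os.environ["GDAL_DATA"] = os.path.join(base, "gdal_data")
    # PROJ looks for proj.db in the folder named by PROJ_LIB
    os.environ["PROJ_LIB"] = base
    return base


configure_environment()
```

If you keep the files next to the .exe instead of bundling them, os.path.dirname(sys.executable) would be the folder to use.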
The hook.py file is included in the spec file as seen in line 12. It has to be given as a list, for example runtime_hooks=['hook.py'].
Now that a runtime hook specifies where the environment files are, the environment error is fixed, as seen below:
Do the gdal folder and additional files need to exist? No. After compilation I deleted the db file and the gdal folder that were specified in the runtime hook, but the program still managed to do its imports when launching the compiled exe.
Hopefully this clears things up about PyInstaller and what you should look out for when compiling Python code. I have gotten suggestions for other compilers, but those will have to wait for another time.
Any other tips and tricks regarding PyInstaller are welcome.