Calling 'import' in Python Does More Than You Think

When writing Python code, you will run into many reserved words such as None, del, if, and else. These words carry special meaning to the Python interpreter and cannot be used as variable names. The keyword import is a reserved word used to bring outside Python code into the current scope within a file, or at least that is what it is most commonly used for. Did you know, however, that you can import almost anything you want into Python?

Imagine you have a .json file of recipes that you want to use inside a Python file. How would you load it? Maybe your first instinct would be to do the following:

import json

if __name__ == '__main__':
    with open("recipes.json") as f:
        recipes = json.load(f)

Now you have the recipes as a Python dict to use however you want. Now imagine that you could write it as the following instead:

import recipes

How can this be? As it turns out, the Python import system is built so that you can modify the behavior of the import statement to choose what you want to do when import is called.

If you want to see some examples of what is possible using the methods described in this post, you can check out a project I wrote called Toadstool that implements many of these techniques. There I show how you can load GraphQL queries, JSON files, config files, CSV files, and more directly from an import statement.

Now, let's take a deeper dive into the Python import system.

The Python Import System

The full detail of the import system is described in the Python language reference, but for now I will give a high-level overview. As the docs state:

The import statement combines two operations; it searches for the named module, then it binds the results of that search to a name in the local scope

So, you have two different points in which you can modify the behavior of the import statement.

  1. You can modify its search function, or how it maps the name after the import statement to something that can be loaded.
  2. You can modify the binding function, or what action to take once the "module" is found.
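
To make the two steps concrete, here is a minimal sketch that performs them by hand with importlib.util instead of the import statement. The choice of colorsys is arbitrary (any standard library module would do), and this is only an approximation of what import does, since it skips the sys.modules cache entirely.

```python
import importlib.util

# Step 1: searching. find_spec locates the module without executing it.
spec = importlib.util.find_spec("colorsys")

# Step 2: loading. Create an empty module object, then run the module's code in it.
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Binding: the import statement would assign the result to a name in the current scope.
colorsys = module
print(spec.origin)                # where the search step found the module
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```

Because the search and the load are separate calls, either one can be swapped out, which is exactly what the rest of this post relies on.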

Finders and Loaders

The first step is known as searching, while the second step is known as loading. Unless you modify the import system, the default finders (objects that perform the searching operation) and loaders are used. As with all things in Python, the finders and loaders are just objects. You can even inspect them for yourself.

❯ python3
>>> import sys
>>> sys.meta_path
[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]

All loaded modules are cached in a dict called sys.modules, which maps module names to the module objects that have already been loaded. When importing a module with import X, Python first checks whether X is already in sys.modules and only resorts to its finders and loaders if X is not present.
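
The cache is easy to observe. The following small sketch (using the standard library's string module purely as an example) shows that a repeated import hands back the very same module object that is stored in sys.modules.

```python
import sys

import string                    # first import: finders and loaders run if it is not cached yet
cached = sys.modules["string"]   # the cache maps the module name to the module object

import string as again           # second import: served straight from sys.modules

print(cached is string)          # True
print(again is string)           # True
```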

Here is a brief flowchart of what I described above.

Customizing the Import System

💡
There are two kinds of import hooks that can be used to extend the import system's behavior: meta hooks and import path hooks. We will only be focusing on meta hooks (those registered on sys.meta_path) in this post, but you can read more about import hooks in the Python documentation.

The Meta Path

As you saw from the snippet above, the finders are stored in sys.meta_path. When import X is called, if X is not found in sys.modules, Python iterates over the objects stored in sys.meta_path and calls each one's find_spec method to ask whether it can find X. The first finder to return a spec wins; Python then creates the module and invokes the exec_module method of the spec's loader to perform the loading step. Python’s default sys.meta_path has three meta path finders: one that knows how to import built-in modules, one that knows how to import frozen modules, and one that knows how to import modules from an import path (i.e. the path based finder).
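
The lookup described above can be approximated in a few lines of Python. This is a rough sketch only: the function name simplified_import is made up for illustration, it handles top-level modules only, and it ignores the locking, parent-package handling, and error details of the real machinery.

```python
import importlib.util
import sys


def simplified_import(name):
    """Rough approximation of a top-level `import name` (hypothetical helper)."""
    # 1. Check the cache first.
    if name in sys.modules:
        return sys.modules[name]

    # 2. Ask each meta path finder in order until one returns a spec.
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue
        spec = find_spec(name, None)
        if spec is not None:
            break
    else:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)

    # 3. Create the module, cache it, and let the spec's loader execute it.
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)
        raise
    return module


bisect = simplified_import("bisect")
print(bisect is sys.modules["bisect"])
```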

So to modify how import works, you can modify the contents of sys.meta_path to include finders and loaders that work the way you want. That means creating an object that implements find_spec, having the spec it returns point to a loader that implements exec_module, and adding the finder to sys.meta_path. Please note that if you wish to supersede default module loading behavior (typically for files that end in .py), you will have to either delete the default finders from sys.meta_path or prepend your custom finders so that they are invoked before the default ones.
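
As a small illustration of ordering, the sketch below prepends a finder that finds nothing and merely records what it was asked for. RecordingFinder is a hypothetical name, and colorsys is removed from the cache first only so that the import actually reaches the finders.

```python
import sys


class RecordingFinder:
    """Hypothetical meta path finder that records lookups and never finds anything."""
    seen = []

    @classmethod
    def find_spec(cls, fullname, path, target=None):
        cls.seen.append(fullname)
        return None  # returning None defers to the next finder on sys.meta_path


sys.meta_path.insert(0, RecordingFinder)   # prepend: consulted before the defaults
try:
    sys.modules.pop("colorsys", None)      # make sure the import is not served from the cache
    import colorsys
finally:
    sys.meta_path.remove(RecordingFinder)  # leave the import system as we found it

print(RecordingFinder.seen)
```

Because the finder sits at index 0, it sees the request before the built-in, frozen, and path based finders do; had it been appended instead, it would only be asked about names that the default finders could not resolve.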

💡
You can register a single object on sys.meta_path that performs both searching and loading, but it is important to understand that the operations are separate and constitute two different steps of the import process.

Custom Searching

To implement custom search behavior, we must have a class that implements the find_spec method. The full signature of the method is find_spec(fullname, path, target=None). fullname is the fully qualified name of the module being imported, so for import A.B.C the finder is asked in turn about "A", "A.B", and finally "A.B.C". path is None for a top-level import; for a submodule it is the parent package's __path__, that is, the locations to search within that package. target is a module object that the finder may use to make a more educated guess about what spec to return. find_spec should return None if the finder could not find the module, or a ModuleSpec if it is found. What it means to "find the module" is up to your implementation.
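
To see what these arguments look like in practice, here is a sketch that builds a throwaway package in a temporary directory and records the arguments a finder receives. The names ArgumentRecorder and demo_pkg_a are invented for this example.

```python
import importlib
import pathlib
import sys
import tempfile


class ArgumentRecorder:
    """Hypothetical finder that records (fullname, path) and finds nothing itself."""
    calls = []

    @classmethod
    def find_spec(cls, fullname, path, target=None):
        cls.calls.append((fullname, path))
        return None


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny package on disk: demo_pkg_a/__init__.py and demo_pkg_a/b.py
    pkg = pathlib.Path(tmp) / "demo_pkg_a"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "b.py").write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    sys.meta_path.insert(0, ArgumentRecorder)
    try:
        importlib.import_module("demo_pkg_a.b")
    finally:
        sys.meta_path.remove(ArgumentRecorder)
        sys.path.remove(tmp)

for fullname, path in ArgumentRecorder.calls:
    print(fullname, path)
```

The first recorded call is for the top-level package with path set to None; the second is for the submodule, with path set to the package's own search locations.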

So let's say we want to implement a finder for .json files. It might look something like this:

import pathlib
import sys
from importlib.machinery import ModuleSpec


class JsonLoader():
    """
    Used to import Json files into a Python dict
    """
    @classmethod
    def find_spec(cls, name, path, target=None):
        """Look for Json file"""
        # Keep only the last component of the dotted name
        package, _, module_name = name.rpartition(".")
        filename = f"{module_name}.json"
        # Top-level imports search sys.path; submodules search the parent's path
        directories = sys.path if path is None else path
        for directory in directories:
            path = pathlib.Path(directory) / filename
            if path.exists():
                # JsonLoader doubles as the loader, so the spec gets an instance of it
                return ModuleSpec(name, cls(path))
        # No matching file: let the next finder on sys.meta_path try
        return None

⚠️
Note that we use the @classmethod decorator here. This is necessary so that Python does not have to instantiate a new JsonLoader object to invoke find_spec, since the class JsonLoader itself is what is stored in sys.meta_path.

The rpartition call extracts the part of the fullname that we care about (everything after the last . in the name). The filename assignment is where we specify that we are only looking for files of the given name that end in .json. The directories assignment is where we specify which directories to search over: sys.path if no path is passed to find_spec, otherwise the path itself. Again, this is just the behavior we are choosing to implement here; you can use the parameters that are passed to find_spec however you see fit. Lastly, the for loop checks for the existence of the .json file and, if it is found, returns a new ModuleSpec built from the name and a new instance of the JsonLoader class initialized with the file's path (we will use this in the next step, where the matching __init__ is added). Nothing here opens the file, let alone binds new variables. That is the loader's job. Because of how we are constructing this class, JsonLoader will also serve as the loader, which is why we passed cls(path) to ModuleSpec, but you could also pass an instance of any other class that implements the exec_module method.

Custom Loading

Now that we have identified the .json file we have to determine what to do with it. For that, we will use the same class as before and just implement a new method: exec_module since our finder deferred to JsonLoader again with its call to ModuleSpec.

import json
import pathlib
import sys
from importlib.machinery import ModuleSpec


class JsonLoader():
    """
    Used to import Json files into a Python dict
    """
    def __init__(self, path):
        """Store path to Json file"""
        self.path = path

    @classmethod
    def find_spec(cls, name, path, target=None):
        """Look for Json file"""
        package, _, module_name = name.rpartition(".")
        filename = f"{module_name}.json"
        directories = sys.path if path is None else path
        for directory in directories:
            path = pathlib.Path(directory) / filename
            if path.exists():
                return ModuleSpec(name, cls(path))
        return None

    def create_module(self, spec):
        """Returning None uses the standard machinery for creating modules"""
        return None

    def exec_module(self, module):
        """Executing the module means reading the JSON file"""
        with self.path.open() as f:
            data = json.load(f)
        # _identifier and to_namespace are helper functions that are not shown in
        # this post: the first turns a key into a valid Python identifier, the
        # second converts each value before it is bound
        fieldnames = tuple(_identifier(key) for key in data.keys())
        fields = dict(zip(fieldnames, [to_namespace(value) for value in data.values()]))
        module.__dict__.update(fields)
        module.__dict__["json"] = data
        module.__file__ = str(self.path)

The __init__ method saves the path that find_spec located. The path is not passed to exec_module by default, so this is a way of maintaining that information. The exec_module method is where we perform any bindings. Here we read the JSON file, collect all of its keys, convert those names to valid Python variable names if needed, and update the __dict__ of the newly created module (which stores the module's attributes) so that it contains the values from the JSON file. The raw parsed data is also kept on the module under the name json.

Customized Result

Now append JsonLoader to sys.meta_path and assume you have the following employees.json file:

{
    "employee": {
        "name": "sonoo",
        "salary": 56000,
        "married": true
    },
    "menu": {
        "id": "file",
        "value": "File",
        "popup": {
            "menuitem": [
                {"value": "New", "onclick": "CreateDoc()"},
                {"value": "Open", "onclick": "OpenDoc()"},
                {"value": "Save", "onclick": "SaveDoc()"}
            ]
        }
    }
}

Now you can load employees.json and use it as follows:

>>> import employees
>>> employees.
employees.employee  employees.json      employees.menu
>>> print(employees.menu)
{'id': 'file', 'value': 'File', 'popup': {'menuitem': [{'value': 'New', 'onclick': 'CreateDoc()'}, {'value': 'Open', 'onclick': 'OpenDoc()'}, {'value': 'Save', 'onclick': 'SaveDoc()'}]}}

You can even specify only importing certain keys from the file.

from employees import employee

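This binds only employee in the importing scope and does NOT bind the menu key to a name. Note that employees.json is still read in full and the whole module is cached in sys.modules; menu is simply not made available under its own name.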
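
For readers who want to try the whole flow end to end, here is a self-contained sketch. It is a simplified stand-in for the JsonLoader above, not the Toadstool implementation: the class name SimpleJsonLoader is invented, the helper functions are replaced by the assumption that only top-level keys which are already valid identifiers get bound, and the JSON file is a shortened version of employees.json written to a temporary directory.

```python
import json
import pathlib
import sys
import tempfile
from importlib.machinery import ModuleSpec


class SimpleJsonLoader:
    """Simplified, hypothetical finder and loader for .json files."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def find_spec(cls, name, path, target=None):
        module_name = name.rpartition(".")[2]
        directories = sys.path if path is None else path
        for directory in directories:
            candidate = pathlib.Path(directory) / f"{module_name}.json"
            if candidate.exists():
                return ModuleSpec(name, cls(candidate), origin=str(candidate))
        return None

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        with self.path.open() as f:
            data = json.load(f)
        # Assumption: bind only top-level keys that are already valid identifiers
        for key, value in data.items():
            if key.isidentifier():
                setattr(module, key, value)
        module.json = data
        module.__file__ = str(self.path)


with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "employee": {"name": "sonoo", "salary": 56000, "married": True},
        "menu": {"id": "file", "value": "File"},
    }
    (pathlib.Path(tmp) / "employees.json").write_text(json.dumps(sample))

    sys.path.insert(0, tmp)
    sys.meta_path.append(SimpleJsonLoader)  # consulted after the default finders
    try:
        import employees
        from employees import employee
    finally:
        sys.meta_path.remove(SimpleJsonLoader)
        sys.path.remove(tmp)

print(employee["name"])
print(sorted(k for k in vars(employees) if not k.startswith("_")))
```

The last line lists the public names on the module, which include menu even though only employee was requested by the from-import.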

Conclusion

This was just a sample of what can be done with Python's import hooks (and remember, it only covered one of the two available kinds). You can customize this behavior to your heart's desire. If you want to see a more generalized approach, you can check out the project I mentioned at the top at https://github.com/acalejos/toadstool/. The package can be installed via pip install toadstool if you want to try out some of the loaders.

If you need more dynamic import behavior within your code, you can also look to importlib, which is described as having three main purposes:

  1. Provide the implementation of the import statement (and thus, by extension, the __import__() function) in Python source code.
  2. Expose the components that implement import, giving users the ability to create their own importers.
  3. Provide modules exposing additional functionality for managing aspects of Python packages.
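
As a small taste of the dynamic side, importlib.import_module imports a module whose name is only known at runtime. In this sketch the function name load_by_name is made up, and json stands in for whatever module name your program might compute.

```python
import importlib


def load_by_name(name):
    """Import a module chosen at runtime by its dotted name (hypothetical helper)."""
    return importlib.import_module(name)


module = load_by_name("json")
print(module.dumps({"ok": True}))
```

Since import_module goes through the same machinery as the import statement, any custom finders registered on sys.meta_path apply to it as well.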

As you can see, the topic of the Python import system goes very deep, so I would encourage you to explore it further and gain a better understanding of it.
