Lazy importing and modular architecture #223
JiriPavela started this conversation in Ideas
Preface
For quite some time now, Perun has been suffering from the following issues:
perun --version.
These issues have a common root cause: dependency bloat caused by packages needed only by a handful of (or even a single) commands or modules within Perun. We can only expect this issue to get worse over time, as we integrate more tools with their specific dependencies into our toolsuite. Manually trimming the dependencies or embedding their code directly into Perun can only get us so far; we need a systematic long-term solution instead. Hence, I propose the following three-step rework of Perun's architecture.
Solution Step 1: Lazy Imports and Loading
Lazy importing refers to the practice of postponing the resolution of the imported module or package until it is accessed. Similarly, lazy loading postpones the loading of module or package content but resolves the module or package path eagerly to discover missing packages early.
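To make the first definition concrete, here is a minimal standard-library sketch of a lazy import: a proxy that resolves and imports its target only on first attribute access. `LazyModule` and the package name `surely_not_installed_pkg` are hypothetical names used for illustration only; they are not part of Perun or of any library discussed here.

```python
import importlib


class LazyModule:
    """Proxy that imports the wrapped module on first attribute access."""

    def __init__(self, name):
        # Nothing is resolved here, so a missing package goes unnoticed
        # until the proxy is actually used.
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only called for attributes that are not found on the proxy itself.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


fractions = LazyModule("fractions")
missing = LazyModule("surely_not_installed_pkg")  # no error at this point

half = fractions.Fraction(1, 2)  # the real import happens on this access
```

Because resolution is postponed as well, a missing package surfaces only when `missing` is first used, which is the drawback that lazy loading avoids.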
In our particular use-case, we want to leverage both lazy imports and lazy loading. I considered the following implementation approaches.
lazy-loader library
A package supporting both lazy import and lazy loading concepts, developed by the Scientific Python Ecosystem Coordination community to enable fast adoption of SPEC 1: Lazy Loading of Submodules and Functions. According to their GitHub repo, lazy-loader:
Local Subpackages
lazy-loader supports multiple approaches to lazy loading of local submodules. I propose we adopt the following one, as it ensures that type checkers and IDE/editor autocomplete work while avoiding duplicated import specifications:
This means that for local subpackages that are to be lazily imported, we need an additional __init__.pyi file. However, this is still fully optional, and subpackages that incur no significant startup cost can be imported eagerly. It seems this approach ensures both lazy importing and lazy loading, but we should probably test it first to be sure.
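The mechanism behind lazily imported local subpackages can be sketched with the standard library alone. lazy-loader documents an `attach_stub` helper that derives the lazily exposed names from the adjacent `.pyi` stub; the sketch below does not call it. Instead it hand-writes the module-level `__getattr__`, `__dir__` and `__all__` (PEP 562) that such a helper would provide, so that it runs without the dependency. The package `lazy_demo_pkg` and its `heavy` submodule are hypothetical and created in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the demo package's __init__.py (a plain string, not an f-string).
INIT_SOURCE = '''\
import importlib

_SUBMODULES = ("heavy",)
__all__ = list(_SUBMODULES)


def __getattr__(name):
    # PEP 562: called only when normal attribute lookup on the package fails.
    if name in _SUBMODULES:
        return importlib.import_module(f"{__name__}.{name}")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")


def __dir__():
    return __all__
'''

# Build the hypothetical package on disk.
root = Path(tempfile.mkdtemp())
package_dir = root / "lazy_demo_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text(INIT_SOURCE)
(package_dir / "heavy.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
pkg = importlib.import_module("lazy_demo_pkg")

loaded_before = "lazy_demo_pkg.heavy" in sys.modules  # submodule not imported yet
value = pkg.heavy.VALUE                               # first access triggers the import
loaded_after = "lazy_demo_pkg.heavy" in sys.modules
```

Whether the stub-based variant behaves the same way for Perun's subpackages is exactly what the suggested test should confirm.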
External Packages
External libraries and packages can be both imported and loaded lazily using the load function.
Note that lazily importing subpackages directly is discouraged.
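For external packages, the lazy loading behaviour described earlier (resolve the package now, execute it later) can be illustrated with `importlib.util.LazyLoader`, following the recipe in the Python documentation. This is a standard-library sketch, not lazy-loader's `load` implementation; `lazy_load` is a hypothetical helper name.

```python
import importlib.util
import sys


def lazy_load(name):
    """Resolve `name` now, but run the module body on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported, nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        # The package is missing: fail early instead of at first use.
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # only arms the lazy module
    return module


colorsys = lazy_load("colorsys")
# The module body executes here, on the first attribute access.
hue, saturation, value = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

One caveat that applies to this recipe: `importlib.util.find_spec` imports the parent package when given a dotted name, so passing a subpackage path defeats part of the laziness.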
We still need to check whether the load function populates the __getattr__, __dir__ and __all__ attributes so that type checkers and editors work as expected. In case it doesn't, we can still rely on either (a) importing those external packages eagerly in our lazily imported local subpackages if possible, or (b) using the following (a bit ugly) idiom:
Debugging and Development
Lazy importing and loading can be disabled on demand by setting the EAGER_IMPORT environment variable. This may help with debugging import issues. This particular feature makes this library stand out, as I haven't found any other library that provides such a feature.
Other Existing Libraries and Solutions
For completeness, I include a list with other libraries and solutions that I found lacking in some aspect.
lazy-imports library
lazy-loader.
if TYPE_CHECKING idiom to ensure type checks and editor autocomplete work.
demandimport library
lazy_import library
lazy-loader library.
importlib LazyLoader
importlib standard Python library.
if TYPE_CHECKING idiom.
lazy_loader.py module
Solution Step 2: Extension Architecture
I propose we re-design Perun's architecture by leveraging the lazy importing and loading mechanism (ideally using the lazy-loader library).
Concretely, we should lazily import local Perun subpackages with either:
This means that most subpackages in collect, view, postprocess, etc. will need to be imported lazily. Other local subpackages that meet neither of those criteria may still be imported eagerly.
Module Loading Side Effects
Some of our subpackages and modules contain code in __init__.py. This becomes an issue if the code either (a) directly causes some side effects, e.g., registering a subclass or a method in some callback table, or (b) initializes some constants (including tuples, lists, dictionaries, etc.) that are then accessed outside the module (e.g., the profile type generated by a collector). When such a subpackage is lazily imported, this code will not be executed until some name in the subpackage is accessed.
We still need to agree on how to handle such constants defined in subpackages (i.e., how to access them without loading the entire subpackage). Also, other code that executes when a subpackage is imported (i.e., code in __init__.py) needs to be moved or refactored in some way.
Solution Step 3: Modular Installation
The last proposed step is to take advantage of the new modular architecture to speed up Perun installation and reduce dependency bloat. I propose we rework the specification of dependencies and optional dependencies in pyproject.toml so that they mimic the extensions (i.e., lazily imported and loaded subpackages).
Specifically, the dependencies section in pyproject.toml should contain only the core dependencies, and possibly other lightweight dependencies (both in terms of their size and their own dependencies). The optional-dependencies section should then contain targets that include all additional dependencies needed by specific extensions, e.g.,
Additionally, pyproject.toml should contain a special all target that will download and install all core and optional dependencies. To avoid duplicate specification of dependencies, I propose we define the all target as follows:
We could also possibly define an all-dev target that adds the testing, linting, typing and docs packages on top of the all packages:
Handling Missing Packages
When a user attempts to access an extension that is not installed, Perun will fail with an import error exception that won't be of much help to the user. I propose we implement (possibly multiple) custom exception(s) that will provide a more informative error message and suggest to the user the extension to install, e.g.,
pip install perun-toolsuite[regression-analysis]
. To make it more maintainable, this suggestion should be created programmatically without the need to specify it in the code. A possible approach would be to use libraries such as importlib.metadata or packaging to obtain the project dependencies and find the correct target based on the subpackage in which the import error occurred.
Possible Issues With Tests
The Perun test suite will need to be updated so that it correctly handles missing optional dependencies without causing the tests to fail. This will require some further research, but I currently see the following possible solutions:
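One option in this direction, offered as an assumption rather than as one of the solutions considered in this post, is to skip tests whose optional dependency cannot be found. The sketch uses only `unittest` and `importlib.util.find_spec`; `requires_module`, `ExtensionTests` and the package name `surely_not_installed_pkg` are hypothetical.

```python
import importlib.util
import unittest


def requires_module(name):
    """Skip the decorated test when the top-level module `name` is not installed."""
    available = importlib.util.find_spec(name) is not None
    return unittest.skipUnless(
        available, f"optional dependency {name!r} is not installed"
    )


class ExtensionTests(unittest.TestCase):
    @requires_module("json")  # stands in for an installed optional dependency
    def test_with_dependency(self):
        import json

        self.assertEqual(json.loads("[1, 2]"), [1, 2])

    @requires_module("surely_not_installed_pkg")  # stands in for a missing one
    def test_without_dependency(self):
        self.fail("never runs: the dependency is missing")


# Run the two tests programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExtensionTests)
result = unittest.TestResult()
suite.run(result)
```

With this approach a core-only installation reports the extension tests as skipped instead of failed, so a CI job for the full installation is still needed to actually exercise them.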
Regardless of the selected solution, we should test both full and core installations of Perun in CI. Also, if we decide to use the lazy-loader library, we should test both lazy and eager imports.
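Returning to the install hint proposed under Handling Missing Packages, the lookup could be sketched as follows. The requirement strings in `REQUIRES` are hypothetical placeholders in the format returned by `importlib.metadata.requires(...)`; `extras_providing` and `MissingExtensionError` are illustrative names, and the sketch assumes the missing import name equals the distribution name, which does not hold for every package.

```python
import re

# Hypothetical requirement strings; in practice they could come from
# importlib.metadata.requires("<distribution name>").
REQUIRES = [
    "click>=8.1",
    'scipy>=1.11; extra == "regression-analysis"',
    'statsmodels; extra == "regression-analysis"',
    'matplotlib; extra == "plots"',
]

_EXTRA = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name):
    # Compare names case-insensitively, treating '-', '_' and '.' alike.
    return re.sub(r"[-_.]+", "-", name).lower()


def extras_providing(package, requires):
    """Return the sorted extras whose requirements mention `package`."""
    found = set()
    for requirement in requires:
        name = _NAME.match(requirement)
        extra = _EXTRA.search(requirement)
        if name and extra and _normalize(name.group(1)) == _normalize(package):
            found.add(extra.group(1))
    return sorted(found)


class MissingExtensionError(ImportError):
    """Import error that tells the user which extra to install."""

    def __init__(self, package, distribution, requires):
        extras = extras_providing(package, requires)
        if extras:
            targets = ",".join(extras)
            self.hint = f"pip install {distribution}[{targets}]"
        else:
            # Not covered by any extra: fall back to the package itself.
            self.hint = f"pip install {package}"
        super().__init__(f"'{package}' is not installed; try: {self.hint}")


error = MissingExtensionError("scipy", "perun-toolsuite", REQUIRES)
```

Parsing the markers with the `packaging` library instead of regular expressions would be more robust, at the cost of one more dependency.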