AI runtime configuration scheme #373

nikos-livathinos · 2024-11-19T07:59:42Z

nikos-livathinos
Nov 19, 2024
Collaborator

Objective

We want to introduce input parameters that control the AI runtime configuration used to run the Docling AI models.
The AI runtime parameters should include the "device" and "number of threads".
The user should be able to set these parameters either as environment variables (envvars) or as API/CLI parameters.

Introduce logic that resolves the input parameters and decides the AI runtime configuration based on:

The values of envvars (if set).
The values of API/CLI parameters (if set).
The default values of each parameter.
The availability of AI execution devices on the system.
The input parameters required by the models.

Proposal

A. Introduce input parameters in the docling package:

API: Extend the PipelineOptions to introduce the fields: device, num_threads.
- The device parameter can be an Enum with values: [AUTO, CUDA, CPU, MPS]. The default value is AUTO.
- The num_threads parameter is an integer. The default value is 4.
Envvars: Standardize names of envvars that set the device and number of threads.
- For example [DOCLING_DEVICE, DOCLING_NUM_THREADS] or [DOCLING_DEVICE, OMP_NUM_THREADS].
CLI: Introduce corresponding CLI parameters device and num_threads.
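To make part A concrete, here is a minimal sketch of the two new fields. It is not the actual docling code: `Device`, its lowercase string values, and the `RuntimeOptions` dataclass are hypothetical stand-ins for whatever ends up in PipelineOptions. Only the enum members and the defaults (AUTO, 4) come from the proposal.

```python
from dataclasses import dataclass
from enum import Enum


class Device(str, Enum):
    """Devices listed in the proposal (string values are an assumption)."""

    AUTO = "auto"
    CUDA = "cuda"
    CPU = "cpu"
    MPS = "mps"


@dataclass
class RuntimeOptions:
    """Hypothetical stand-in for the new PipelineOptions fields."""

    device: Device = Device.AUTO  # default from the proposal
    num_threads: int = 4  # default from the proposal


# Relying on the defaults versus setting both fields explicitly.
defaults = RuntimeOptions()
custom = RuntimeOptions(device=Device("cuda"), num_threads=8)
```

Making the enum a `str` subclass means a value read from an envvar or a CLI flag can be converted with `Device("cuda")` without a separate lookup table.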

B. Introduce configuration resolution logic as a utility function in the docling-ibm-models package:

1. Define device precedence: [CUDA, MPS, CPU].
2. Use the user-provided API/CLI parameters, if they have been explicitly set by the user.
3. Use the envvars, if they are set.
4. Use the default values.
5a. If device == AUTO, use the first available device according to the precedence.
5b. If device is explicitly set by the user but is not available on the system, replace it with the next available device.
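The resolution logic in part B could be sketched roughly as follows. This is an illustration under assumptions, not the utility function itself: `resolve_device` and `resolve_num_threads` are hypothetical names, the set of available devices is passed in by the caller instead of being probed, CPU is assumed to always be usable, and "next available device" in 5b is read as the first usable device after the requested one in the precedence list. The envvar names are the DOCLING_DEVICE / DOCLING_NUM_THREADS candidates from part A.

```python
import os
from typing import Mapping, Optional, Set

# Device precedence from the proposal, highest priority first.
PRECEDENCE = ["cuda", "mps", "cpu"]


def resolve_device(
    explicit: Optional[str],
    available: Set[str],
    env: Optional[Mapping[str, str]] = None,
    default: str = "auto",
) -> str:
    """Pick a device: explicit value, then envvar, then default."""
    env = os.environ if env is None else env
    requested = (explicit or env.get("DOCLING_DEVICE") or default).lower()

    # Assumption: CPU can always be used, accelerators only if reported.
    usable = [d for d in PRECEDENCE if d == "cpu" or d in available]

    if requested == "auto":
        return usable[0]  # 5a: first available device by precedence
    if requested not in PRECEDENCE:
        raise ValueError(f"unknown device: {requested!r}")
    if requested in usable:
        return requested
    # 5b: requested device is missing, take the next usable one after it.
    start = PRECEDENCE.index(requested) + 1
    return next(d for d in PRECEDENCE[start:] if d in usable)


def resolve_num_threads(
    explicit: Optional[int],
    env: Optional[Mapping[str, str]] = None,
    default: int = 4,
) -> int:
    """Pick the thread count: explicit value, then envvar, then default."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    raw = env.get("DOCLING_NUM_THREADS")
    return int(raw) if raw else default
```

In a real implementation the `available` set would presumably be filled from the backend's own checks (for PyTorch, `torch.cuda.is_available()` and `torch.backends.mps.is_available()`). Keeping that probing outside the resolver makes the logic easy to test without any accelerator present.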

C. Usage:

The user either relies on the default values or explicitly sets the input parameters.
- The API parameters are set individually for each pipeline.
- In the case of the docling CLI, the parameters apply to all pipelines.
Each model receives the input runtime parameters as part of the pipeline_options.
The utility function from docling-ibm-models is used to resolve the runtime configuration.
- This step can be done either by the "model predictors" inside the docling-ibm-models package or by the "models" inside the docling package.
- The resolved runtime configuration can be further customized to feed the input parameters of each AI model.

dolfim-ibm · 2024-11-19T09:22:39Z

dolfim-ibm
Nov 19, 2024
Maintainer

In terms of resolution priority, we should do (from higher priority to lower):

device set in the pipeline options (or directly in the models init)
CLI argument
ENV variable
Code defaults
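As a rough illustration of this ordering, the four levels can be treated as a chain where the first value that is actually set wins. The helper names `first_set` and `resolve_setting` are made up for this sketch, and "not set" is assumed to be represented by `None`.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def first_set(*candidates: Optional[T]) -> T:
    """Return the first candidate that is not None (highest priority first)."""
    for value in candidates:
        if value is not None:
            return value
    raise ValueError("no value set, not even a code default")


def resolve_setting(
    option_value: Optional[T],
    cli_value: Optional[T],
    env_value: Optional[T],
    default: T,
) -> T:
    """Apply the order: pipeline options, CLI argument, envvar, code default."""
    return first_set(option_value, cli_value, env_value, default)
```

For example, `resolve_setting(None, "cpu", "cuda", "auto")` picks the CLI value because nothing was set in the pipeline options, even though an envvar is also present.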

0 replies

vagenas · 2024-11-19T09:32:17Z

vagenas
Nov 19, 2024
Maintainer

To streamline this config mgmt part, let's try to leverage Pydantic Settings as much as possible, as it already covers many of the pain points.

1 reply

vagenas Nov 21, 2024
Maintainer

FYI, more precisely, we can take inspiration from my implementation of profiles in deepsearch-toolkit, which is essentially a utility for persisted, swappable configurations built on Pydantic Settings and platformdirs.
