High-level overview of our ML pipeline
The current ML pipeline performs image classification with ResNet-18 as the model.
In the pipeline, only the forward and backward passes run on the GPUs; the remaining steps, such as fetching data from storage and preprocessing, are handled by the CPU.
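As a rough sketch of this split (the dataset path, batch size, and hyperparameters below are illustrative, and torchvision's ResNet-18 is assumed), the loop below does all fetching and preprocessing on the CPU via the DataLoader, while only the forward and backward passes run on the GPU:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# CPU side: fetching from storage and preprocessing (decode, resize, normalize).
preprocess = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("path/to/train", transform=preprocess)  # illustrative path
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)

# GPU side: only the forward and backward passes run on the device.
device = torch.device("cuda")
model = models.resnet18(num_classes=1000).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for images, labels in loader:
    images, labels = images.to(device), labels.to(device)  # host-to-device copy
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # forward pass on GPU
    loss.backward()                          # backward pass on GPU
    optimizer.step()
```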
This section describes how the dataloader functions and how it communicates with the main process.
The experiments use PyTorch's torch.utils.data.DataLoader, which manages the data loader workers.
The user has the option to pass the desired number of workers (num_workers) to the API.
The main process forks that many worker processes from the main thread.
The main thread is responsible for coordinating the preprocessing work across the workers, and each worker is responsible for preprocessing a batch of data.
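As a toy illustration of how num_workers maps to worker processes (the dataset and sizes here are hypothetical), torch.utils.data.get_worker_info() can be used to see which forked worker preprocessed each batch:

```python
import os
import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info

class ToyDataset(Dataset):
    # Hypothetical stand-in for the real storage-backed dataset.
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        info = get_worker_info()  # None in the main process, set inside a worker
        worker_id = info.id if info is not None else -1
        # Return the "preprocessed" sample plus the id of the worker that produced it.
        return torch.full((3, 224, 224), float(idx)), worker_id

if __name__ == "__main__":
    loader = DataLoader(ToyDataset(), batch_size=8, num_workers=4)
    for batch, worker_ids in loader:
        # Each batch is preprocessed entirely by a single worker process.
        print(f"main pid={os.getpid()}: batch prepared by worker {worker_ids.unique().tolist()}")
```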
In a single-node multi-GPU setting, communication between the main process and the workers happens via Python's multiprocessing.Queue; for tensor payloads, PyTorch backs this with shared memory, so the queue itself transfers lightweight handles to the shared storage rather than the tensor data.
There are two communication channels between the main process and the workers, namely the index queue and the data queue.
The data queue is shared by all the workers and the main process, whereas each worker has its own index queue that it shares only with the main process.
The data queue carries preprocessed data from the workers to the main process, i.e., the workers are the producers and the main process is the consumer.
The index queue carries the indices of the data to be processed from the main process to the worker processes, i.e., the main process is the producer and the workers are the consumers.
Each of these queues is one-way.
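This two-channel pattern can be sketched with plain multiprocessing.Queue objects; the snippet below is a simplified, hypothetical illustration of the idea rather than PyTorch's actual worker loop:

```python
import multiprocessing as mp

def worker_loop(worker_id, index_queue, data_queue):
    # Consumer of indices, producer of "preprocessed" data.
    while True:
        idx = index_queue.get()
        if idx is None:  # sentinel: no more work for this worker
            break
        sample = idx * 10  # stand-in for fetch + preprocess
        data_queue.put((worker_id, idx, sample))

if __name__ == "__main__":
    num_workers = 2
    data_queue = mp.Queue()                                   # shared by all workers and the main process
    index_queues = [mp.Queue() for _ in range(num_workers)]   # one per worker

    workers = [
        mp.Process(target=worker_loop, args=(w, index_queues[w], data_queue))
        for w in range(num_workers)
    ]
    for p in workers:
        p.start()

    # Main process: producer on the index queues (round-robin dispatch).
    for idx in range(8):
        index_queues[idx % num_workers].put(idx)
    for q in index_queues:
        q.put(None)

    # Main process: consumer on the shared data queue.
    for _ in range(8):
        worker_id, idx, sample = data_queue.get()
        print(f"worker {worker_id} processed index {idx} -> {sample}")

    for p in workers:
        p.join()
```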
The main thread is responsible for copying the data (preprocessed by the workers) to the GPU and for scheduling the GPU kernels that run on that data.
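One common way this step looks in PyTorch (a sketch with a dummy in-memory dataset and illustrative sizes, not necessarily our exact configuration) is to pin the host batches so the main thread can issue asynchronous copies before launching the kernels via the model call:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

if __name__ == "__main__":
    device = torch.device("cuda")
    model = models.resnet18(num_classes=10).to(device)

    # Dummy in-memory dataset; shapes and sizes are illustrative.
    dataset = TensorDataset(torch.randn(256, 3, 224, 224), torch.randint(0, 10, (256,)))

    # pin_memory=True keeps host batches in page-locked memory so the
    # non_blocking host-to-device copies below can overlap with GPU compute.
    loader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)

    for images, labels in loader:
        # Main thread: copy the preprocessed batch to the GPU ...
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        # ... then schedule the GPU kernels that run on that data.
        loss = nn.functional.cross_entropy(model(images), labels)
```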