General Queue (GenQ) Infrastructure

Yanfei Guo edited this page May 20, 2020 · 1 revision

Introduction

The goal of the General Queue infrastructure is to provide a set of abstractions and interfaces for various use-cases of queues in the MPICH code. Examples are:

  • Shared memory communication (recv queues)
  • Shared memory collectives (collective queues)
  • GPU pipelining

Concepts

Cells and Queues

A cell is an unstructured memory region for user data. The General Queue infrastructure needs to know the size of the cell to manage each cell's life cycle. Depending on its location, a cell can be private to a process or shared by multiple processes, and it can reside in host memory or device memory.

A queue is a pointer to a series of cells. The General Queue infrastructure provides a queue data structure and a set of operations (enqueue, dequeue, etc). Similar to the cells, a queue can be shared if the queue object is created on a shared memory region. The queues can only be created on host memory.

The general state graph of cells in operation is as follows.

--------------    alloc     ---------------  update cell & enqueue   ---------------
| Free Cell  | -----------> | Cell in use | -----------------------> | Queued Cell |
--------------              ---------------                          ---------------
        ^                                                                    |
        |           free                   ---------------         dequeue   |
        ---------------------------------- | Cell in use | <------------------
                                           ---------------
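The transitions above can be modeled as a small state machine. The following C sketch is illustrative only (the enum and function names are not part of the MPICH API); it encodes exactly the four transitions in the diagram and rejects anything else.

```c
/* Cell states mirroring the diagram above (illustrative names). */
typedef enum {
    CELL_FREE,    /* on the free list, available for alloc */
    CELL_IN_USE,  /* handed to the user by alloc, or returned by dequeue */
    CELL_QUEUED   /* enqueued; currently owned by the queue */
} cell_state_t;

typedef enum { OP_ALLOC, OP_ENQUEUE, OP_DEQUEUE, OP_FREE } cell_op_t;

/* Returns the next state for a given operation, or -1 if the
 * transition is not in the state graph. */
static int cell_next_state(cell_state_t s, cell_op_t op)
{
    switch (op) {
    case OP_ALLOC:   return s == CELL_FREE   ? CELL_IN_USE : -1;
    case OP_ENQUEUE: return s == CELL_IN_USE ? CELL_QUEUED : -1;
    case OP_DEQUEUE: return s == CELL_QUEUED ? CELL_IN_USE : -1;
    case OP_FREE:    return s == CELL_IN_USE ? CELL_FREE   : -1;
    }
    return -1;
}
```

Note that a dequeued cell goes back to the in-use state, not directly to the free list: the consumer still owns it until it calls free.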

Object Visibility and Concurrency Support

The object visibility of cells and queues refers to whether these objects can be accessed by multiple processes. Generally, if the memory that backs the object is a shared memory region, the object is visible across processes.

The concurrency support for an object is about whether multiple processes or threads can update it. This is orthogonal to object visibility: a shared object may not require concurrent access. For example, the broadcast operation in the shared memory collectives uses a broadcast queue that only the root process can enqueue or dequeue. The other processes just read the cell that the queue head points to, without dequeuing any cell. In this case, the queue is shared but requires no concurrency support; the cells, however, do need concurrency support.
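The broadcast pattern above can be sketched with C11 atomics. This is a simplified single-variable model, not the MPICH implementation: only the root stores to the head (serial queue access), while any process may load it and read the cell it points to (shared visibility).

```c
#include <stdatomic.h>

typedef struct { int payload; } bcast_cell_t;

/* The queue head: only the root stores to it (serial queue access);
 * any process may load it (shared visibility). */
static _Atomic(bcast_cell_t *) bcast_head;

/* Root-only: publish a cell at the head. The release store makes the
 * payload written before the call visible to acquiring readers. */
static void bcast_publish(bcast_cell_t *cell)
{
    atomic_store_explicit(&bcast_head, cell, memory_order_release);
}

/* Any process: read the current head cell without dequeuing it. */
static const bcast_cell_t *bcast_peek(void)
{
    return atomic_load_explicit(&bcast_head, memory_order_acquire);
}
```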

To summarize, the following is a table of use-cases of different visibility and concurrency combinations that the General Queue infrastructure aims to support.

---------------------------------------------------------------------------
 Cell Visibility | Cell Concurrency | Queue Visibility | Queue Concurrency
---------------------------------------------------------------------------
     private     |      serial      |      private     |      serial
---------------------------------------------------------------------------
     private     |    concurrent    |      private     |    concurrent
---------------------------------------------------------------------------
     shared      |    concurrent    |      shared      |    concurrent
---------------------------------------------------------------------------
     shared      |    concurrent    |      shared      |      serial
---------------------------------------------------------------------------

Interfaces

Private Queues

MPIDU_genq_private_queue_t Class

This is a unified class that manages both cells and queues for private visibility use-cases.

Functions

  • MPIDU_genq_private_queue_create creates a queue with the given cell_size, num_cells_in_block, max_num_cells, an allocation function, and a flag. This is a local operation. Creation does not allocate memory for the cells immediately; memory is allocated dynamically at the first request for a free cell. The flag indicates whether the queue is going to operate in serial or with concurrency support.
  • MPIDU_genq_private_queue_alloc returns an unused cell. All free cells are maintained in an internal free cell list. This operation simply tries to get one cell from that list. If no cell is available, it will try to allocate more using the allocation function provided at queue creation, until the maximum number of cells is reached.
  • MPIDU_genq_private_queue_enqueue enqueues a cell at the tail of the queue.
  • MPIDU_genq_private_queue_dequeue dequeues a cell from the head of the queue.
  • MPIDU_genq_private_queue_free frees a cell. This operation returns the cell to the free cell list. The memory is not released; the cell is just marked as unused.
  • MPIDU_genq_private_queue_head accesses the cell that the queue head points to without dequeuing it.
  • MPIDU_genq_private_queue_destroy frees all allocated cells and the queue.
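The semantics described above can be sketched in a toy C implementation. The names and signatures below are illustrative, not the real MPIDU_genq_private_queue_* interface: cells live in lazily allocated blocks, alloc pops from a free list (growing up to the cap), and free recycles a cell without releasing memory.

```c
#include <stdlib.h>

typedef struct cell {
    struct cell *next;   /* free-list / queue link; user data follows */
} cell_t;

typedef struct {
    size_t cell_size;        /* usable bytes per cell */
    int num_cells_in_block;  /* growth granularity */
    int max_num_cells;       /* hard cap on allocated cells */
    int num_cells;           /* cells allocated so far */
    cell_t *free_list;       /* unused cells */
    cell_t *head, *tail;     /* FIFO of queued cells */
} queue_t;

/* Lazily allocate one more block of cells, up to max_num_cells.
 * (Blocks are never released in this sketch; destroy is omitted.) */
static int queue_grow(queue_t *q)
{
    int n = q->num_cells_in_block;
    if (q->num_cells + n > q->max_num_cells)
        n = q->max_num_cells - q->num_cells;
    if (n <= 0)
        return -1;   /* cap reached */
    size_t stride = sizeof(cell_t) + q->cell_size;
    stride = (stride + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    char *block = malloc((size_t) n * stride);
    if (!block)
        return -1;
    for (int i = 0; i < n; i++) {   /* thread new cells onto the free list */
        cell_t *c = (cell_t *) (block + (size_t) i * stride);
        c->next = q->free_list;
        q->free_list = c;
    }
    q->num_cells += n;
    return 0;
}

/* alloc: pop a free cell, growing on demand; NULL once the cap is hit. */
static void *queue_cell_alloc(queue_t *q)
{
    if (!q->free_list && queue_grow(q) != 0)
        return NULL;
    cell_t *c = q->free_list;
    q->free_list = c->next;
    return c + 1;   /* hand out the user-data region */
}

static void queue_enqueue(queue_t *q, void *data)
{
    cell_t *c = (cell_t *) data - 1;
    c->next = NULL;
    if (q->tail)
        q->tail->next = c;
    else
        q->head = c;
    q->tail = c;
}

static void *queue_dequeue(queue_t *q)
{
    cell_t *c = q->head;
    if (!c)
        return NULL;
    q->head = c->next;
    if (!q->head)
        q->tail = NULL;
    return c + 1;
}

/* free: return the cell to the free list; memory is not released. */
static void queue_cell_free(queue_t *q, void *data)
{
    cell_t *c = (cell_t *) data - 1;
    c->next = q->free_list;
    q->free_list = c;
}
```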

Shared Queues

MPIDU_genq_shared_cell_pool_t Class

This is a class that manages all the cells in shared memory. Unlike the private queue, which does dynamic allocation with a user-provided allocation function, the shared cell pool does a static allocation with MPIDU_Shm_alloc of size cell_size * max_num_cells. The creation of the shared cell pool is therefore collective: it allocates all the cells in shared memory up front. Note that the cell pool objects in each process are private; only the SHM slab for the cells and the internal cell headers (cell metadata) are shared. Each pool object internally maintains a free list of cells that "belong" to the current process. The assignment of free cells to processes happens at the granularity of blocks.

Functions

  • MPIDU_genq_shared_cell_pool_create creates a cell pool with the given cell_size, num_cells_in_block, max_num_cells, and a flag. This is a collective operation. It allocates two shared memory regions: one for the actual cells and another for the cell headers of all cells. The free cell list of each pool object is initialized as empty; cells are assigned to each process's pool object at the first request for a free cell. The flag indicates whether the pool is going to operate in serial or with concurrency support.
  • MPIDU_genq_shared_cell_pool_alloc returns an unused cell. All free cells are maintained in an internal free cell list. This operation simply tries to get one cell from that list. If no cell is available, it will try to get another block from the preallocated SHM slab, until the maximum number of cells is reached.
  • MPIDU_genq_shared_cell_pool_free frees a cell. This operation returns the cell to the free cell list. The memory is not released; the cell is just marked as unused.
  • MPIDU_genq_shared_cell_pool_destroy frees all allocated cells and destroys the pool.
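The block-granularity assignment described above can be sketched as follows. This is an illustrative model, not the MPICH code: a single atomic cursor in the shared region hands out blocks of the preallocated slab, and each process records its block's cells (as slab indices) on a private free list.

```c
#include <stdatomic.h>

#define NUM_CELLS_IN_BLOCK 4
#define MAX_NUM_CELLS 16

/* Lives in the shared region: a cursor over the preallocated slab. */
typedef struct {
    atomic_int next_block;
} pool_shm_t;

/* Private per-process pool object holding that process's free cells
 * (as indices into the shared slab). */
typedef struct {
    pool_shm_t *shm;
    int free_cells[MAX_NUM_CELLS];
    int num_free;
} pool_t;

/* Claim the next block of cells for this process; returns 0 on
 * success, -1 when the slab is exhausted. */
static int pool_claim_block(pool_t *p)
{
    int b = atomic_fetch_add(&p->shm->next_block, 1);
    int first = b * NUM_CELLS_IN_BLOCK;
    if (first >= MAX_NUM_CELLS)
        return -1;
    for (int i = 0; i < NUM_CELLS_IN_BLOCK; i++)
        p->free_cells[p->num_free++] = first + i;
    return 0;
}
```

Because the cursor is advanced with an atomic fetch-and-add, two processes claiming blocks concurrently always receive disjoint cell ranges.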

MPIDU_genq_shared_queue_t Class

This is a class that maintains the basic information of a queue, including pointers to the head and tail and the concurrency type of the queue (serial, SPSC, SPMC, MPSC, MPMC). Note that there is no allocation function for the shared queue object; there is only a queue init function that initializes the queue data structure at a given address. For shared memory queues, the user is expected to allocate the memory for all queue objects. For example, a collection of eight processes that need recv queues for point-to-point communication must collectively allocate shared memory for eight shared queue objects (in an array). Each process then calls queue init on the queue object indexed by its local rank.

Functions

  • MPIDU_genq_shared_queue_init initializes a shared queue object.
  • MPIDU_genq_shared_queue_enqueue enqueues a cell at the tail of the queue.
  • MPIDU_genq_shared_queue_dequeue dequeues a cell from the head of the queue.
  • MPIDU_genq_shared_queue_head accesses the cell at the queue head without dequeuing it.
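The eight-process recv-queue example can be sketched in a single process. The types and names are illustrative (the real MPIDU_genq_shared_queue_t differs): the malloc'd array stands in for a collectively allocated SHM region (MPIDU_Shm_alloc in the real code), and the loop stands in for each process initializing the entry at its own local rank.

```c
#include <stdlib.h>

/* Illustrative queue object: head/tail as cell indices into the
 * shared pool; -1 means the queue is empty. */
typedef struct {
    int head, tail;
} shared_queue_t;

/* Initialize a queue object in place at a caller-provided address. */
static void shared_queue_init(shared_queue_t *q)
{
    q->head = q->tail = -1;
}

/* Simulates the collective setup: one array of queue objects, one
 * init call per "process" (indexed by local rank). */
static shared_queue_t *setup_recv_queues(int num_procs)
{
    shared_queue_t *queues = malloc((size_t) num_procs * sizeof *queues);
    if (!queues)
        return NULL;
    for (int local_rank = 0; local_rank < num_procs; local_rank++)
        shared_queue_init(&queues[local_rank]);
    return queues;
}
```

With this layout, process i enqueues into queues[j] to send to process j, and dequeues from queues[i] to receive.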

Implementation Details

About Performance

Memory Alignment

The cell memory and the cell header memory are separate. Each of them has its own memory alignment.

Pointers in queue objects are aligned at the cache line boundary to avoid false sharing.

Free Cell Allocation