From 276e0ef24747256661fc846dddbaaca51578d8fa Mon Sep 17 00:00:00 2001 From: HJA Bird <9040797+hjabird@users.noreply.github.com> Date: Tue, 19 Dec 2023 13:45:37 +0000 Subject: [PATCH] [oneMKL][DFT] Specification for external workspaces for DFTs (#509) * [MKL] Specification for external workspaces * Add support for external workspaces * Add the following config_params: * WORKSPACE_PLACEMENT * WORKSPACE_EXTERNAL_BYTES_REQUIRED * Add descriptor::set_workspace * Describe typical usage of an external workspace * Romain suggested changes * Reword parallel workspace usage restrictions * Note that it is possible to get worse performance * Rename WORKSPACE_EXTERNAL_BYTES_REQUIRED -> WORKSPACE_EXTERNAL_BYTES; Update enum doc * Raphael detailed wording suggestions * Raphael detailed wording suggestions part 2 * Raphael formatting suggestions * Do not allow sub-buffers as workspaces * Remove ..container * Remove GPU reference * WORKSPACE_EXTERNAL_BYTES is for committed descriptors only --- .../dft/config_params/workspace_placement.rst | 87 +++++++++++++++++++ .../oneMKL/source/domains/dft/descriptor.rst | 83 ++++++++++++++++++ .../domains/dft/enums_and_config_params.rst | 20 ++++- 3 files changed, 189 insertions(+), 1 deletion(-) create mode 100644 source/elements/oneMKL/source/domains/dft/config_params/workspace_placement.rst diff --git a/source/elements/oneMKL/source/domains/dft/config_params/workspace_placement.rst b/source/elements/oneMKL/source/domains/dft/config_params/workspace_placement.rst new file mode 100644 index 0000000000..4f73e91da6 --- /dev/null +++ b/source/elements/oneMKL/source/domains/dft/config_params/workspace_placement.rst @@ -0,0 +1,87 @@ +.. SPDX-FileCopyrightText: Codeplay Software +.. +.. SPDX-License-Identifier: CC-BY-4.0 + +.. _onemkl_dft_config_workspace_placement: + +Workspace placement +-------------------------------------- + +DFT implementations often require temporary storage for intermediate data whilst computing DFTs. +This temporary storage is referred to as a *workspace*. +Whilst this is managed automatically by default (``config_value::WORKSPACE_AUTOMATIC`` +set for ``config_param::WORKSPACE_PLACEMENT``), it may be preferable to provide an external +workspace (``config_value::WORKSPACE_EXTERNAL`` set for ``config_param::WORKSPACE_PLACEMENT``) +for the following reasons: + +* to reduce the number of mallocs / frees; +* to reduce memory consumption. + +For some backends and configurations, ``config_value::WORKSPACE_EXTERNAL`` may reduce performance. + +A typical workflow for using ``config_value::WORKSPACE_EXTERNAL`` is given in the section :ref:`below`. + +WORKSPACE_PLACEMENT ++++++++++++++++++++ + +For ``config_param::WORKSPACE_PLACEMENT``, valid configuration values are ``config_value::WORKSPACE_AUTOMATIC`` and ``config_value::WORKSPACE_EXTERNAL``. + +.. _onemkl_dft_config_value_workspace_automatic: + +.. rubric:: WORKSPACE_AUTOMATIC + +The default value for the ``config_param::WORKSPACE_PLACEMENT`` is ``config_value::WORKSPACE_AUTOMATIC``. + +When set to ``config_value::WORKSPACE_AUTOMATIC`` the user does not need to provide an external workspace. The workspace will be automatically managed by the backend library. + +.. _onemkl_dft_config_value_workspace_external: + +.. rubric:: WORKSPACE_EXTERNAL + +The configuration ``config_param::WORKSPACE_PLACEMENT`` can be set to +``config_value::WORKSPACE_EXTERNAL`` to allow the workspace to be set manually. + +When a descriptor is committed with ``config_value::WORKSPACE_EXTERNAL`` set +for ``config_param::WORKSPACE_PLACEMENT``, the user must provide an external +workspace before calling any compute function. +See :ref:`onemkl_dft_descriptor_set_workspace` and :ref:`onemkl_dft_typical_usage_of_workspace_external`. + +.. _onemkl_dft_typical_usage_of_workspace_external: + +Typical usage of ``WORKSPACE_EXTERNAL`` ++++++++++++++++++++++++++++++++++++++++ + +Usage of ``config_value::WORKSPACE_EXTERNAL`` typically involves the following order of operations: + +#. ``config_value::WORKSPACE_EXTERNAL`` is set for the uncommitted descriptor's ``config_param::WORKSPACE_EXTERNAL``. +#. The descriptor is committed. +#. The required workspace size is queried. +#. A workspace of sufficient size is provided to the descriptor. +#. Compute functions following the type of external workspace provided are called. +#. The user is responsible for freeing the external workspace. + +This is shown in the following example code: + +.. code-block:: cpp + + // Create a descriptor + mkl::dft::descriptor desc(n); + // 1. Set the workspace placement to WORKSPACE_EXTERNAL + desc.set_value(mkl::dft::config_param::WORKSPACE_PLACEMENT, + mkl::dft::config_value::WORKSPACE_EXTERNAL); + // Set further configuration parameters + // ... + // 2. Commit the descriptor + desc.commit(myQueue); + // 3. Query the required workspace size + std::int64_t workspaceBytes{0}; + desc.get_value(mkl::dft::config_param::WORKSPACE_EXTERNAL_BYTES, &workspaceBytes); + // Obtain a sufficiently large USM allocation or buffer. For this example, a USM allocation is used. + float* workspaceUsm = sycl::malloc_device(workspaceBytes / sizeof(float), myQueue); + // 4. Set the workspace + desc.set_workspace(workspaceUsm); + // 5. Now USM compute functions can be called. + + +**Parent topic:** :ref:`onemkl_dft_enums` + diff --git a/source/elements/oneMKL/source/domains/dft/descriptor.rst b/source/elements/oneMKL/source/domains/dft/descriptor.rst index 7ea6a02537..27cb505c8d 100644 --- a/source/elements/oneMKL/source/domains/dft/descriptor.rst +++ b/source/elements/oneMKL/source/domains/dft/descriptor.rst @@ -75,6 +75,9 @@ The ``descriptor`` class is defined in the ``oneapi::mkl::dft`` namespace. void set_value(oneapi::mkl::dft::config_param param, ...); void get_value(oneapi::mkl::dft::config_param param, ...); + + void set_workspace(sycl::buffer &workspaceBuf); + void set_workspace(scalar_type* workspaceUSM); void commit(sycl::queue &queue); @@ -129,6 +132,8 @@ The ``descriptor`` class is defined in the ``oneapi::mkl::dft`` namespace. * - :ref:`onemkl_dft_descriptor_get_value` - Queries the configuration value associated with a particular configuration parameter. + * - :ref:`onemkl_dft_descriptor_set_workspace` + - Sets the external workspace to use when ``config_param::WORKSPACE_PLACEMENT`` is set to ``config_value::WORKSPACE_EXTERNAL``. * - :ref:`onemkl_dft_descriptor_commit` - Commits the ``descriptor`` object to enqueue the operations relevant to the (batched) DFT(s) it determines to a given, user-provided @@ -417,6 +422,84 @@ type ``oneapi::mkl::dft::domain``, ``oneapi::mkl::dft::precision``, **Descriptor class member table:** :ref:`onemkl_dft_descriptor_member_table` +.. _onemkl_dft_descriptor_set_workspace: + +set_workspace ++++++++++++++ + +Sets the workspace for when ``config_param::WORKSPACE_PLACEMENT`` is set to ``config_value::WORKSPACE_EXTERNAL``. + +.. rubric:: Description + +This function sets the workspace to use when computing DFTs for when an +external workspace is set. +This function may only be called after the descriptor has been committed. +The size of the provided workspace must be equal to or larger than the required +workspace size obtained by calling ``descriptor::get_value(config_param::WORKSPACE_EXTERNAL_BYTES, &workspaceBytes)``. + +A descriptor where ``config_value::WORKSPACE_EXTERNAL`` is specified for +``config_param::WORKSPACE_PLACEMENT`` is not a valid descriptor for compute +calls until this function has been successfully called. + +The type of workspace must match the compute calls for which it is used. +That is, if the workspace is provided as a ``sycl::buffer``, the compute +calls must also use ``sycl::buffer`` for their arguments. Likewise, a USM +allocated workspace must only be used with USM compute calls. +Failing to do this will result in an invalid descriptor for compute calls. + +If the workspace is a USM allocation, the user must not use it for other purposes +in parallel whilst the DFT ``compute_forward`` or ``compute_backward`` are in progress. + +This function can be called on committed descriptors where the workspace placement +is not ``config_value::WORKSPACE_EXTERNAL``. The provided workspace may or may not +be used in compute calls. However, the aforementioned restrictions will still apply. + +.. rubric:: Syntax (buffer workspace) + +.. code-block:: cpp + + namespace oneapi::mkl::dft { + + template + void descriptor::set_workspace(sycl::buffer &workspaceBuf); + } + +.. rubric:: Syntax (USM workspace) + +.. code-block:: cpp + + namespace oneapi::mkl::dft { + + template + void descriptor::set_workspace(scalar_type* workspaceUSM); + + } + +.. container:: section + + .. rubric:: Input Parameters + + workspaceBuf + A workspace buffer where ``scalar_type`` is the floating-point type according to ``prec``. This buffer must be sufficiently large or an exception will be thrown. A sub-buffer cannot be used. + + workspaceUSM + A workspace USM allocation where ``scalar_type`` is the floating-point type according to ``prec``. This allocation must be accessible on the device on which the descriptor is committed. It is assumed that this USM allocation is sufficiently large. The pointer is expected to be aligned to ``scalar_type``. + +.. container:: section + + .. rubric:: Throws + + The ``descriptor::set_workspace()`` routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here: + + :ref:`oneapi::mkl::invalid_argument()` + If the provided buffer ``workspaceBuf`` is not sufficiently large or is a sub-buffer, or if the provided USM allocation ``workspaceUSM`` is ``nullptr`` when an external workspace of size greater than zero is required. + + :ref:`oneapi::mkl::uninitialized()` + If ``set_workspace`` is called before the descriptor is committed. + + +**Descriptor class member table:** :ref:`onemkl_dft_descriptor_member_table` + .. _onemkl_dft_descriptor_commit: commit diff --git a/source/elements/oneMKL/source/domains/dft/enums_and_config_params.rst b/source/elements/oneMKL/source/domains/dft/enums_and_config_params.rst index 8bc2482107..9e69d80eb3 100644 --- a/source/elements/oneMKL/source/domains/dft/enums_and_config_params.rst +++ b/source/elements/oneMKL/source/domains/dft/enums_and_config_params.rst @@ -142,7 +142,10 @@ the :ref:`descriptor` class. OUTPUT_STRIDES, // deprecated FWD_DISTANCE, - BWD_DISTANCE + BWD_DISTANCE, + + WORKSPACE_PLACEMENT, + WORKSPACE_EXTERNAL_BYTES }; Configuration parameters represented by ``config_param::FORWARD_DOMAIN`` and @@ -262,6 +265,16 @@ the :ref:`descriptor` class. :math:`M > 1`. - | ``std::int64_t`` | [0] + * - :ref:`WORKSPACE_PLACEMENT` + - Some FFT algorithm computation steps require a scratch space for permutations or other purposes. + This parameter controls whether this scratch space is automatically allocated or provided by the user. + - | :ref:`onemkl_dft_enum_config_value` (possible values are ``config_value::WORKSPACE_AUTOMATIC`` or ``config_value::WORKSPACE_EXTERNAL``). + | [``config_value::WORKSPACE_AUTOMATIC``] + * - WORKSPACE_EXTERNAL_BYTES + - The required minimum external workspace size for use by :ref:`set_workspace`. + A read-only value, on committed descriptors only. + - | ``std::int64_t`` + .. _onemkl_dft_enum_config_value: @@ -288,6 +301,10 @@ values associated with some // for config_param::PLACEMENT INPLACE, NOT_INPLACE + + // For config_param::WORKSPACE_PLACEMENT + WORKSPACE_AUTOMATIC, + WORKSPACE_EXTERNAL, }; **Parent topic:** :ref:`onemkl_dft` @@ -297,3 +314,4 @@ values associated with some config_params/data_layouts config_params/storage_formats + config_params/workspace_placement