diff --git a/source/elements/oneMKL/source/domains/dft/config_params/workspace_placement.rst b/source/elements/oneMKL/source/domains/dft/config_params/workspace_placement.rst new file mode 100644 index 0000000000..776283af1f --- /dev/null +++ b/source/elements/oneMKL/source/domains/dft/config_params/workspace_placement.rst @@ -0,0 +1,85 @@ +.. SPDX-FileCopyrightText: Codeplay Software +.. +.. SPDX-License-Identifier: CC-BY-4.0 + +.. _onemkl_dft_config_workspace_placement: + +Workspace placement +-------------------------------------- + +DFT implementations often require temporary storage for intermediate data whilst computing DFTs. +This temporary storage is referred to as a *workspace*. +Whilst this is managed automatically by default (``config_param::WORKSPACE_AUTOMATIC``), +it may be preferable to provide an external workspace (``config_param::WORKSPACE_EXTERNAL``) for the following reasons: + +* To reduce the number of GPU mallocs / frees +* To reduce memory consumption + +A typical workflow for using ``config_param::WORKSPACE_EXTERNAL`` is given in the section :ref:`onemkl_dft_typical_usage_of_workspace_external`. + +WORKSPACE_PLACEMENT ++++++++++++++++++++ + +For ``config_param::WORKSPACE_PLACEMENT``, valid configuration values are ``config_value::WORKSPACE_AUTOMATIC`` and ``config_value::WORKSPACE_EXTERNAL``. + +.. container:: section + + .. _onemkl_dft_config_value_workspace_automatic: + + .. rubric:: WORKSPACE_AUTOMATIC + +The default value for the ``config_param::WORKSPACE_PLACEMENT`` is ``config_value::WORKSPACE_AUTOMATIC``. + +When set to ``config_value::WORKSPACE_AUTOMATIC`` the user does not need to provide an external workspace. The workspace will be automatically managed by the backend library. + +.. container:: section + + .. _onemkl_dft_config_value_workspace_external: + + .. rubric:: WORKSPACE_EXTERNAL + +The configuration ``config_param::WORKSPACE_PLACEMENT`` can be set to +``config_value::WORKSPACE_EXTERNAL`` to allow the workspace to be set manually. + +When a descriptor is committed with ``config_value::WORKSPACE_EXTERNAL`` set, +the user must provide an external workspace. +See :ref:`onemkl_dft_descriptor_set_workspace` and :ref:`onemkl_dft_typical_usage_of_workspace_external`. + +.. _onemkl_dft_typical_usage_of_workspace_external: + +Typical usage of ``WORKSPACE_EXTERNAL`` ++++++++++++++++++++++++++++++++++++++++ + +Usage of ``WORKSPACE_EXTERNAL`` typically involves the following order of operations: + +#. ``WORKSPACE_EXTERNAL`` is set for the uncommitted descriptor. +#. The descriptor is committed. +#. The required workspace size is queried. +#. A workspace of sufficient size is provided to the descriptor. +#. Compute functions following the type of external workspace provided are called. + +This is shown in the following example code: + +.. code-block:: cpp + + // Create a descriptor + mkl::dft::descriptor desc(n); + // 1. Set the workspace placement to WORKSPACE_EXTERNAL + desc.set_value(mkl::dft::config_param::WORKSPACE_PLACEMENT, + mkl::dft::config_value::WORKSPACE_EXTERNAL); + // Set further configuration parameters + // ... + // 2. Commit the descriptor + desc.commit(myQueue); + // 3. Query the required workspace size + std::int64_t workspaceBytes{0}; + desc.get_value(mkl::dft::config_param::WORKSPACE_EXTERNAL_BYTES_REQUIRED, &workspaceBytes); + // Obtain a sufficiently large USM allocation or buffer. For this example, a USM allocation is used. + float* workspaceUsm = sycl::malloc_device(workspaceBytes / sizeof(float), myQueue); + // 4. Set the workspace + desc.set_workspace(workspaceUsm); + // 5. Now USM compute functions can be called. + + +**Parent topic:** :ref:`onemkl_dft_enums` + diff --git a/source/elements/oneMKL/source/domains/dft/descriptor.rst b/source/elements/oneMKL/source/domains/dft/descriptor.rst index ea3defe88a..3e6cfb30cf 100644 --- a/source/elements/oneMKL/source/domains/dft/descriptor.rst +++ b/source/elements/oneMKL/source/domains/dft/descriptor.rst @@ -54,6 +54,9 @@ The descriptor class lives in the ``oneapi::mkl::dft`` namespace. void set_value(config_param param, ...); void get_value(config_param param, ...); + + void set_workspace(sycl::buffer &workspaceBuf); + void set_workspace(scalar_type* workspaceUSM); void commit(sycl::queue &queue); @@ -92,6 +95,8 @@ The descriptor class lives in the ``oneapi::mkl::dft`` namespace. - Sets one particular configuration parameter with the specified configuration value. * - :ref:`onemkl_dft_descriptor_get_value` - Gets the configuration value of one particular configuration parameter. + * - :ref:`onemkl_dft_descriptor_set_workspace` + - Set the external workspace to use when ``config_param::WORKSPACE_PLACEMENT`` is ``config_value::WORKSPACE_EXTERNAL``. * - :ref:`onemkl_dft_descriptor_commit` - Performs all initialization for the actual FFT computation. @@ -346,7 +351,84 @@ corresponding configuration values, see :ref:`onemkl_dft_enum_config_param`. **Descriptor class member table:** :ref:`onemkl_dft_descriptor_member_table` +.. _onemkl_dft_descriptor_set_workspace: +set_workspace ++++++++++++++ + +Set the workspace for when ``config_param::WORKSPACE_PLACEMENT`` is set to ``config_value::WORKSPACE_EXTERNAL``. + +.. rubric:: Description + +This function sets the workspace to use when computing DFTs for when an +external workspace is set. +This function may only be called after the descriptor has been committed. +The provided workspace must be equal to or larger than the required +workspace size obtained by calling ``descriptor::get_value(config_param::WORKSPACE_EXTERNAL_BYTES_REQUIRED, &workspaceBytes)``. + +A descriptor where ``WORKSPACE_EXTERNAL`` is specified is not a valid descriptor +for compute calls until this function has been successfully called. + +The type of workspace must match the compute calls for which it is used. +That is, if the workspace is provided as a ``sycl::buffer``, the compute +calls must also use ``sycl::buffer`` for their arguments. Likewise, a USM +allocated workspace must only be used with USM compute calls. +Failing to do this will result in an invalid descriptor for compute calls. + +If the workspace provided is a USM allocation shared amongst multiple kernels, +it must be ensured that the kernels sharing this workspace do not use it simultaneously. + +This function can be called on committed descriptors where the workspace placement +is not ``config_value::WORKSPACE_EXTERNAL``. The provided workspace may or may not +be used in compute calls. However, the aforementioned restrictions will still apply. + +Calling this function multiple times on a committed descriptor is undefined behavior. + +.. rubric:: Syntax (buffer workspace) + +.. code-block:: cpp + + namespace oneapi::mkl::dft { + + template + void descriptor::set_workspace(sycl::buffer &workspaceBuf); + } + +.. rubric:: Syntax (USM workspace) + +.. code-block:: cpp + + namespace oneapi::mkl::dft { + + template + void descriptor::set_workspace(scalar_type* workspaceUSM); + + } + +.. container:: section + + .. rubric:: Input Parameters + + workspaceBuf + A workspace buffer where ``scalar_type`` is the floating point type according to ``prec``. This buffer must be sufficiently large or an exception will be thrown. + + workspaceUSM + A workspace USM allocation where ``scalar_type`` is the floating point type according to ``prec``. This allocation must be accessible on the device on which the descriptor is committed. It is assumed that this USM allocation is sufficiently large. The pointer is expected to be aligned to ``scalar_type``. + +.. container:: section + + .. rubric:: Throws + + The `descriptor::set_workspace()` routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here: + + :ref:`oneapi::mkl::invalid_argument()` + If the provided buffer ``workspaceBuf`` is not sufficiently large, or if the provided USM allocation ``workspaceUSM`` is ``nullptr`` when an external workspace of size greater than zero is required. + + :ref:`oneapi::mkl::uninitialized()` + If ``set_workspace`` is called before the descriptor is committed. + + +**Descriptor class member table:** :ref:`onemkl_dft_descriptor_member_table` .. _onemkl_dft_descriptor_commit: diff --git a/source/elements/oneMKL/source/domains/dft/enums_and_config_params.rst b/source/elements/oneMKL/source/domains/dft/enums_and_config_params.rst index fb597d3419..fac6d814f3 100644 --- a/source/elements/oneMKL/source/domains/dft/enums_and_config_params.rst +++ b/source/elements/oneMKL/source/domains/dft/enums_and_config_params.rst @@ -124,6 +124,8 @@ config_param BWD_DISTANCE, WORKSPACE, + WORKSPACE_PLACEMENT, + WORKSPACE_EXTERNAL_BYTES_REQUIRED, ORDERING, TRANSPOSE, PACKED_FORMAT, @@ -169,6 +171,10 @@ config_param - If computing multiple(batched) transforms, this parameter specifies the distance (in elements) between the first data elements of consecutive data sets in the backward domain. Provided in type ``std::int64_t``, the default value is 1. * - WORKSPACE - Some FFT algorithm computation steps require a scratch space for permutations or other purposes. To manage the use of auxiliary storage, set to ``config_value::ALLOW`` to permit the use of auxiliary storage and ``config_value::AVOID`` to avoid using auxiliary storage if possible. + * - :ref:`WORKSPACE_PLACEMENT` + - Some FFT algorithm computation steps require a scratch space for permutations or other purposes. For this auxiliary storage to be handled automatically, set to ``config_value::WORKSPACE_AUTOMATIC``. To provide a workspace manually, set to ``config_value::WORKSPACE_EXTERNAL``. The default value is ``WORKSPACE_AUTOMATIC``. + * - WORKSPACE_EXTERNAL_BYTES_REQUIRED + - The required external workspace size in bytes when ``WORKSPACE_PLACEMENT`` is set to :ref:`config_value::WORKSPACE_EXTERNAL`. A read-only value of type ``std::int64_t``. * - ORDERING - Some FFT algorithms apply an explicit permutation stage that can be time consuming. The value of ``config_value::ORDERED`` (default) applies the data ordering for all transformations. The value of ``config_value::BACKWARD_SCRAMBLED`` applies ordering for forward transform, but allows backward transform to have scrambled data if it gives a performance advantage. * - TRANSPOSE @@ -215,6 +221,10 @@ These are some of the non-integer/floating-point values that the :ref:`onemkl_df AVOID, NONE, + // For config_param::WORKSPACE_PLACEMENT + WORKSPACE_AUTOMATIC, + WORKSPACE_EXTERNAL, + // for config_param::PACKED_FORMAT for storing conjugate-even finite sequence in real containers CCE_FORMAT @@ -232,4 +242,5 @@ These are some of the non-integer/floating-point values that the :ref:`onemkl_df config_params/storage_formats config_params/strides config_params/distance + config_params/workspace_placement