diff --git a/source/elements/oneDNN/source/data_model/data_types/#int8.rst# b/source/elements/oneDNN/source/data_model/data_types/#int8.rst# deleted file mode 100644 index fefc1eedc5..0000000000 --- a/source/elements/oneDNN/source/data_model/data_types/#int8.rst# +++ /dev/null @@ -1,206 +0,0 @@ -.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation -.. -.. SPDX-License-Identifier: CC-BY-4.0 - -.. default-domain:: cpp - -#### -Int8 -#### - -To push higher performance during inference computations, recent work has -focused on computations that use activations and weights stored at lower -precision to achieve higher throughput. Int8 computations offer improved -performance over higher-precision types because they enable packing more -computations into a single instruction, at the cost of reduced (but -acceptable) accuracy. - -******** -Workflow -******** - -oneDNN support symmetric and asymmetric quantization models. - -.. _int8-quantization-label: - -Quantization Model -================== - -For each int8 tensor, the oneDNN library allows to specify scaling -factors and zero-points (also referred to as quantization -parameters), and assumes the following mathematical relationship: - -.. math:: - - x_{f32}[:] = scale_{x} \cdot (x_{int8}[:] - zp_{x}) - -where :math:`scale_{x}` is a *scaling factor* in float format, -:math:`zp_{x}` is the zero point in int32 format, and -:math:`[:]` is used to denote elementwise application of the formula -to the arrays. In order to provide best performance, oneDNN does not -compute those scaling factors and zero-points as part of primitive -computation. Those should be provided by the user through the -:ref:`attribute mecanism`. - -These quantization parameters can either be computed ahead of time -using calibration tools (*static* quantization) or at runtime based on -the actual minimum and maximum values of a tensor (*dynamic* -quantization). Either method can be used in conjunction with oneDNN, as -the quantization parameters are passed to the oneDNN primitives at -execution time. - -To support int8 quantization, primitives should be created and -executed as follow: - -- during primitive creation, if one or multiple inputs are int8 - (signed or not), then the primitive will behave as a quantized - integer operation. -- still during primitive creation, the dimensionality of the scaling - factors and zero-point should be provided using masks (e.g. one - scale per tensor, one scale per channel, ...). -- finally, during primitive execution, the user must provide the - actual quantization parameters as arguments to the execute function. - Scales shall be f32 values, and zero-points shall be int32 values. - -.. note:: - - For performance reasons, each primitive implementation can support - only a subset of quantization parameter masks. For example, - convolution typically supports per-channel scales (no zero-point) - for weights, and per-tensor scaling factor and zero-point for - activation. - -.. note:: - - Some primitives might use quantization parameters in order to - dequantize/quantize intermediate values. This is for example the - case for the :ref:`rnn-label` primitive, which will dequantize - before applying non linear functions, and will requantize before - executing matrix multiplication operations. - - -Numerical behavior -__________________ - -Primitive implementations are allowed to convert int8 inputs to wider -datatypes (e.g. int16 or int32), as those conversions do not impact -accuracy. 
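As a non-normative illustration of the quantization model above, the scaling
factor and zero-point relate int8 and f32 values as in the following plain
C++ sketch. The ``dequantize``/``quantize`` helper names, the round-to-nearest
choice, and the saturation to the int8 range are assumptions of this example
and are not part of the oneDNN API::

   #include <algorithm>
   #include <cmath>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // x_f32[:] = scale_x * (x_int8[:] - zp_x)
   std::vector<float> dequantize(
         const std::vector<std::int8_t> &x, float scale, std::int32_t zp) {
      std::vector<float> out(x.size());
      for (std::size_t i = 0; i < x.size(); ++i)
         out[i] = scale * (static_cast<std::int32_t>(x[i]) - zp);
      return out;
   }

   // x_int8[:] = saturate(round(x_f32[:] / scale_x) + zp_x)
   std::vector<std::int8_t> quantize(
         const std::vector<float> &x, float scale, std::int32_t zp) {
      std::vector<std::int8_t> out(x.size());
      for (std::size_t i = 0; i < x.size(); ++i) {
         std::int32_t q
               = static_cast<std::int32_t>(std::lround(x[i] / scale)) + zp;
         // Saturate to the int8 range rather than wrap around.
         out[i] = static_cast<std::int8_t>(std::clamp<std::int32_t>(q, -128, 127));
      }
      return out;
   }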
- -During execution, primitives should avoid integer overflows and -maintain integer accuracy by using wider datatypes (e.g. int32) for -intermediate values and accumulators. Those are then converted as -necessary before the result is written to the output memory objects. -During that conversion, the behavior in case of underflow/overflow is -undefined (e.g. when converting `s32` to int8). However, it is highly -encouraged for implementations to saturate values. - -When multiple operations are fused in a single primitive using the -:ref:`post-op mecanism`, those are assumed to be -computed in f32 precision. As a result the destination quantization -parameters are applied after the post-ops as follow: - -.. math:: - - \dst[:] = post\_ops(OP(src[:], weights[:], ...)) / scale_{\dst} + zp_{\dst} - -Quantizing/dequantizing values between post-operations can still be -achieved using one of :ref:`eltwise post-ops`, -:ref:`binary post-ops`, or the scale parameter -of the appropriate post-operation. - - -Example: Convolution Quantization Workflow ------------------------------------------- - -Consider a convolution without bias. The tensors are represented as: - -- :math:`\src_{f32}[:] = scale_{\src} \cdot (\src_{int8}[:] - zp_{\src})` -- :math:`\weights_{f32}[:] = scale_{\weights} \cdot \weights_{int8}[:]` -- :math:`\dst_{f32}[:] = scale_{\dst} \cdot (\dst_{int8}[:] - zp_{\dst})` - -Here the :math:`\src_{f32}, \weights_{f32}, \dst_{f32}` are not -computed at all, the whole work happens with int8 tensors.So the task -is to compute the :math:`\dst_{int8}` tensor, using the `\src_{int8}`, -`\weights_{int8}` tensors passed at execution time, as well as the -corresponding quantization parameters `scale_{\src}, scale_{\weights}, -scale_{\dst}` and `zero_point{\src}, -zero_point_{\dst}`. Mathematically, the computations are: - -.. math:: - - \dst_{int8}[:] = - \operatorname{f32\_to\_int8}( - scale_{\src} \cdot scale_{\weights} \cdot - \operatorname{s32\_to\_f32}(conv_{s32}(\src_{int8}, \weights_{int8}) - - zp_{\src} \cdot comp_{s32}) / scale_{\dst} - + zp_{\dst} ) - -where - -- :math:`conv_{s32}` is just a regular convolution which takes source and - weights with int8 data type and compute the result in int32 data type (int32 - is chosen to avoid overflows during the computations); - -- :math:`comp_{s32}` is a compensation term to account for - `\src` non-zero zero point. This term is computed by the oneDNN - library and can typically be pre-computed ahead of time, for example - during weights reorder. - -- :math:`\operatorname{f32\_to\_s8}()` converts an `f32` value to `s8` with - potential saturation if the values are out of the range of the int8 data - type. - -- :math:`\operatorname{s32\_to\_f32}()` converts an `int8` value to - `f32` with potential rounding. This conversion is typically - necessary to apply `f32` scaling factors. - - -Per-Channel Scaling -------------------- - -Primitives may have limited support of multiple scales for a quantized tensor. -The most popular use case is the :ref:`convolution-label` primitives that -support per-output-channel scaling factors for the weights, meaning that the -actual convolution computations would need to scale different output channels -differently. 
- -- :math:`\src_{f32}(n, ic, ih, iw) = scale_{\src} \cdot \src_{int8}(n, ic, ih, iw)` - -- :math:`\weights_{f32}(oc, ic, kh, kw) = scale_{\weights}(oc) \cdot \weights_{int8}(oc, ic, kh, kw)` - -- :math:`\dst_{f32}(n, oc, oh, ow) = scale_{\dst} \cdot \dst_{int8}(n, oc, oh, ow)` - -Note that now the weights' scaling factor depends on :math:`oc`. - -To compute the :math:`\dst_{int8}` we need to perform the following: - -.. math:: - - \dst_{int8}(n, oc, oh, ow) = - \operatorname{f32\_to\_int8}( - \frac{scale_{\src} \cdot scale_{\weights}(oc)}{scale_{\dst}} \cdot - conv_{s32}(\src_{int8}, \weights_{int8})|_{(n, oc, oh, ow)} - ). - -The user is responsible for preparing quantized weights accordingly. To do that, -oneDNN provides reorders that can perform per-channel scaling: - -.. math:: - - \weights_{int8}(oc, ic, kh, kw) = - \operatorname{f32\_to\_int8}( - \weights_{f32}(oc, ic, kh, kw) / scale_{weights}(oc) - ). - -The :ref:`attributes-quantization-label` describes what kind of quantization -model oneDNN supports. - -******* -Support -******* - -oneDNN supports int8 computations for inference by allowing to specify that -primitive input and output memory objects use int8 data types. - - -.. vim: ts=3 sw=3 et spell spelllang=en diff --git a/source/elements/oneDNN/source/data_model/data_types/.#int8.rst b/source/elements/oneDNN/source/data_model/data_types/.#int8.rst deleted file mode 120000 index a2ef8f87d4..0000000000 --- a/source/elements/oneDNN/source/data_model/data_types/.#int8.rst +++ /dev/null @@ -1 +0,0 @@ -rscohn1@anpfclxlin02.462074:1674669906 \ No newline at end of file diff --git a/source/elements/oneDNN/source/data_model/data_types/int8.rst b/source/elements/oneDNN/source/data_model/data_types/int8.rst index 09773bb12b..fefc1eedc5 100644 --- a/source/elements/oneDNN/source/data_model/data_types/int8.rst +++ b/source/elements/oneDNN/source/data_model/data_types/int8.rst @@ -45,7 +45,7 @@ computation. Those should be provided by the user through the These quantization parameters can either be computed ahead of time using calibration tools (*static* quantization) or at runtime based on the actual minimum and maximum values of a tensor (*dynamic* -quantization). Either method can be used in conjuction with oneDNN, as +quantization). Either method can be used in conjunction with oneDNN, as the quantization parameters are passed to the oneDNN primitives at execution time. diff --git a/source/elements/oneMKL/source/domains/dft/config_params/data_layouts.rst b/source/elements/oneMKL/source/domains/dft/config_params/data_layouts.rst index b1b43f6851..17fa4343d8 100644 --- a/source/elements/oneMKL/source/domains/dft/config_params/data_layouts.rst +++ b/source/elements/oneMKL/source/domains/dft/config_params/data_layouts.rst @@ -77,7 +77,7 @@ its configuration value for ``config_param::COMPLEX_STORAGE`` (first 2 columns). 
:class: longtable * - Object type - - Configuration value for configuration paramer ``config_param::COMPLEX_STORAGE`` + - Configuration value for configuration parameter ``config_param::COMPLEX_STORAGE`` - Implicitly-assumed elementary data type in forward domain - Implicitly-assumed elementary data type in backward domain * - ``descriptor`` diff --git a/source/elements/oneMKL/source/domains/dft/descriptor.rst b/source/elements/oneMKL/source/domains/dft/descriptor.rst index 57f6b10051..7ea6a02537 100644 --- a/source/elements/oneMKL/source/domains/dft/descriptor.rst +++ b/source/elements/oneMKL/source/domains/dft/descriptor.rst @@ -277,7 +277,7 @@ The copy assignment operator results in a deep copy. .. rubric:: Throws - The assignment opererators shall throw the following exceptions if the + The assignment operators shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here: @@ -325,7 +325,7 @@ type like ``std::int64_t`` or ``float`` (more details available param One of the possible values of type :ref:`onemkl_dft_enum_config_param` - representing the (writable) configuraton parameter to be set. + representing the (writable) configuration parameter to be set. ... An element of the appropriate type for the configuration value @@ -395,7 +395,7 @@ type ``oneapi::mkl::dft::domain``, ``oneapi::mkl::dft::precision``, param One of the possible values of type :ref:`onemkl_dft_enum_config_param` - representing the configuraton parameter being queried. + representing the configuration parameter being queried. ... A pointer to a writable element of the appropriate type for the diff --git a/source/elements/oneMKL/source/domains/dft/dft.rst b/source/elements/oneMKL/source/domains/dft/dft.rst index efd1b3ad0d..aafbe1e248 100644 --- a/source/elements/oneMKL/source/domains/dft/dft.rst +++ b/source/elements/oneMKL/source/domains/dft/dft.rst @@ -56,7 +56,7 @@ forward domains: Similarly, we refer to DFTs of complex (resp. real) forward domain as "complex DFTs" (resp. "real DFTs"). Regardless of the type of forward domain, the -backward domain's data sequences are alway complex. +backward domain's data sequences are always complex. The calculation of the same DFT for several, *i.e.*, :math:`M > 1`, data sets of the same type of forward domain, using the same precision is referred to as a @@ -151,7 +151,7 @@ relevant configuration setting (*e.g.*, the number :math:`M` of sequences to consider in case of a batched DFT). Once configured as desired, the :ref:`onemkl_dft_descriptor_commit` member function of ``desc``, requiring a ``sycl::queue`` object ``Q``, may be invoked. The successful completion of the -latter makes ``desc`` comitted to the desired (batched) DFT *as configured*, for +latter makes ``desc`` committed to the desired (batched) DFT *as configured*, for the particular device and context encapsulated by ``Q``. The :ref:`onemkl_dft_compute_forward` (resp. :ref:`onemkl_dft_compute_backward`) function may then be called and provided with ``desc`` to enqueue operations @@ -166,7 +166,7 @@ user-provided, device-accessible data. 
- must be re-committed to account for any change in configuration after it was already successfully committed; - deliver best performance for DFT calculations when created, configured and - comitted outside applications' hotpath(s) that use them multiple times for + committed outside applications' hotpath(s) that use them multiple times for identically-configured (batched) DFTs. :ref:`onemkl_dft_compute_forward` and/or :ref:`onemkl_dft_compute_backward` should be the only oneMKL DFT-related routines invoked in programs' hotpaths.
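For reference, the create, configure, commit, and compute workflow described
in the ``dft.rst`` changes above might look like the minimal sketch below. It
is illustrative only: the umbrella ``oneapi/mkl.hpp`` include, the USM shared
allocation, and the default in-place, unit-stride layout are assumptions of
this example rather than requirements of the specification::

   #include <complex>
   #include <cstdint>
   #include <sycl/sycl.hpp>
   #include <oneapi/mkl.hpp>

   namespace dft = oneapi::mkl::dft;

   int main() {
      sycl::queue Q; // device and context the descriptor is committed to

      // Batched 1D single-precision complex DFT: M sequences of length N.
      constexpr std::int64_t N = 1024, M = 8;
      dft::descriptor<dft::precision::SINGLE, dft::domain::COMPLEX> desc(N);
      desc.set_value(dft::config_param::NUMBER_OF_TRANSFORMS, M);
      desc.set_value(dft::config_param::FWD_DISTANCE, N);
      desc.set_value(dft::config_param::BWD_DISTANCE, N);

      // Configure first, then commit once, outside the hotpath.
      desc.commit(Q);

      // Only the compute_* calls belong in the application's hotpath.
      auto *data = sycl::malloc_shared<std::complex<float>>(N * M, Q);
      // ... fill data with M input sequences ...
      dft::compute_forward(desc, data).wait();

      sycl::free(data, Q);
      return 0;
   }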