
[APFloat] Add APFloat support for E8M0 type #107127

Open · wants to merge 1 commit into base: main
Commits on Sep 25, 2024

  1. [APFloat] Add APFloat support for E8M0 type

    This patch adds an APFloat type for the unsigned E8M0 format.
    This format is used to represent the "scale-format" in the
    MX specification:
    https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
    
    This format does not support Inf, denormals, or zero.
    Like FP32, this format has an 8-bit exponent (here, all of
    the bits) and a bias of 127. However, it differs from
    IEEE-FP32 in that its minExponent is -127 (instead of -126).
    The APFloat utility functions are updated to handle these
    constraints.
    
    * The bias calculation is different, and the convertIEEE* APIs
      are updated to handle it.
    * Since there are no significand bits, the
      isSignificandAll{Zeroes/Ones} methods are updated accordingly.
    * Although the format has no precision bits, the precision
      field in fltSemantics is set to 1 for consistency with
      APFloat's internal representation.
    * Many utility functions are updated to handle the fact that this
      format does not support zero.
    * Provide a separate initFromAPInt() implementation to
      handle the quirks of the format.
    * Add specific tests to verify the range of values for this format.
    
    Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
    durga4github committed Sep 25, 2024
    Commit 886ac70