From 1bbe9e678e06b329d172f64d691273aca63e2325 Mon Sep 17 00:00:00 2001
From: Paul Balanca
Date: Thu, 15 Aug 2024 09:35:24 +0100
Subject: [PATCH] wip

---
 docs/JAX FP8 matmul tutorial.ipynb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/JAX FP8 matmul tutorial.ipynb b/docs/JAX FP8 matmul tutorial.ipynb
index 93e71a9..48616ce 100644
--- a/docs/JAX FP8 matmul tutorial.ipynb
+++ b/docs/JAX FP8 matmul tutorial.ipynb
@@ -15,7 +15,9 @@
    "source": [
     "## Quickstart: FP8 in deep learning\n",
     "\n",
-    "The latest generation of machine learning hardware (Nvidia H100, AMD MI300, Graphcore C600, ...) have integrated direct FP8 support in the hardware, improving energy efficiency and throughput.\n",
+    "The latest generation of machine learning hardware (Nvidia H100, AMD MI300, Graphcore C600, ... TODO links) has integrated direct FP8 support, improving energy efficiency and throughput.\n",
+    "\n",
+    "As shown in the low-precision ML literature, support for two distinct `float8` formats, `E4M3` and `E5M2`, is necessary to achieve accuracy similar to `bfloat16` (or `float16`) training. As presented below, the two formats differ in the trade-off between precision (i.e. mantissa bits) and dynamic range (i.e. exponent bits). In short, `E4M3` is used for storing weights and activations, whereas `E5M2` is used for representing backward gradients (which require a higher dynamic range).\n",
     "\n",
     "![image](img/fp-formats.webp)\n",
     "\n",
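
Note (not part of the patch above): the precision vs. dynamic-range trade-off described in the added paragraph can be inspected directly with `jax.numpy.finfo`. The snippet below is a minimal sketch, assuming a JAX install that exposes the `ml_dtypes`-backed dtypes `jnp.float8_e4m3fn` and `jnp.float8_e5m2`.

```python
# Minimal sketch: compare the two float8 formats referenced in the tutorial cell.
# Assumes a JAX version exposing the ml_dtypes-backed float8 dtypes.
import jax.numpy as jnp

for dtype in (jnp.float8_e4m3fn, jnp.float8_e5m2):
    info = jnp.finfo(dtype)
    # E4M3 (3 mantissa bits) offers more precision but a smaller range,
    # while E5M2 (2 mantissa bits) trades precision for a wider range.
    print(f"{dtype.__name__}: max={info.max}, smallest normal={info.tiny}, eps={info.eps}")
```

Running this shows `E4M3`'s maximum finite value near 448 versus `E5M2`'s near 57344, which is why the higher-dynamic-range `E5M2` format is preferred for backward gradients.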