Clarification for the precision part in fp8_primer #835
Inside the fp8_primer it says:

However, I do not understand the reason. Could someone help me understand which inputs are representable in FP8, and how they differ from inputs that are merely cast into FP8? And what exactly is the reason that the FP8-representable case shows a smaller gap compared with FP32, and how can that help with debugging?
Glad that it's of some help so far.
Re "both the input and linear weights are already fp8 representable": By "fp8 representable", to be clear, I mean you can cast it to fp8 then recast it to fp32 and get the same value, i.e. not lose any information from quantization. The input and linear weights that went into making out_fp32 do not appear to already be fp8 representable; they're randomly generated in the space of fp32's. Indeed, aggregating across a few cells, out_fp32 is calculated as:
Re "you als…