
Clarification for the precision part in fp8_primer #835

Answered by Andrew-Luo1
YangFei1990 asked this question in Q&A

Glad that it's of some help so far.

Re "both the input and linear weights are already fp8 representable": By "fp8 representable", to be clear, I mean you can cast it to fp8 then recast it to fp32 and get the same value, i.e. not lose any information from quantization. The input and linear weights that went into making out_fp32 do not appear to already be fp8 representable; they're randomly generated in the space of fp32's. Indeed, aggregating across a few cells, out_fp32 is calculated as:

import torch
import transformer_engine.pytorch as te

my_linear = te.Linear(768, 768, bias=True)  # I think the weights are fp32.
inp = torch.rand((1024, 768)).cuda()
out_fp32 = my_linear(inp)  # runs in fp32 since we're not in an fp8_autocast context.
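
To make the round-trip definition concrete, here's a minimal sketch of the representability check. It assumes a recent PyTorch with the native torch.float8_e4m3fn dtype (TE's own casting utilities would work equally well); the tensor names are just for illustration:

import torch

def roundtrip(t):
    # Cast to fp8 (E4M3) and back to fp32; a value is "fp8 representable"
    # exactly when it survives this round trip unchanged.
    return t.to(torch.float8_e4m3fn).to(torch.float32)

half = torch.tensor([0.5], device="cuda")  # 0.5 is a power of two, so fp8 representable
x = torch.rand(4, device="cuda")           # random fp32 values

print(torch.equal(roundtrip(half), half))  # True
print(torch.equal(roundtrip(x), x))        # almost surely False for random fp32 values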

Re "you als…
