
[Question] What is the memory footprint of te.Linear() weights? #239

Answered by ptrendx
vince62s asked this question in Q&A
Discussion options


Currently the FP8 weights are kept only internally, so the actual model weights take the same amount of memory as without FP8 execution (e.g. 2 bytes per parameter for FP16+FP8 training). We are working with Meta on exposing FP8 tensors in PyTorch, which will enable storing only the FP8 weights, yielding memory savings over the base model as well as, e.g., faster communication in FSDP, but this is currently at the proof-of-concept (PoC) stage.
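As a back-of-the-envelope illustration of the 2-bytes-per-parameter point above, here is a minimal sketch (pure Python, no Transformer Engine required; the layer sizes and the helper name `linear_weight_bytes` are hypothetical, chosen only for the example):

```python
# Rough memory footprint of a Linear layer's parameters at a given precision.
# While FP8 weights stay internal, the stored weights remain FP16/BF16,
# i.e. 2 bytes per parameter; FP8 storage would halve that.

def linear_weight_bytes(in_features, out_features, bytes_per_param=2, bias=True):
    """Bytes used by a Linear layer's weight (and optional bias)."""
    n_params = in_features * out_features + (out_features if bias else 0)
    return n_params * bytes_per_param

# Example: a 4096 x 4096 projection, as found in many transformer blocks.
fp16_bytes = linear_weight_bytes(4096, 4096)                     # 2 bytes/param
fp8_bytes = linear_weight_bytes(4096, 4096, bytes_per_param=1)   # if FP8 storage lands

print(f"FP16 weights: {fp16_bytes / 2**20:.1f} MiB")  # ~32.0 MiB
print(f"FP8 weights:  {fp8_bytes / 2**20:.1f} MiB")   # ~16.0 MiB
```

This only accounts for the parameters themselves; optimizer states, gradients, and FP8 scaling metadata add to the total during training.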

Replies: 1 comment

Answer selected by ksivaman
Category: Q&A
This discussion was converted from issue #222 on May 22, 2023 19:04.