High GPU memory consumption #7
Comments
Hi! Could you please share information about your GPU and batch size? Did you try to compress only part of the layers and turn off compression for the attention matrices (see the arguments)? How many GPUs did you use? Also, I carried out some experiments with Transformer-XL in January; at the end of August I'll try to find my code. Hint: if you want to increase the compression ratio, you can also try to compress the embedding (or projection matrices) in the same way.
BTW, I tried to profile the memory; most of it is allocated when doing matmuls.
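(For reference, a rough way to check this with the standard `torch.cuda` counters is to snapshot peak memory around one forward/backward step; this is only an illustrative sketch, and it assumes the model returns a tensor you can reduce and backprop through:)

```python
import torch

def report_peak_memory(model, batch):
    # Peak CUDA memory allocated during one forward/backward pass, in MB.
    torch.cuda.reset_peak_memory_stats()
    out = model(batch)
    out.sum().backward()
    peak_mb = torch.cuda.max_memory_allocated() / 2**20
    print(f"peak allocated: {peak_mb:.0f} MB")
```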
That's strange; I'll look at the implementation of the TT-layer one more time. Unfortunately, I didn't find the code for LM, but maybe you will be interested in a paper from NeurIPS 2020 that has the same idea for LM: A Tensorized Transformer for Language Modeling.
I partially solved it with the solution suggested in the t3nsor repo (reconstruct the matrix, then do the operation). I implemented it in my custom code and it worked too. BTW, check the Reddit thread on it; without working code as proof, I don't believe anything they say. I even implemented this myself based on the paper and found more nonsense they did...
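To sketch the idea (a simplified illustration, not the exact t3nsor code; the class name, mode/rank arguments, and core-shape convention here are my own assumptions): rebuild the dense weight from the TT-cores once per forward pass and apply it with a single matmul, instead of contracting the activations with the cores one at a time.

```python
import math
import torch
import torch.nn as nn

class TTLinearReconstruct(nn.Module):
    """Sketch of a TT linear layer that materializes the full weight and
    applies it with one matmul. core_k has shape (r_k, in_k, out_k, r_{k+1}),
    with r_0 = r_d = 1, in_features = prod(in_k), out_features = prod(out_k)."""

    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(ranks) == len(in_modes) + 1 and ranks[0] == ranks[-1] == 1
        self.in_features = math.prod(in_modes)
        self.out_features = math.prod(out_modes)
        self.cores = nn.ParameterList([
            nn.Parameter(0.02 * torch.randn(ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        ])

    def full_weight(self):
        # Contract the cores over the TT-ranks. The modes come out interleaved
        # as (in_0, out_0, in_1, out_1, ...), so permute before reshaping.
        w = self.cores[0].squeeze(0)                 # (in_0, out_0, r_1)
        for core in list(self.cores)[1:]:
            w = torch.tensordot(w, core, dims=1)     # (..., in_k, out_k, r_{k+1})
        w = w.squeeze(-1)                            # drop the trailing r_d = 1
        d = len(self.cores)
        perm = [2 * k for k in range(d)] + [2 * k + 1 for k in range(d)]
        return w.permute(*perm).reshape(self.in_features, self.out_features)

    def forward(self, x):
        # x: (..., in_features) -> (..., out_features) with a single dense matmul;
        # the only extra activation memory is the reconstructed weight itself.
        return x @ self.full_weight()
```

For example, `TTLinearReconstruct([4, 8, 8, 4], [4, 8, 8, 4], [1, 8, 8, 8, 1])` would stand in for a 1024x1024 linear layer: the trainable parameters stay in the small cores, and the only extra tensor created at forward time is the reconstructed 1024x1024 weight.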
Hi,
I tried to integrate the TTLinear layer into Transformer-XL;
however, I found that it consumes much more memory than usual.
I couldn't even train it.
The model before compression was 151M params; after compression it was 124M params.
It even consumed much more memory at inference: 3021MB for the compressed model versus 2132MB for the normal model.
I also tried to write the forward method more efficiently (e.g., with bmm), but it didn't help either.
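(For illustration, the kind of core-by-core contraction meant here looks roughly like the following; this is a simplified sketch rather than the exact code from this issue, written with tensordot instead of bmm, with cores in the same (r_k, in_k, out_k, r_{k+1}) convention as above:)

```python
import torch

def tt_forward_sequential(cores, x):
    """Apply a TT-matrix to a batch of activations core by core, without
    ever forming the full weight. cores[k]: (r_k, in_k, out_k, r_{k+1});
    x: (batch, prod(in_k)). Returns (batch, prod(out_k))."""
    in_modes = [c.shape[1] for c in cores]
    # Invariant: t has dims (batch, i_k, ..., i_{d-1}, o_0, ..., o_{k-1}, r_k).
    t = x.reshape(x.shape[0], *in_modes).unsqueeze(-1)   # trailing r_0 = 1
    for core in cores:
        # Contract the current input mode (dim 1) and the rank (last dim);
        # the new output mode and rank land at the end, keeping the invariant.
        t = torch.tensordot(t, core, dims=([1, t.dim() - 1], [1, 0]))
        # Each of these intermediates (batch x modes x rank) is retained by
        # autograd for the backward pass, which is where the memory goes.
    return t.squeeze(-1).reshape(x.shape[0], -1)         # (batch, prod(out_k))
```

In Transformer-XL the batch dimension here is really batch x sequence length, so these per-core intermediates get large and can easily outweigh the parameter savings, whereas the reconstruct-then-matmul variant only adds one (in_features x out_features) weight.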
Did you experience such problems? Do you know any way around this?
Thanks,