Why is the compressed model using TT slower than the non-compressed model? #222

Open
miladdona opened this issue Jun 6, 2022 · 2 comments

@miladdona

I am trying to factorize the LeNet-300 model (consisting of only 3 FC layers: (784x300), (300x100), (100x10)). I have factorized only the first layer, with shape 784x300, using t3f.
After fine-tuning I get good results in terms of accuracy.
This factorization also compresses the model from 266,610 params to 49,170 params (about 81% compression).
But the results are not good when I measure execution time.
The execution time for 10 prediction passes over the test data (10,000 images) is as follows:
baseline model (without factorization) = 5.51 s
factorized model = 5.57 s

The factorization configuration is 784x300 ----> [[2, 392], [20, 15]] with max_tt_rank = 3.
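
For reference, the model looks roughly like this (a simplified sketch using t3f's `KerasDense` Keras wrapper; the activations and surrounding layer setup here are approximations, not taken from the thread):

```python
# Sketch only: LeNet-300-style MLP where just the first 784x300 layer is
# replaced by a TT layer with the configuration quoted above.
import tensorflow as tf
import t3f

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # 784 = 2 * 392 input modes, 300 = 20 * 15 output modes, TT-rank <= 3
    t3f.nn.KerasDense(input_dims=[2, 392], output_dims=[20, 15],
                      tt_rank=3, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
```

With TT-rank 3, the two TT cores hold 1*2*20*3 + 3*392*15*1 = 17,760 parameters; adding the 300-element bias and the two untouched dense layers gives the 49,170-parameter total above.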

The FLOP count for the baseline model is 532,810 FLOPs,
and for the factorized model it is 116,486 FLOPs (about a 78% decrease in FLOPs).
I should mention that I calculate the FLOPs for the factorized layer using this notebook of yours:
https://colab.research.google.com/drive/16S_SUbIjhnQBFj_r7sCpbwZNHADIzEwX?usp=sharing

Also, to calculate the FLOPs for a non-factorized layer I use this formula:
2 * (input_dim * output_dim) + output_dim
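
As a sanity check, this formula reproduces the baseline total above:

```python
# FLOPs of a dense layer y = Wx + b: one multiply and one add per weight,
# plus one add per output element for the bias.
def dense_flops(input_dim, output_dim):
    return 2 * input_dim * output_dim + output_dim

lenet300 = [(784, 300), (300, 100), (100, 10)]
print(sum(dense_flops(m, n) for m, n in lenet300))  # 532810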

Why do we decrease the number of FLOPs but get worse timing results than the baseline?

@Bihaqo (Owner) commented Jun 8, 2022

Hi,
By "worse results" you mean that it's good accuracy but bad inference speed, right? I would suggest trying a more balanced factorization, e.g. [[28, 28], [20, 15]]. Also, the bigger the initial layer, the bigger are the gains (both in terms of compression and speed), so you might want to start with a bigger network to get better improvements.

And finally, I wouldn't expect TT to be amazing in terms of reducing running time (unless applied to gigantic layers). GPUs (and TPUs) are so good at multiplying big matrices, that when you do something smart to speed it up, you usually reduce FLOPs by a lot, but don't reduce running time that much. Saving memory is usually much easier.

@miladdona (Author)

Yes, you are right. In general the accuracy is acceptable, and we can also improve it with fine-tuning. But the inference time is not good; the execution time is sometimes 2x more than the baseline!
I have tested balanced configurations, with no good results.
I have also tested layers of shape 4096x4096, which is a big layer; the result is better but still not acceptable.
Yes, you are right about GPUs and TPUs, but in the first place I am trying to run on CPUs.
Thanks anyway!
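
For reference, a CPU timing comparison along these lines can be set up as follows (a sketch; `baseline_model`, `tt_model`, and `x_test` are placeholder names, not from the thread):

```python
import time
import tensorflow as tf

tf.config.set_visible_devices([], 'GPU')  # hide GPUs to force CPU execution

def time_predictions(model, x, repeats=10):
    """Average wall-clock time of `repeats` full predictions over `x`."""
    model.predict(x, verbose=0)  # warm-up: graph tracing, memory allocation
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(x, verbose=0)
    return (time.perf_counter() - start) / repeats

# print(time_predictions(baseline_model, x_test))
# print(time_predictions(tt_model, x_test))
```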
