some questions about quantization and shift #15
Comments
Just following up on the question above. Do you have any answers yet, @Dawn-LX? In addition, I have some new questions:
I hope to receive your answer, @boluoweifenda. Thanks a lot anyway.
Thank you for your amazing work. I have read it carefully but still have some questions.
(1) In section 3.3.2, I understand that "batch normalization degenerates into a scaling layer" because "we hypothesize that batch outputs of each hidden layer approximately have zero-mean".
However, BN is (X-\mu) / \sigma, so why can the scaling factor \alpha represent the variance?
(2) In section 3.3.3, how do you obtain [-sqrt(2), sqrt(2)]?
I can understand the function shift(·) and the middle figure in Fig. 2.
For example, in Fig. 2: max(abs(e)) ≈ 1e-4, so 1e-8/shift(1e-4) = 8.192e-5 and 1e-4/shift(1e-4) = 0.8192. Therefore the peak is shifted. But how do you get [-sqrt(2), sqrt(2)]?
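A minimal sketch for checking the arithmetic in (2), assuming shift(x) = 2^round(log2(x)). The helper name `shift` and the sample values are illustrative only and are not taken from the repository. Under that assumption, x/shift(x) always lies within [1/sqrt(2), sqrt(2)], which may be where the interval comes from:

```python
import math

def shift(x: float) -> float:
    # Assumed definition: nearest power of two in the log2 domain (x > 0).
    return 2.0 ** round(math.log2(x))

e_max = 1e-4                  # approximate max(abs(e)) read off Fig. 2
s = shift(e_max)              # 2**-13, about 1.2207e-4
peak = e_max / s              # about 0.8192
small = 1e-8 / s              # about 8.192e-5

# round() moves log2(x) by at most 0.5, so x / shift(x) stays
# within [2**-0.5, 2**0.5] for any positive x.
ratios = [x / shift(x) for x in (10.0 ** -p for p in range(1, 12))]
lo, hi = min(ratios), max(ratios)

print(s, peak, small)
print(lo, hi)
```

Because every |e| is at most max(|e|), dividing all errors by shift(max(|e|)) would then place them inside [-sqrt(2), sqrt(2)], if the assumed definition matches the paper.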
(3) In section 4.3 and Figure 4, how do you calculate the lower bounds for 8 bits and 16 bits?
You said "Upper boundaries are the max{|e|}", so the upper bound is the same in both cases. But for the lower bound,
2^(1-8) = 0.0078125 and 2^(1-16) = 3.0517578125e-5.
Neither lower_bound = Q(e_min, k_w) nor lower_bound = Q(e_min/shift(max(|e|)), k_w) makes sense.
For example, with e_max ≈ 1e-3 and e_min ≈ 1e-8 in Fig. 4, e_min < \sigma(16), so lower_bound = Q(e_min, k_w) is not correct. Likewise, e_min/shift(max(|e|)) = 1.024e-5, which is also < \sigma(16),
so lower_bound = Q(e_min/shift(max(|e|)), k_w) does not make sense either.
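The numbers in (3) can be reproduced with a short sketch, assuming \sigma(k) = 2^(1-k) is the smallest quantization step and shift(x) = 2^round(log2(x)). The names `sigma` and `shift` are illustrative helpers, and the e_max and e_min values are only rough readings from Fig. 4:

```python
import math

def sigma(k: int) -> float:
    # Assumed smallest quantization step for k bits: 2**(1 - k).
    return 2.0 ** (1 - k)

def shift(x: float) -> float:
    # Assumed definition: nearest power of two in the log2 domain (x > 0).
    return 2.0 ** round(math.log2(x))

e_max, e_min = 1e-3, 1e-8           # approximate values read off Fig. 4
scaled_min = e_min / shift(e_max)   # shift(1e-3) = 2**-10, so about 1.024e-5

print(sigma(8), sigma(16))
print(e_min < sigma(16))            # raw minimum is below the 16-bit step
print(scaled_min < sigma(16))       # scaled minimum is below it as well
```

Under these assumptions both candidates fall below \sigma(16), which is the inconsistency described in (3).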
(4) In Eq. 11, what if g_s is large and \Delta W exceeds the range [-1+\sigma, 1-\sigma]? Do you mean that this situation is handled together in Eq. 12?
I hope I can get your help. Thank you again!