
Ocd #19

Open
Chung-I wants to merge 3 commits into master
Conversation

@Chung-I commented Mar 12, 2019

Implements Optimal Completion Distillation.
Adds a new config, libri_ocd_example.yaml, which enables OCD training.
Not well tested; there might be bugs.
Temperature annealing is not yet implemented; the temperature is currently fixed at 1e-8 (sharpest). A sketch of what this setting does to the targets is shown below.
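For reference, a minimal sketch (not part of this PR, with a hypothetical q_val tensor of Q-values) of how the temperature shapes the OCD target distribution:

```python
import torch
import torch.nn.functional as F

# Hypothetical Q-values over a 5-token vocabulary at one decoding step:
# tokens 1 and 3 both lead to an optimal completion (Q = 0); the rest don't.
q_val = torch.tensor([-1.0, 0.0, -2.0, 0.0, -1.0])

for temp in (1.0, 0.1, 1e-8):
    target = F.softmax(q_val / temp, dim=-1)
    print(temp, target)

# As temp -> 0 (e.g. 1e-8, the "sharpest" setting mentioned above), the
# target collapses to a uniform distribution over the optimal tokens
# (here 0.5 on tokens 1 and 3).
```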

@Liangtaiwan

@Alexander-H-Liu
I think this is a wonderful PR; can you merge it ASAP?

@xingchensong commented May 23, 2019

@Chung-I I notice that you used cross entropy in ocd_loss rather than KL divergence (which is what the paper 'Optimal Completion Distillation for Sequence Learning' officially uses). Is this PR a correct implementation of ocd_loss? Thanks.

@xingchensong commented May 23, 2019

Should ocd_loss be like this?

optimal_probs = F.softmax(q_val / temp, dim=-1)

loss += (optimal_probs * (torch.log(optimal_probs) - F.log_softmax(out_probs[b, :len_sample, :], dim=-1))).sum(dim=-1).mean()
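For what it's worth, PyTorch's built-in F.kl_div computes the same pointwise term target * (log(target) - input). A self-contained sketch with dummy shapes (the names q_val and out_logits mirror the snippet above and are assumptions, not the PR's actual tensors):

```python
import torch
import torch.nn.functional as F

# Dummy stand-ins: Q-values and model logits for one sample of length 4
# over a 10-token vocabulary (shapes assumed for illustration).
temp = 1.0
q_val = torch.randn(4, 10)
out_logits = torch.randn(4, 10, requires_grad=True)

optimal_probs = F.softmax(q_val / temp, dim=-1)
log_q = F.log_softmax(out_logits, dim=-1)

# F.kl_div expects log-probabilities as input and probabilities as target,
# and treats entries with zero target probability as contributing 0; the
# manual torch.log(optimal_probs) form instead yields NaN if a target
# probability underflows to exactly 0 (possible at very sharp temperatures
# like 1e-8).
loss = F.kl_div(log_q, optimal_probs, reduction='none').sum(dim=-1).mean()
```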

@Chung-I (Author) commented May 25, 2019

Yes, as the paper indicates, the loss they use is the KL divergence; however, for backpropagation in this scenario the two losses are equivalent in terms of gradient computation. Consider this:
KL(p||q) = ∫ p(x) log [p(x)/q(x)] dx = ∫ p(x) log p(x) dx - ∫ p(x) log q(x) dx = H(p, q) - H(p).

So H(p, q) - KL(p||q) = H(p).

H(p), while it varies with the number of targets and the temperature τ, does not depend on q and therefore contributes nothing to the gradients:
d KL(p||q) / d q = d [H(p, q) - H(p)] / d q = d H(p, q) / d q.

So the two losses are equivalent under backprop despite having different values.

Of course, H(p, q) is not a divergence, since a divergence requires D(p||q) = 0 if and only if p = q:
when p = q, H(p, q) = H(p) > 0, while KL(p||q) = 0.

It's true that if you really want to see how much q differs from p, the KL divergence is the right loss to use. But after discussing with Alex (the owner of the repo), we decided to just drop the H(p) term and use H(p, q).
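A quick numeric check of this claim (a sketch, not code from the PR): with a fixed target p, the gradients of H(p, q) and KL(p||q) with respect to the logits coincide.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
p = F.softmax(torch.randn(5), dim=-1)        # fixed target distribution
logits = torch.randn(5, requires_grad=True)  # model outputs

log_q = F.log_softmax(logits, dim=-1)
ce = -(p * log_q).sum()                      # H(p, q): what the PR uses
kl = (p * (p.log() - log_q)).sum()           # KL(p||q) = H(p, q) - H(p)

g_ce = torch.autograd.grad(ce, logits, retain_graph=True)[0]
g_kl = torch.autograd.grad(kl, logits)[0]
print(torch.allclose(g_ce, g_kl))            # True: identical gradients
```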

@xingchensong

I see, thanks for your reply! There is a question I'd like to consult you on: do we need to implement the backward pass ourselves when designing a new loss? Recently I was trying to reproduce CTC (which uses dynamic programming). Existing CTC repos such as Baidu's warp-ctc implement not only the forward part but also compute the gradient by hand, yet it seems we don't need to do so for ocd_loss, so I'm confused. Should we compute the gradient ourselves?

@Chung-I (Author) commented May 27, 2019

I think PyTorch does automatic differentiation for you.

Baidu implemented their own backward function because they wanted their own optimized version (DeepSpeech2, page 27).
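To make the distinction concrete, here is a minimal sketch (not tied to this repo or to warp-ctc's actual kernels): you only write backward by hand when you wrap a computation in a custom torch.autograd.Function; losses built from ordinary differentiable torch ops, like ocd_loss, get their backward from autograd automatically.

```python
import torch

class MySquare(torch.autograd.Function):
    """Toy custom op with a hand-written backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x  # hand-derived gradient of x^2

x = torch.randn(3, requires_grad=True)
MySquare.apply(x).sum().backward()
print(torch.allclose(x.grad, 2 * x))  # True
```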
