Hello!

In the paper you write:

"All text data are lower-cased to align with the settings of Austin et al. (2021)"

But the D3PM paper never states that the LM1B data was lower-cased (and the samples from their model in the appendix contain upper-case characters). So the perplexity comparison seems incorrect, because it is easier to model all-lowercased text. Am I missing something?
Hi, thank you for your question! I have to admit that we made a mistake in that statement; we will remove it in a later version.
Nevertheless, we think the comparison is fair. We re-implemented D3PM in PyTorch, replaced its backbone with the bert-base-uncased architecture, and used the same tokenizer (so that both methods operate on lower-cased text). The baseline results we report come from this re-implementation.
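As a minimal sketch of why sharing that tokenizer equalizes casing (illustrative only, using the standard Hugging Face transformers API rather than our exact training code):

```python
from transformers import AutoTokenizer

# bert-base-uncased lower-cases input during tokenization,
# so cased and uncased text map to the same token sequence.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.tokenize("The Quick Brown Fox"))  # ['the', 'quick', 'brown', 'fox']
print(tokenizer.tokenize("the quick brown fox"))  # ['the', 'quick', 'brown', 'fox']
```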
It is also worth noting that our reported D3PM-absorbing results are only slightly worse than those in the original paper (the gap is due to limited computational resources), which indicates that our implementation is correct. Moreover, we trained DiffusionBERT for even less time.
Hope this helps! Please feel free to contact me if you have any other questions. We will also include the cased results in the final version. :)