
Lower-case in LM1B #7

Open
Tomarchelone opened this issue Dec 15, 2022 · 1 comment

Comments

@Tomarchelone

Hello!

In the paper you write

All text data are lower-cased to align with the settings of Austin et al. (2021)

But the D3PM paper never states that the LM1B data was lower-cased (and the samples from their model in the appendix contain upper-case characters). So the perplexity comparison seems incorrect, because it is easier to model all-lower-cased text. Am I missing something?

@Hzfinfdu
Owner

Hzfinfdu commented Dec 15, 2022

Hi, thank you for your question! I have to admit that we made a mistake in that statement. We will remove it in a later version.

Nevertheless, we think the comparison is fair. We re-implemented D3PM in PyTorch, replaced their backbone with the bert-base-uncased architecture, and used the same tokenizer (so both methods see lower-cased text). The reported baseline results come from this re-implementation.
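For reference, a minimal sketch (not the repository's actual preprocessing code) showing why using the bert-base-uncased tokenizer means both models effectively see lower-cased text:

```python
# Minimal sketch: the Hugging Face bert-base-uncased tokenizer lower-cases
# its input, so any model trained with it sees lower-cased text.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sample = "The Quick Brown Fox"
tokens = tokenizer.tokenize(sample)
print(tokens)  # ['the', 'quick', 'brown', 'fox'] -- all lower-cased
```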

It is also worth noting that our reported results for D3PM-absorbing are only slightly worse than those in their paper, due to limited computational resources, which indicates the correctness of our implementation. And we trained DiffusionBERT for even less time.

Hope this helps! Please feel free to contact me if you have any other questions. We will also include the cased results in the final version. :)
