How could I reproduce the result for SQuAD 1.1? #10

Open
alphaf52 opened this issue Oct 25, 2019 · 6 comments

@alphaf52

Hi,

Thanks for your good work. I would like to reproduce the result for SQuAD 1.1 (as shown in Table 1 in the paper), but I am having some trouble.

First, I downloaded the pretrained model from "gs://denspi/v1-0/model" and then tried to evaluate on dev-v1.1 using: "python run_piqa.py --do_predict --output_dir tmp --do_load --load_dir model --predict_file dev-v1.1.json --do_eval --gt_file dev-v1.1.json --metadata_dir bert"

The predicted answers seem to be random spans, resulting in metrics like: {"exact_match": 0.47303689687795647, "f1": 4.43806570152543}. An EM of 0.47% means something is totally wrong.
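For reference, here is a minimal sketch of how SQuAD-style exact match and F1 are usually computed, which shows why the numbers above are percentages and why near-zero EM points to essentially unrelated spans. It follows the normalization commonly used for SQuAD v1.1 (lowercase, strip punctuation and articles, collapse whitespace); the function names and the `score` helper are illustrative and are not taken from this repository's evaluation code.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1(prediction, ground_truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions, references):
    """predictions: {qid: str}; references: {qid: [str, ...]}.

    Takes the best score over all reference answers per question and
    returns the averages as percentages.
    """
    em_total = 0.0
    f1_total = 0.0
    for qid, gold_answers in references.items():
        pred = predictions.get(qid, "")
        em_total += max(exact_match(pred, g) for g in gold_answers)
        f1_total += max(f1(pred, g) for g in gold_answers)
    n = len(references)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    preds = {"q1": "The Eiffel Tower", "q2": "in Paris, France"}
    refs = {"q1": ["Eiffel Tower"], "q2": ["Paris"]}
    print(score(preds, refs))
```

Under this definition, an F1 of about 4 with an EM below 1 means the predicted spans share almost no tokens with the references, which is consistent with weights that were not loaded correctly.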

I wonder whether I did it correctly.

Also, since I cannot get the pretrained model to work, if I want to train a model myself to reproduce the result, is it enough to just run the first step in the training section (i.e. "python run_piqa.py --train_batch_size 12 --do_train --freeze_word_emb --save_dir $SAVE1_DIR")?

Thanks, and I hope to get your advice.

@mittalpatel

Hey @alphaf52, did you find a solution for this? We are still facing the same issue.

@jhyuklee
Collaborator

Hi, I think the problem is that you forgot to pass --parallel. The model was trained with DataParallel, so you have to pass that option to load the model properly. Please try this and let me know.
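As background, a minimal sketch of why this flag can matter. `torch.nn.DataParallel` keeps the wrapped network under an attribute named `module`, so a checkpoint saved from the wrapper has every parameter key prefixed with `module.`. If such a checkpoint is loaded into an unwrapped model in a way that tolerates mismatched keys, no weights are actually restored and predictions look random. Whether this is exactly what happens inside run_piqa.py is an assumption; the key names below are hypothetical, and plain dictionaries stand in for real state dicts so the sketch runs without PyTorch.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def key_mismatch(model_keys, checkpoint_keys):
    """Return (missing, unexpected) keys, mirroring what a non-strict load reports."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    checkpoint = {"module.encoder.weight": [0.1, 0.2], "module.encoder.bias": [0.0]}
    plain_model_keys = ["encoder.weight", "encoder.bias"]

    # Loading as-is: nothing lines up, so every parameter would stay at its
    # random initialization if the loader ignores mismatches.
    print(key_mismatch(plain_model_keys, checkpoint.keys()))

    # After stripping the prefix (or wrapping the model the same way it was
    # trained), the keys match.
    print(key_mismatch(plain_model_keys, strip_module_prefix(checkpoint).keys()))
```

Wrapping the model the same way it was trained, which is presumably what --parallel does here, avoids the mismatch without rewriting the checkpoint.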

@mittalpatel

Thanks a lot @jhyuklee, this seems to be working! We passed --parallel while creating the vectors and it is giving proper answers now. We are doing some further testing and will confirm this soon.

Thanks once again for the hint. It really helped!

@alphaf52 You may try this solution.

@yucoian

yucoian commented Dec 27, 2019

Our group can't reproduce the result for SQuAD 1.1 (as shown in Table 1 in the paper) from scratch either. The README does not explain how to do it.
Please help ... @mittalpatel @jhyuklee @eunsol @mbforbes

@mittalpatel

@yucoian At what point are you running into the problem? We were able to do it by following the steps given in the README.

@yucoian

yucoian commented Dec 31, 2019

@mittalpatel Thank you very much! In the "SQuAD v1.1 Experiments (Section 6.1)", we cannot reproduce the "DENSPI (dense only, with Coherency scalar)" model. Could you please tell us how to adapt your released code to reproduce the result of "DENSPI (dense only, with Coherency scalar)"? To be specific, after adding the coherency scalar to DENSPI, we cannot reproduce the result.
