Error when running train.py #4

rndn123 · 2020-07-29T11:30:23Z

I got this error when running train.py from your GitHub code. Please help, and thanks in advance.
Traceback (most recent call last):
  File "train.py", line 305, in <module>
    sys.exit(main(sys.argv[1:]))
  File "train.py", line 300, in main
    train(opt, shared, m, optim, ema, train_data, val_data)
  File "train.py", line 178, in train
    train_perf, extra_train_perf, loss, num_ex = train_epoch(opt, shared, m, optim, ema, train_data, i, train_idx)
  File "train.py", line 113, in train_epoch
    output = m.forward(wv_idx1, wv_idx2, cv_idx1, cv_idx2)
  File "/content/drive/My Drive/layer_augmentation-master/layer_augmentation-master/pipeline.py", line 104, in forward
    att1, att2 = self.attention(input_enc1, input_enc2)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "./attention/local_attention.py", line 36, in forward
    self.shared.att_soft1, self.shared.att_soft2)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/layer_augmentation-master/layer_augmentation-master/within_layer.py", line 86, in forward
    datt1_ls.append(layer(att1.transpose(1,2)).transpose(1,2).contiguous().view(1, batch_l, sent_l1, sent_l2))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "./constraint/n1.py", line 57, in forward
    d = self.logic(att)
  File "./constraint/n1.py", line 37, in logic
    p_content_selector = get_p_selector(self.opt, self.shared, 'content_word', with_nul=False).view(self.shared.batch_l, 1, self.shared.sent_l1)
  File "./constraint/constr_utils.py", line 58, in get_p_selector
    mask[ex][p_contents] = 1.0
IndexError: index 15 is out of bounds for dimension 0 with size 14

t-li · 2020-08-05T19:42:06Z

Hey, thanks for the feedback. This is weird, though; the training script should run smoothly. Let's debug this.

It looks like the preprocessed content word mask is misaligned with the actual example in the batch.
In the get_p_selector function, could you print out the following?

shared.batch_ex_idx[ex], which gives you the actual example line number to look up in the tokenized premise-hypothesis files;
p_contents, which is the content word mask.

Check whether they are indeed misaligned. If they are, my guess is that the problem comes from preprocessing.
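To make that check fail loudly instead of with a bare IndexError, the mask-building step could validate the indices first. The sketch below is hypothetical: build_content_mask is not a function in the repo, it works on plain Python lists rather than the torch tensor used in constr_utils.py, and it assumes p_contents holds 0-based positions into a sentence of length sent_l.

```python
def build_content_mask(sent_l, p_contents, ex_idx=None):
    """Hypothetical stand-in for the `mask[ex][p_contents] = 1.0` step.

    Raises a ValueError naming the example line number when any
    content-word index falls outside the sentence.
    """
    # Negative indices are also treated as bad: content positions should be >= 0.
    bad = [i for i in p_contents if i < 0 or i >= sent_l]
    if bad:
        raise ValueError(
            f"example {ex_idx}: content indices {bad} out of range "
            f"for sentence length {sent_l}"
        )
    mask = [0.0] * sent_l
    for i in p_contents:
        mask[i] = 1.0  # mark each content word position
    return mask
```

With the values reported in this thread, build_content_mask(14, [2, 5, 8, 9, 10, 11, 13, 15], ex_idx=115750) raises a ValueError that names example 115750 and index 15, which is the line to look up in the tokenized files.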

Lastdier · 2020-08-09T12:27:52Z

Hi @t-li, I got the same error as @92komal did.

I printed out p_contents when the error occurred. It is [2,5,8,9,10,11,13,15], and the corresponding shared.batch_ex_idx[ex] is 115750.
Then I checked "train.content_word.json" and found that indices 466562, 466563, and 466564 have p=[2,5,8,9,10,11,13,15].
Therefore, I suspect the bug is in either preprocess.py or train.content_word.json.
I've checked preprocess.py but couldn't find anything.
train.content_word.json is unpacked from conceptnet_rel.zip in your repo.
Could you please add a description of how it is generated, or release the code for it?

t-li · 2020-08-12T20:09:09Z

Hey @Lastdier and @92komal, I will rerun the experiment from scratch and see what happens.

The code for fetching ConceptNet edges is already in the conceptnet.py file. I left this phase out of the description because it involves lots of dirty hacks to make a ConceptNet instance run on the particular machine setup I was using at the time (https://www.cs.utah.edu/~tli/posts/2018/09/blog-post-3/). Since ConceptNet is also evolving, I instead released the extracted edges directly in the json file.

But again, let me get to it and see what happens.

t-li · 2020-08-17T21:38:40Z

Hi @Lastdier @92komal, I can almost confirm that this is caused by changes in spaCy's tokenization: it now produces tokens that do not align with the tokens in the constraint json files.

Luckily, we backed up those tokenized files. They are now in ./data/snli_1.0/snli_extracted.zip. I just trained one epoch with them, and it ran smoothly.
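Before a full training run, one way to confirm that a set of tokenized files matches the constraint json is to scan every example once. This is a minimal sketch under stated assumptions: find_misaligned is a hypothetical helper, the premises are assumed to be whitespace-tokenized strings (one per line of the tokenized file), and each content-word entry is assumed to be a list of 0-based token positions with no NULL token prepended. The actual json layout is not described here, so loading the files into these two lists is left to the reader.

```python
def find_misaligned(premises, content_indices):
    """Hypothetical alignment check between tokenized premises and
    content-word index lists.

    Returns (line_no, num_tokens, max_index) for every example whose
    largest content index does not fit inside its tokenized premise.
    """
    if len(premises) != len(content_indices):
        raise ValueError("premise file and content-word entries differ in length")
    bad = []
    for line_no, (sent, idxs) in enumerate(zip(premises, content_indices)):
        n_tok = len(sent.split())  # assumes whitespace-separated tokens
        if idxs and max(idxs) >= n_tok:
            bad.append((line_no, n_tok, max(idxs)))
    return bad
```

An empty result means every content index fits its premise; any returned line numbers point to examples where the tokenizer output and the json disagree.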

t-li · 2020-08-17T22:11:31Z

@Lastdier BTW, I just added the extraction script for ConceptNet to the readme file. FYI.

Lastdier · 2020-08-18T07:34:18Z

@t-li The problem has been solved. Thank you!
BTW, it would be excellent if you could release your code for Machine Comprehension and Text Chunking.

t-li · 2020-08-18T19:02:53Z

@Lastdier Cool!

The code for QA is already there (https://github.com/utahnlp/layer_augmentation_qa). I put it in a separate repo since the code structures are very different.

Lastdier · 2020-08-19T12:28:50Z

@t-li Excellent! Thank you again.
