About optional DEV_SET and TEST_SET #39

Open
zhao19991111 opened this issue Aug 29, 2020 · 0 comments

zhao19991111 commented Aug 29, 2020

I got this error after running AutoNER without setting DEV_SET and TEST_SET:

Traceback (most recent call last):
  File "preprocess_partial_ner/encode_folder.py", line 281, in <module>
    testa_dataset = encode_dataset(args.input_testa, w_map, c_map, cl_map, tl_map)
  File "preprocess_partial_ner/encode_folder.py", line 221, in encode_dataset
    features, labels_chunk, labels_point, labels_typing = read_corpus(lines)
  File "preprocess_partial_ner/encode_folder.py", line 115, in read_corpus
    assert len(line) == 3, "the format of corpus"
AssertionError: the format of corpus

I noticed that ./autoner_train.sh tries to fall back to TRAINING_SET when DEV_SET and TEST_SET are empty:

if [ DEV_SET == "" ]; then
    DEV_SET=$TRAINING_SET
fi

if [ TEST_SET == "" ]; then
    TEST_SET=$TRAINING_SET
fi

However, this replacement did not happen during execution, so I made it manually.
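One possible explanation, assuming the script reads exactly as quoted: `[ DEV_SET == "" ]` compares the literal word `DEV_SET` with the empty string, because the variable name has no `$`. That test is never true, so the fallback never runs. A minimal sketch of a check that does expand the variables is below. The initial values are placeholders for illustration, not the script's real settings.

```shell
#!/bin/sh
# Placeholder values for illustration only.
TRAINING_SET="annotation.ck"
DEV_SET=""
TEST_SET=""

# Same comparison as quoted (written with POSIX "="): the left side is the
# literal string "DEV_SET", which is never empty, so this branch is skipped.
literal_test_fired="no"
if [ DEV_SET = "" ]; then
    literal_test_fired="yes"
fi

# Sketch of a fix: expand the variable and test whether it is empty.
if [ -z "$DEV_SET" ]; then
    DEV_SET=$TRAINING_SET
fi
if [ -z "$TEST_SET" ]; then
    TEST_SET=$TRAINING_SET
fi

echo "literal test fired: $literal_test_fired"
echo "DEV_SET=$DEV_SET TEST_SET=$TEST_SET"
```

Quoting `"$DEV_SET"` keeps the test well-formed even when the variable is empty or unset.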
It seems that TRAINING_SET (annotation.ck) has one more column than the format required for DEV_SET/TEST_SET. Does that mean the fallback is not valid, and that DEV_SET and TEST_SET are actually required?
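A quick way to confirm the column mismatch is to count fields per line in each file. The sketch below assumes columns are whitespace-separated and blank lines separate sentences, which is a guess about how `read_corpus` splits lines. The sample rows are placeholders, not real AutoNER data.

```shell
#!/bin/sh
# Placeholder rows only: four whitespace-separated fields per line,
# followed by a blank line standing in for a sentence break.
sample=$(mktemp)
printf 'a b c d\ne f g h\n\n' > "$sample"

# Count how many non-empty lines have each number of fields.
summary=$(awk 'NF > 0 { counts[NF]++ }
               END { for (n in counts) print n " columns: " counts[n] " lines" }' "$sample")
echo "$summary"
rm -f "$sample"
```

Pointing the same `awk` command at the real training and dev/test files would show whether any line fails the `len(line) == 3` assertion in the traceback.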
