
How can I skip all these CoNLL files and make a simple prediction with an input text file? #3

Open
aymen-souid opened this issue Nov 22, 2021 · 4 comments

Comments

@aymen-souid

aymen-souid commented Nov 22, 2021

Here's my problem with conll_eval_path, lm_path, and conll_test_path, files that I don't need in order to make a prediction:

(aracoref) souid@aymen:~/aracoref$ python evaluate.py arabic_cleaned_bert
Running experiment: arabic_cleaned_bert
max_top_antecedents = 50
max_training_sentences = 50
top_span_ratio = 0.4
filter_widths = [
3
4
5
]
filter_size = 50
char_embedding_size = 8
char_vocab_path = "/home/souid/aracoref/bert-base-arabertv01/vocab.txt"
context_embeddings {
path = "/home/souid/aracoref/cc.ar.300.vec"
size = 300
}
head_embeddings {
path = "/home/souid/aracoref/cc.ar.300.vec"
size = 300
}
contextualization_size = 200
contextualization_layers = 3
ffnn_size = 150
ffnn_depth = 2
feature_size = 20
max_span_width = 30
use_metadata = true
use_features = true
model_heads = true
coref_depth = 2
lm_layers = 4
lm_size = 768
coarse_to_fine = true
max_gradient_norm = 5.0
lstm_dropout_rate = 0.4
lexical_dropout_rate = 0.5
dropout_rate = 0.2
optimizer = "adam"
learning_rate = 0.001
decay_rate = 0.999
decay_frequency = 100
train_path = "/home/souid/aracoref/best_models_crac2020/best-models/data/train.arabic.pred.mentions.jsonlines"
eval_path = "/home/souid/aracoref/best_models_crac2020/best-models/data/dev.arabic.pred.mentions.jsonlines"
conll_eval_path = "dev.arabic.v4_gold_conll"
lm_path = "bert_arb_conll12_cleaned_features.hdf5"
test_path = "/home/souid/aracoref/best_models_crac2020/best-models/data/test.arabic.pred.mentions.jsonlines"
conll_test_path = "test.arabic.v4_gold_conll"
genres = [
"bc"
"bn"
"mz"
"nw"
"pt"
"tc"
"wb"
]
eval_frequency = 500
report_frequency = 100
log_root = "logs"
max_step = 400000
use_joint_coref = true
use_e2e_annealing = false
log_dir = "logs/arabic_cleaned_bert"
Loading word embeddings from /home/souid/aracoref/cc.ar.300.vec...

Traceback (most recent call last):
  File "evaluate.py", line 18, in <module>
    model = cm.CorefModel(config)
  File "/home/souid/aracoref/coref_model.py", line 24, in __init__
    self.context_embeddings = util.EmbeddingDictionary(config["context_embeddings"])
  File "/home/souid/aracoref/util.py", line 174, in __init__
    self._embeddings = self.load_embedding_dict(self._path)
  File "/home/souid/aracoref/util.py", line 194, in load_embedding_dict
    assert len(embedding) == self.size
AssertionError

@juntaoy
Owner

juntaoy commented Nov 22, 2021

If you set official_stdout=False it will not try to create the CoNLL files, but you will then need to write your own output code to save the predictions.
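A minimal sketch of such output code, writing predictions as jsonlines. The names write_predictions, examples, and predicted_clusters are hypothetical stand-ins, not part of this repository; they assume one dict per document (with "doc_key" and "sentences" keys, as in the project's .jsonlines format) and one list of clusters per document, each cluster a list of (start, end) token spans:

```python
import json

def write_predictions(path, examples, predicted_clusters):
    """Dump each document's predicted coreference clusters as one JSON line."""
    with open(path, "w", encoding="utf-8") as f:
        for example, clusters in zip(examples, predicted_clusters):
            record = {
                "doc_key": example.get("doc_key"),
                "sentences": example["sentences"],
                # Convert span tuples to lists so they are JSON-serializable.
                "predicted_clusters": [
                    [list(span) for span in cluster] for cluster in clusters
                ],
            }
            f.write(json.dumps(record) + "\n")
```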

@aymen-souid
Author

Even after setting it to False, this error keeps showing; there is a problem with the embedding length:
len(embedding) = 1
self.size = 300

@juntaoy
Owner

juntaoy commented Nov 22, 2021

I thought you asked about how to skip the CoNLL files? This error is not related to that. I think you need to remove the first line of cc.ar.300.vec: the first line of a fastText .vec file contains only two numbers (the number of words and the vector dimension), and the reader does not skip it.
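The header line explains the assertion failure: splitting "2000000 300" yields one word plus one value, so len(embedding) == 1 instead of 300. A minimal sketch of stripping that header before loading (strip_fasttext_header is a hypothetical helper, not part of this repo):

```python
def strip_fasttext_header(src, dst):
    """Copy a fastText .vec file, dropping the header line if present.

    A real embedding row has 1 + dimension fields (word v1 ... v300);
    the fastText header has only two fields (vocab size and dimension),
    so any line with more than two fields is kept as data.
    """
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        first = fin.readline()
        if len(first.split()) > 2:
            fout.write(first)  # not a header; keep it
        for line in fin:
            fout.write(line)
```

After writing the stripped copy, point context_embeddings.path and head_embeddings.path in the experiment config at the new file.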

@aymen-souid
Author

aymen-souid commented Nov 23, 2021

Do you have an idea how to solve this error, which appears after loading the word embeddings? I can't find the file bert_arb_conll12_features.hdf5.

[Screenshot from 2021-11-23 14-24-25]
