# Build the large-scale corpus
>>> python3.7 parse_PubMed.py --input-dir . --output-path out.txt
# Train SentencePiece
>>> python3.7 sp_train.py --input-path ../../../PubMed/out.txt --vocab-size 16000
# Tokenize the large-scale corpus so that it can be used for fastText training
>>> python3.7 tokenize_for_pretrain.py --input-path ../../../PubMed/out.txt --sp-model ../sentencepieces/sp16000.model --output-path tokenized_out.txt
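The source of `tokenize_for_pretrain.py` is not shown here, so the following is only a rough sketch of what a tokenization step of this kind might look like. It assumes one sentence per input line and writes the subword pieces joined by single spaces, since fastText reads whitespace-separated tokens. The names `tokenize_lines` and `encode` are hypothetical and not taken from the repository.

```python
from typing import Callable, Iterable, Iterator, List


def tokenize_lines(
    lines: Iterable[str],
    encode: Callable[[str], List[str]],
) -> Iterator[str]:
    """Yield one space-joined line of pieces per non-empty input line.

    `encode` is any callable that maps a string to a list of pieces
    (hypothetical parameter; the real script may work differently).
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines so the output has no empty rows
        yield " ".join(encode(text))


if __name__ == "__main__":
    # Stand-in encoder: plain whitespace splitting, for illustration only.
    for row in tokenize_lines(["aspirin inhibits COX-1\n", "\n"], str.split):
        print(row)
```

With the `sentencepiece` Python package, `encode` could plausibly be the `EncodeAsPieces` method of a `SentencePieceProcessor` that has loaded `../sentencepieces/sp16000.model`. Keeping the encoder as a parameter makes the loop easy to try out without a trained model.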
# Train with fastText
>>> ~/fastText-0.1.0/fasttext skipgram -input ./tokenized_out.txt -output pretrain.model -dim 50 -epoch 10
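Given `-output pretrain.model`, fastText writes `pretrain.model.bin` and a plain-text `pretrain.model.vec`. The `.vec` file starts with a header line holding the vocabulary size and the dimension (50 here), followed by one line per token with its vector. How this repository consumes the vectors is not shown, so the loader below is only a minimal sketch of reading that text format. The function name `parse_vec` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_vec(lines: Iterable[str]) -> Tuple[int, Dict[str, List[float]]]:
    """Parse fastText .vec text lines into (dim, {token: vector}).

    Hypothetical helper, not part of the repository.
    """
    it = iter(lines)
    header = next(it).split()          # "<vocab_size> <dim>"
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")  # rstrip drops the trailing space/newline
        if len(parts) != dim + 1:
            continue                   # skip malformed rows instead of failing
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


if __name__ == "__main__":
    # Assumes the training command above has already been run.
    with open("pretrain.model.vec", encoding="utf-8") as f:
        dim, vectors = parse_vec(f)
    print(dim, len(vectors))
```

Taking an iterable of lines rather than a path keeps the parser independent of the file system, so it can be checked on a few hand-written rows before pointing it at the real output.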
hiroto0227/att-chemdner-pytorch