Word2Vec.trainJavaModel("data/train.txt", "data/test.model");
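Hi, could you give an example of what data/train.txt and data/test.model look like?
For example: I have 10 sentences. After word segmentation, what should they look like in train.txt? Do I put related words on the same line, separated by spaces? Or do the 10 sentences go one per line, with the words separated by spaces?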
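Hi, data/test.model is the path where the model is saved once training finishes. data/train.txt is the word-segmented training corpus: each line is one text, and each text is a sequence of words separated by spaces. For example: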
doc1_word1 doc1_word2 doc1_word3...
doc2_word1 doc2_word2 doc2_word3...
...
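As a rough illustration of the format above, here is a minimal sketch that writes already-segmented texts to data/train.txt, one text per line with words joined by single spaces. The class `TrainFileSketch` and its method `toCorpusLines` are hypothetical helpers, not part of this library, and UTF-8 encoding is an assumption; the segmentation itself is assumed to have been done beforehand with whatever tokenizer you use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the library.
class TrainFileSketch {

    // Turns each segmented text (a list of words) into one line,
    // with the words separated by a single space.
    static List<String> toCorpusLines(List<List<String>> segmentedDocs) {
        List<String> lines = new ArrayList<>();
        for (List<String> words : segmentedDocs) {
            lines.add(String.join(" ", words));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Two example sentences, already segmented into words.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("我", "喜欢", "自然", "语言", "处理"),
                Arrays.asList("今天", "天气", "很", "好"));

        Path out = Paths.get("data", "train.txt");
        Files.createDirectories(out.getParent());
        // Assumption: the trainer reads the corpus as UTF-8.
        Files.write(out, toCorpusLines(docs), StandardCharsets.UTF_8);

        // The file can then be passed to the call from the question:
        // Word2Vec.trainJavaModel("data/train.txt", "data/test.model");
    }
}
```

So for 10 sentences, the file would have 10 lines, each holding the space-separated words of one sentence.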
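That said, I would recommend training the model directly with Google's official code. It is currently widely regarded as the most accurate word2vec implementation, and the model it produces is in exactly the same format as one trained with the Java version, so you can still load it with this library afterwards. See: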
Training a Google-version model
Training word vectors on the Chinese Wikipedia corpus: processing the Chinese Wikipedia corpus