This is a re-implementation of Word2Vec relying on TensorFlow Estimators and Datasets.
Works with Python >= 3.6 and TensorFlow v2.0.
Install via pip:
pip3 install tf-word2vec
or, after a git clone:
python3 setup.py install
You can download a sample of the English Wikipedia, packaged as a 7z archive, with:
wget http://129.194.21.122/~kabbach/enwiki.20190120.sample10.0.balanced.txt.7z
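The training command takes the uncompressed text file, so the downloaded archive has to be extracted first. A minimal sketch, assuming the `7z` binary (for example from p7zip) is installed and that the archive unpacks to a file with the same name minus the `.7z` suffix; the variable names are only illustrative:

```shell
# Archive name taken from the download URL.
ARCHIVE="enwiki.20190120.sample10.0.balanced.txt.7z"
# Assumption: the extracted file keeps the archive name without ".7z".
CORPUS="${ARCHIVE%.7z}"

# Extract only if the archive is present and 7z is available.
if [ -f "$ARCHIVE" ] && command -v 7z >/dev/null 2>&1; then
    7z x "$ARCHIVE"
fi

echo "Corpus file: $CORPUS"
```

If the archive turns out to contain a differently named file, pass that name to `--data` instead.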
To train a Word2Vec model on that sample, run:
w2v train \
--data /absolute/path/to/enwiki.20190120.sample10.0.balanced.txt \
--outputdir /absolute/path/to/word2vec/models \
--alpha 0.025 \
--neg 5 \
--window 2 \
--epochs 5 \
--size 300 \
--min-count 50 \
--sample 1e-5 \
--train-mode skipgram \
--t-num-threads 20 \
--p-num-threads 25 \
--keep-checkpoint-max 3 \
--batch 1 \
--shuffling-buffer-size 10000 \
--save-summary-steps 10000 \
--save-checkpoints-steps 100000 \
--log-step-count-steps 10000
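The `--data` and `--outputdir` placeholders in the command are written as absolute paths. The following sketch shows one way to derive such paths from relative locations in a POSIX shell. The `models` directory name and all variable names are hypothetical, and whether `w2v` creates the output directory itself is not stated here, so the sketch creates it up front:

```shell
# Hypothetical relative locations; adjust to your setup.
CORPUS="enwiki.20190120.sample10.0.balanced.txt"
MODELS_DIR="models"

# Create the output directory, then resolve it to an absolute path.
mkdir -p "$MODELS_DIR"
OUTPUT_DIR="$(cd "$MODELS_DIR" && pwd)"

# Build an absolute corpus path from the current working directory.
DATA_PATH="$(pwd)/$CORPUS"

# Print the two arguments to substitute into the training command.
echo "--data $DATA_PATH"
echo "--outputdir $OUTPUT_DIR"
```

The printed values can then replace the `/absolute/path/to/...` placeholders.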