2. Getting the Corpus
akoksal edited this page May 1, 2018
We need a big corpus to train the word2vec model. You can access all Wikipedia articles written in Turkish from the Wikimedia dumps. As of this writing, the available dump is 20180101, so you can download all articles up to 01/01/2018 from this link: 20180101. Of course, you can use another corpus to train the word2vec model, but you must adapt that corpus so it can be used with the gensim library, as explained below.
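As a minimal sketch of fetching the dump programmatically, the snippet below builds the download URL for a given snapshot date. The file-name pattern (`trwiki-<date>-pages-articles.xml.bz2`) is an assumption based on the usual Wikimedia dumps layout, so verify it against the dump index page before downloading.

```python
import urllib.request

def dump_url(date: str, lang: str = "tr") -> str:
    # Assumed Wikimedia naming scheme: <lang>wiki-<date>-pages-articles.xml.bz2
    fname = f"{lang}wiki-{date}-pages-articles.xml.bz2"
    return f"https://dumps.wikimedia.org/{lang}wiki/{date}/{fname}"

url = dump_url("20180101")
print(url)
# Uncomment to actually download (the archive is several hundred MB):
# urllib.request.urlretrieve(url, "trwiki-20180101-pages-articles.xml.bz2")
```

The same helper works for other languages or snapshot dates by changing the `lang` and `date` arguments.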
Previous: 1. Prerequisites
Next: 3. Preprocessing the Corpus