Currently, the repo contains:
- Dataset API: create a pickled dataset, compare texts using features, etc.
- If you don't want to compute the mean vectors for every paper yourself, you can use the precomputed ones: https://drive.google.com/drive/folders/1iSwv2t-mopjEVRnEKfzgFFpMi1XtHhUO?usp=sharing. Download processed.zip, unzip it, and put the contents into the repo/dataset/processed folder.
- Currently, I train the word2vec model on the papers (the preprocessing could probably be improved). If you want to use it, download it from https://drive.google.com/drive/u/0/folders/1iSwv2t-mopjEVRnEKfzgFFpMi1XtHhUO, unzip it, and put it into the models/word2vec folder.
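
To give a rough idea of what a "mean vector" for a paper is, here is a minimal sketch. It assumes a paper is represented as the average of the word vectors of its tokens and that two papers are compared with cosine similarity. This may differ from what the repo actually does. The names `mean_vector`, `cosine_similarity` and `toy_embeddings` are hypothetical and are not part of the Dataset API; a plain dict stands in for the trained word2vec model.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens found in `embeddings`.

    Tokens missing from `embeddings` are skipped. Returns None when no
    token has a vector. (Hypothetical helper, not the repo's API.)
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-dimensional embeddings standing in for a trained word2vec model.
toy_embeddings = {
    "neural": [1.0, 0.0],
    "network": [0.0, 1.0],
    "graph": [1.0, 1.0],
}

# "unknown" has no vector, so it is ignored when averaging.
paper_a = mean_vector(["neural", "network", "unknown"], toy_embeddings)
paper_b = mean_vector(["graph"], toy_embeddings)
similarity = cosine_similarity(paper_a, paper_b)
```

With a real model, the dict lookup would be replaced by a lookup in the downloaded word2vec vectors; the averaging and comparison steps stay the same.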