GitHub - CoronaWhy/basic-similarity-between-tasks-and-current-dataset: Initial ideas for comparing tasks and the dataset using similarity

Currently the repo contains:

Dataset API: create the pickle dataset, compare texts using features, etc.
If you don't want to compute the mean vectors for every paper yourself, you can use the precomputed ones: https://drive.google.com/drive/folders/1iSwv2t-mopjEVRnEKfzgFFpMi1XtHhUO?usp=sharing. Download processed.zip, unzip it, and put the contents into the repo/dataset/processed folder.
The word2vec model is currently trained on the papers (the preprocessing could probably be improved). If you want to use it, download it from https://drive.google.com/drive/u/0/folders/1iSwv2t-mopjEVRnEKfzgFFpMi1XtHhUO, unzip it, and put it into the models/word2vec folder.
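
As a rough illustration of the idea (not the repo's actual code), the sketch below averages word vectors into one mean vector per text and ranks papers by cosine similarity to a task. The embedding table, paper ids, and function names are hypothetical toy values; in practice the vectors would come from the trained word2vec model.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding.

    Returns None when no token is in the vocabulary.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_papers(task_tokens, papers, embeddings):
    """Return (paper_id, similarity) pairs, most similar first."""
    task_vec = mean_vector(task_tokens, embeddings)
    if task_vec is None:
        return []
    scored = []
    for paper_id, tokens in papers.items():
        paper_vec = mean_vector(tokens, embeddings)
        if paper_vec is not None:
            scored.append((paper_id, cosine_similarity(task_vec, paper_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-dimensional embeddings, standing in for word2vec vectors.
TOY_EMBEDDINGS = {
    "virus": [1.0, 0.0],
    "transmission": [0.8, 0.2],
    "vaccine": [0.0, 1.0],
    "trial": [0.1, 0.9],
}

# Hypothetical tokenized papers.
TOY_PAPERS = {
    "paper_a": ["virus", "transmission"],
    "paper_b": ["vaccine", "trial"],
}

ranking = rank_papers(["virus", "transmission"], TOY_PAPERS, TOY_EMBEDDINGS)
for paper_id, score in ranking:
    print(paper_id, round(score, 3))
```

Precomputing the mean vector of each paper once (as the processed.zip download does) means only the task vector has to be computed at query time.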

dataset
notebooks
train_extractors/word2vec
visualization
.gitignore
1_dataset_creation.py
README.md
config.py
requirements.txt
