- Use the GloVe algorithm to create a word matrix for the title of each paper.
- Then use PCA so that every word matrix has the same size, (3, 50).
- The X and Y sets are built from citation relationships: if paper A is cited by paper B, then X is the word matrix of paper A and Y is the word matrix of paper B.
- This gives an X set of size (5967, 3, 50) and a Y set of size (5967, 3, 50).
- Finally, we shuffle the dataset and split it into a training set and a validation set.
- Use Keras LSTM layers to create a seq2seq model with input size (batchsize, 3, 50) and output size (batchsize, 3, 50).
- We train the model using the training set and validation set obtained above.
- Use the trained model to get a corresponding prediction matrix of size (3, 50) for the word matrix of each paper.
- Use the newly obtained prediction matrices to create a distance matrix, as we did last week.
- Select the 10 articles whose titles are most similar to that of paper 2554.
- Translate the titles of these 10 articles and compare their actual meaning with the title of paper 2554.
- Find the articles that cite paper 2554 and the articles that are cited by paper 2554.
- Compare these articles with the 10 articles selected above.
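
The data-pairing and title-similarity steps above can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the original code: the function names (`build_pairs`, `shuffle_split`, `distance_matrix`, `nearest_papers`), the 20% validation fraction, and the use of Euclidean distance on flattened matrices are all hypothetical choices, since the notes do not say which split ratio or distance was used. The GloVe/PCA step and the Keras model are replaced by random stand-in arrays of shape (3, 50).

```python
import numpy as np


def build_pairs(title_matrices, citations):
    """Build X/Y from citation pairs.

    title_matrices: dict mapping paper id -> array of shape (3, 50).
    citations: list of (cited_id, citing_id) tuples, i.e. (paper A, paper B).
    """
    X = np.stack([title_matrices[cited] for cited, _ in citations])
    Y = np.stack([title_matrices[citing] for _, citing in citations])
    return X, Y


def shuffle_split(X, Y, val_fraction=0.2, seed=0):
    """Shuffle the pairs and split them into train and validation sets.

    val_fraction is an assumed value; the notes do not give the ratio.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (X[train_idx], Y[train_idx]), (X[val_idx], Y[val_idx])


def distance_matrix(predicted):
    """Pairwise Euclidean distances between flattened (3, 50) matrices.

    Euclidean distance is an assumption made for this sketch.
    """
    flat = predicted.reshape(len(predicted), -1)
    sq = np.sum(flat ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(d2, 0.0))


def nearest_papers(dist, query_index, k=10):
    """Return the indices of the k papers closest to query_index."""
    order = np.argsort(dist[query_index])
    return [int(i) for i in order if i != query_index][:k]


# Toy data standing in for the real GloVe + PCA title matrices.
rng = np.random.default_rng(0)
toy_matrices = {pid: rng.normal(size=(3, 50)) for pid in range(6)}
toy_citations = [(0, 1), (0, 2), (1, 3), (2, 4), (4, 5)]

X, Y = build_pairs(toy_matrices, toy_citations)
(train_X, train_Y), (val_X, val_Y) = shuffle_split(X, Y, val_fraction=0.2)

# Stand-in for the model output: one (3, 50) prediction per paper.
predicted = np.stack([toy_matrices[pid] for pid in range(6)])
dist = distance_matrix(predicted)
neighbours = nearest_papers(dist, query_index=0, k=3)
```

With the real data, `query_index` would be the row that corresponds to paper 2554 and `k` would be 10; the returned indices can then be compared against the papers that cite, or are cited by, that paper.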