Based on the cosine similarity between BERT-encoded journal scopes and a user-provided abstract
The goal of this project is to build a journal recommender for the submission of scientific manuscripts. The recommendations are based on the similarity between each journal's scope and the user-provided abstract of a manuscript. To achieve this, three steps have been taken:
1. scimagojr_scrape.ipynb script: I scraped scimagojr.com to extract the scope of each journal from its dedicated webpage and stored these scopes in a separate dataset for each subject category: Biochemistry, Genetics and Molecular Biology / Immunology and Microbiology / Medicine / Neuroscience / Pharmacology, Toxicology, and Pharmaceutics.
The scraped scopes can be viewed in the scraped_from_scimago directory.
2. journal_finder.ipynb script: I used a BERT model pretrained on MEDLINE/PubMed texts, available on TensorFlow Hub, to convert the journal scopes and the user-provided abstract into feature vectors. I then computed the cosine similarity between the abstract vector and each scope vector to find the scopes most similar to the provided abstract.
The encodings of the journal scopes produced by the TensorFlow Hub BERT expert (BERT pooled outputs) are available in the journal_scope_encodings directory as .pkl files.
3. finetuned_BERT_journal_recommender.ipynb script: I used a PubMedBERT sentence similarity model pretrained on MNLI, SNLI, SCINLI, SCITAIL, MEDNLI, and STSB texts, available on Hugging Face, and fine-tuned it with [abstract, journal_scope] pairs scraped from PubMed. The fine-tuned model was then used to convert the journal scopes and the user-provided abstract into feature vectors, which were compared using cosine similarity to find the scopes most similar to the provided abstract.
Abstracts scraped from PubMed can be downloaded using this link.
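The ranking idea shared by steps 2 and 3 can be illustrated with a minimal sketch. It is not the notebooks' actual code: the function names (cosine_similarity, rank_journals), the dictionary layout (journal name mapped to a vector), and the toy three-dimensional vectors are all assumptions made for illustration. Real BERT encodings would have many more dimensions.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


def rank_journals(abstract_vec, scope_encodings, top_k=5):
    """Return the top_k (journal, score) pairs, most similar scope first.

    scope_encodings is assumed to be a dict of journal name -> feature vector.
    """
    scores = [
        (name, cosine_similarity(abstract_vec, vec))
        for name, vec in scope_encodings.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy stand-ins for real encodings (hypothetical journals and values).
toy_encodings = {
    "Journal A": [1.0, 0.0, 0.0],
    "Journal B": [0.0, 1.0, 0.0],
    "Journal C": [1.0, 1.0, 0.0],
}
toy_abstract = [1.0, 0.1, 0.0]

top = rank_journals(toy_abstract, toy_encodings, top_k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity depends only on the direction of the vectors, not their length, a long scope text and a short abstract can still score as close matches if their encodings point the same way.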
- Python v3.7.15
- TensorFlow v2.8.0
journal_finder.ipynb script: If you want to test the final functionality (the journal recommendation system), load the .pkl dictionaries into the journal_finder.ipynb script and skip to the "Define function that computes similarity ..." section.
finetuned_BERT_journal_recommender.ipynb: You can skip the scraping section in this notebook as well. Upload the PubMed data to your Drive or Colab and then skip to the "Alternatively, use the presaved ..." section.
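For the journal_finder.ipynb shortcut, loading the presaved encodings might look roughly like the sketch below. It assumes that each .pkl file in the directory holds one Python dictionary and that the dictionaries can be merged into a single lookup; the helper name load_scope_encodings is hypothetical and does not come from the notebook. Only unpickle files from a source you trust, since pickle can execute arbitrary code on load.

```python
import pickle
from pathlib import Path


def load_scope_encodings(directory):
    """Merge every .pkl dictionary found in `directory` into one dict.

    Assumption: each file stores a single dict (for example, one per
    subject category). Later files overwrite duplicate keys.
    """
    merged = {}
    for path in sorted(Path(directory).glob("*.pkl")):
        with open(path, "rb") as handle:
            data = pickle.load(handle)
        if not isinstance(data, dict):
            raise TypeError(f"{path.name} does not contain a dictionary")
        merged.update(data)
    return merged
```

In Colab, this would be called with the path to the uploaded journal_scope_encodings directory, for example load_scope_encodings("journal_scope_encodings").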
No installation is required. Just open the scripts in Google Colab and you're good to go.
Finding suitable journals for your manuscript can be a tedious and time-consuming process, especially if you're new to a field or your manuscript was rejected by your first- and second-choice journals. A good recommender can save a lot of time.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the Apache-2.0 License. See LICENSE.txt for more information.
Amin Sadeghi - masadeghi6@gmail.com
Project Link: https://github.com/masadeghi/journal_finder