This project aims to support researchers in their literature analysis. Researchers must quickly scan numerous study articles to find the ones relevant to their subject, which means reading the abstract of each paper. When an abstract lacks a clear structure, this becomes time-consuming. This abstract simplifier makes reading abstracts easier, quicker, and more efficient, and contributes to NLP research.
The authors of the paper have made the data used for their research publicly available, free of charge, as .txt files on GitHub.
- PubMed_20K_RCT, which contains 20k labelled abstract sentences in total. There is also a version of this dataset in which the numbers mentioned in the abstracts are replaced by the @ symbol.
- PubMed_200k_RCT, which contains 200k labelled abstract sentences in total. There is also a version of this dataset in which the numbers mentioned in the abstracts are replaced by the @ symbol.
- The smaller version (PubMed_20K_RCT) has been used for this project because of the GPU usage limits on Google Colab.
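
To make the data description more concrete, here is a minimal sketch of how the .txt files might be read into labelled sentences. It assumes the layout commonly seen in the PubMed RCT files: each abstract starts with a `###<id>` line, each sentence is written as `LABEL<TAB>sentence`, and abstracts are separated by a blank line. The function names `parse_rct_text` and `mask_numbers`, and the dictionary keys, are hypothetical. The regex in `mask_numbers` only illustrates the idea of replacing numbers with `@`; the exact rule used by the dataset authors is not stated here.

```python
import re


def mask_numbers(sentence: str) -> str:
    """Replace each number (integer or decimal) with '@'.

    Assumption: this only approximates the numbers-replaced dataset variant.
    """
    return re.sub(r"\d+(?:\.\d+)?", "@", sentence)


def parse_rct_text(raw: str) -> list[dict]:
    """Turn the raw text of one dataset file into a list of labelled sentences."""
    samples = []
    current = []  # lines of the abstract currently being read

    def flush():
        # Emit one record per sentence of the finished abstract.
        total = len(current)
        for position, entry in enumerate(current):
            label, text = entry.split("\t", 1)
            samples.append(
                {
                    "target": label,
                    "text": text,
                    "line_number": position,  # 0-based position in the abstract
                    "total_lines": total,     # number of sentences in the abstract
                }
            )
        current.clear()

    for line in raw.splitlines():
        if line.startswith("###") or not line.strip():
            flush()  # an ID line or a blank line ends the previous abstract
        else:
            current.append(line)
    flush()  # handle a file that does not end with a blank line
    return samples
```

A file could then be loaded with something like `parse_rct_text(open(path, encoding="utf-8").read())`, where `path` points to one of the downloaded .txt files.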
- Download the dataset
- Visualise and preprocess the data
- Experiment with different models
- Create the final model using transfer learning
- Compare all model results
- Save and load the best-performing model
- Evaluate the model on test data
- Find the most wrong predictions
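
The last two steps can be sketched in plain Python. "Most wrong" is taken here to mean predictions that are incorrect but were made with the highest confidence, which is a common way to surface mislabelled or ambiguous sentences. The function names, argument names, and record keys below are hypothetical, and the inputs are assumed to be parallel lists produced by whichever model was selected.

```python
def accuracy(true_labels, pred_labels):
    """Fraction of sentences whose predicted label matches the true label."""
    correct = sum(1 for y, p in zip(true_labels, pred_labels) if y == p)
    return correct / len(true_labels)


def most_wrong(texts, true_labels, pred_labels, pred_probs, top_k=5):
    """Return the top_k incorrect predictions, most confident first.

    pred_probs holds the model's probability for the label it predicted.
    """
    wrong = [
        {"text": t, "target": y, "prediction": p, "pred_prob": prob}
        for t, y, p, prob in zip(texts, true_labels, pred_labels, pred_probs)
        if y != p
    ]
    # Sort so the errors the model was most sure about come first.
    return sorted(wrong, key=lambda row: row["pred_prob"], reverse=True)[:top_k]
```

Reading through the returned records by hand often shows whether the errors come from the model or from questionable labels in the data.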