An-Attentive-Neural-Model-for-labeling-Adverse-Drug-Reactions

This work focuses on the extraction of Adverse Drug Reactions (ADRs) from ADR-related tweets and from sentences extracted from PubMed abstracts.

Our paper "An Attentive Neural Sequence Labeling Model for Adverse Drug Reactions Mentions Extraction" has been accepted by IEEE Access.

You can find more details (in Chinese) about our paper via my blog: Sequence labeling with embedding-level attention.

(Figure: model architecture)

Requirements

  • Python 3.5.2
  • Keras (deep learning library, verified on 2.1.3)
  • NLTK (NLP tools, verified on 3.2.1)

What you need to prepare before running the code

Datasets

We use two datasets in our paper. The first is a Twitter dataset, used in the paper Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts. The second, ADE-Corpus-V2, is used in the paper An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text and is available online: https://sites.google.com/site/adecorpus/home/document.

Because publishing the text of tweets is against Twitter's Terms of Service, we cannot provide the Twitter dataset. You can obtain it from the authors of Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts, so that your copy stays consistent with the one used in our paper.

Please get these two datasets ready and put them in twitter_adr/data and pubmed_adr/data, respectively. I have provided the PubMed dataset in pubmed_adr/data.

Word embedding

For both datasets, we use the pretrained 300-dimensional GloVe word embeddings; please download them and put them in twitter_adr/embeddings.
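As an illustration, loading GloVe's plain-text format (one word per line, followed by its vector components) into an embedding matrix might look like the sketch below. The function name, the random initialization for out-of-vocabulary words, and the exact file path are assumptions for illustration, not the repo's actual code.

```python
import numpy as np

def load_glove(path, vocab, dim=300):
    """Build a |vocab| x dim embedding matrix from a GloVe text file.

    `vocab` maps each word to its row index. Rows for words missing
    from the GloVe file are left as small random vectors.
    """
    matrix = np.random.uniform(-0.05, 0.05, (len(vocab), dim))
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = np.asarray(values, dtype="float32")
    return matrix
```

The resulting matrix can then be passed as the initial weights of a Keras Embedding layer.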

Data processing

We provide twitter_adr/data_processing.py and pubmed_adr/data_processing.py to process the two datasets, respectively.
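The processing scripts ultimately turn each annotated sentence into token-level labels for sequence labeling. A minimal sketch of that step is shown below; the `(start, end)` span representation and the B/I/O label names are assumptions for illustration and need not match the repo's exact annotation format.

```python
def to_bio(tokens, adr_spans):
    """Assign BIO labels to tokens given ADR mention spans.

    `adr_spans` is a list of (start, end) token-index pairs, end
    exclusive. The first token of a mention gets B-ADR, the rest
    I-ADR, and all other tokens get O.
    """
    labels = ["O"] * len(tokens)
    for start, end in adr_spans:
        labels[start] = "B-ADR"
        for i in range(start + 1, end):
            labels[i] = "I-ADR"
    return labels
```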

Model

twitter_adr/model.py and pubmed_adr/model.py contain the model code that generates the predictions, and approximateMatch.py is the script that applies approximate matching and prints the model's results.
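Approximate matching, as commonly used in ADR extraction work, counts a predicted mention as correct if it overlaps a gold mention at all, rather than requiring exact boundaries. A minimal sketch of such a metric is below; it illustrates the idea only and is not necessarily identical to approximateMatch.py.

```python
def approx_prf(gold, pred):
    """Approximate-match precision, recall, and F1 over entity spans.

    `gold` and `pred` are lists of (start, end) pairs, end exclusive.
    A predicted span is a precision hit if it overlaps any gold span;
    a gold span is a recall hit if it overlaps any predicted span.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    p_hits = sum(any(overlaps(p, g) for g in gold) for p in pred)
    r_hits = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = p_hits / len(pred) if pred else 0.0
    recall = r_hits / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```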

Results

(Figure: results)

These are the results obtained by our model. The F1 on the Twitter dataset is about 0.84, an improvement of more than 10% over the previous state of the art (SOTA), and the F1 on the PubMed dataset is about 0.91, about 5% above the previous SOTA. Because the results vary across runs, we recommend running the model several times and averaging the results.

Because our model essentially addresses a sequence labeling task, it can generalize to any token-level classification task, such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging. Next, I want to validate our model on larger datasets and explore the effects of pre-training.

Thanks to

I would like to sincerely thank the authors of the following works:
