MedFact (License: AGPL v3)

MedFact is a set of algorithms that help assign a veracity score to text paragraphs about medical claims.

Please cite the following publication when using our source code for your research. This project is supported by the Alberta Machine Intelligence Institute (Amii).

@inproceedings{SamuelZaiane2018,
  title = {{MedFact: Towards Improving Veracity of Medical Information in Social Media using Applied Machine Learning}},
  author = {Samuel, Hamman and Zaiane, Osmar},
  booktitle = {{31st CAIAC Canadian Conference on Artificial Intelligence (CAI)}},
  pages = {108--120},
  year = {2018},
  organization = {{CAIAC}}
}

Prerequisites

  • This code is developed in Python 2.7.15 and tested on Anaconda
  • The related Python libraries for this project can be installed via pip install -r requirements.txt (file generated via pipreqs --savepath=requirements.txt .)
  • The datasets required by the project can be downloaded from GDrive and placed in the datasets folder

Datasets

Workflow

  1. Train the medical phrases classifier by running train() in medclass.py which will generate and persist the trained model
  2. For a given incoming text paragraph, identify key phrases and medical phrases using predict() from medclass.py
  3. Use the incoming medical phrases to query the TRIP database with query() to get related articles. Optionally, also query Health Canada's knowledge base using query() in healthcanada.py
  4. Extract the corpus phrases from the TRIP (and optionally Health Canada) articles with extract() in article.py
  5. Train the accord/agreement classifier via train() in accordcnn.py
  6. Compare the incoming medical phrases with the corpus medical phrases via predict() in accordcnn.py
  7. Calculate the veracity score via veracity() in medfact.py
  8. Compute the confidence score via confidence() in medfact.py
  9. Compute the triage label via triage() in medfact.py
  10. Readability of the text being processed can be quantified with metrics() in readability.py (a sketch tying these steps together follows this list)
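
The sketch below is only an illustration of which module provides which step: the argument names, return values, and the trip module name assumed for the TRIP query() call are assumptions, so consult the individual modules for the actual signatures.

    import medclass
    import trip          # assumed name for the module wrapping the TRIP database query()
    import healthcanada
    import article
    import accordcnn
    import medfact
    import readability

    # Steps 1-2: train the medical-phrase classifier, then pull key and medical
    # phrases out of the incoming text
    medclass.train()
    text = "Vitamin C cures the common cold."
    claim_phrases = medclass.predict(text)

    # Steps 3-4: query TRIP (and optionally Health Canada) for related articles,
    # then extract the corpus phrases from those articles
    articles = trip.query(claim_phrases) + healthcanada.query(claim_phrases)
    corpus_phrases = [article.extract(a) for a in articles]

    # Steps 5-6: train the accord/agreement classifier and compare the claim
    # phrases against the corpus phrases
    accordcnn.train()
    agreement = accordcnn.predict(claim_phrases, corpus_phrases)

    # Steps 7-10: veracity, confidence, triage, and readability of the input text
    print(medfact.veracity(text))
    print(medfact.confidence(text))
    print(medfact.triage(text))
    print(readability.metrics(text))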

Bulk Mode

  • The veracity of websites can be computed via the bulk mode, which samples content from a given website's home page or other specified pages using a web scraper
  • An example of using this mode is provided in medfact.py as example2(); the RESTful API also has a URL mode that provides bulk analysis (see the sketch after this list)
  • The bulk mode is also useful for analysing paragraphs of text that contain multiple sentences
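
As a minimal illustration, the call below runs the bundled example2() routine; which pages it scrapes and whether it accepts any arguments is determined by medfact.py and is not assumed here.

    import medfact

    # example2() demonstrates bulk mode: it scrapes pages from a website and
    # scores the sampled content; see medfact.py for the pages it targets
    medfact.example2()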

RESTful API

  • To run the Flask web app locally, use the command python medfact.py api
  • In your web browser, go to http://127.0.0.1:5000/api/text/?text= to process a text sentence, or go to http://127.0.0.1:5000/api/url/?url= to analyze a website's page; full details on the API are documented in api_docs.docx (a client sketch follows this list)
  • The live MedFact API will use IaaS hosting with Cybera
  • When using IaaS hosting, you can serve the Flask web app using uWSGI
  • PaaS hosting configurations depend on the provider, but here is one for Heroku
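
A minimal client sketch for the local API is shown below, assuming the app is running on Flask's default port and returns JSON; the exact response fields are described in api_docs.docx.

    import requests

    # Score a single sentence via the text endpoint
    resp = requests.get("http://127.0.0.1:5000/api/text/",
                        params={"text": "Vitamin C cures the common cold."})
    print(resp.json())  # assumes a JSON response; see api_docs.docx for the fields

    # Analyze a website's page via the URL endpoint (bulk mode)
    resp = requests.get("http://127.0.0.1:5000/api/url/",
                        params={"url": "http://example.com"})
    print(resp.json())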
