Openpharma : ML for search bar and data categorization

The objective of openpharma is to provide a neutral home for open source software related to the pharmaceutical industry that is not tied to any one company or institution. http://openpharma.pharmaverse.org/

📨 For any questions, feel free to reach out to me at this email address: mathieu.cayssol@gmail.com

0. General overview

Global pipeline

You are in the ML repository of openpharma. The overall project includes 3 repositories:

⚙️ Data crawler : https://github.com/openpharma/openpharma.github.io
🤖 ML for search bar and data categorization : https://github.com/openpharma/openpharma_ml
📊 Front-end : https://github.com/openpharma/opensource_dashboard

1. Search bar Pipeline

2. Package categorization

a. Scope

We divided our list of packages into 5 main categories: Plots, Tables, Stats, CDISC and Utilities. The classification uses the title and the description of each package, and the text is cleaned with the spaCy library. The classification method is based on binary matching between the list of keywords for a category and the description/title of the package, that is, on whether or not a keyword of the category appears in that text.
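As an illustration of the keyword matching described above, here is a minimal sketch. The keyword lists, the `tokenize` helper and the `categorize` function are hypothetical and are not taken from the project; a simple regex tokenizer stands in for the spaCy cleaning step so that the example runs without extra dependencies.

```python
import re

# Hypothetical keyword lists, for illustration only.
# The real lists used by the project are not shown in this README.
CATEGORY_KEYWORDS = {
    "Plots": {"plot", "plots", "graph", "visualization"},
    "Tables": {"table", "tables", "listing", "listings"},
    "Stats": {"model", "regression", "bayesian", "survival"},
    "CDISC": {"cdisc", "sdtm", "adam"},
    "Utilities": {"utility", "utilities", "helper", "tools"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens.

    This is a stand-in for the spaCy-based cleaning mentioned above.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def categorize(title, description):
    """Return every category with at least one keyword in the title or description."""
    tokens = tokenize(title + " " + description)
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if tokens & keywords  # binary match: any shared token counts
    )


labels = categorize("survplot", "Kaplan-Meier survival plots and summary tables")
print(labels)
```

Because a package can match several keyword lists at once, the function returns a list of categories rather than a single label, which is consistent with the multilabel setup described in the performance section.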

b. Performance measurement

We measure performance on a test dataset of 115 examples: 10 Plots, 8 Tables, 88 Stats, 2 CDISC and 15 Utilities. The counts sum to 123 rather than 115 because this is a multilabel classification, so a package can belong to several categories. The following figure shows the accuracy. Note: because the dataset is strongly imbalanced, accuracy is not always relevant. For better insight, you can calculate precision, recall and F1-score.
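Precision, recall and F1-score can be computed per category, which is more informative than accuracy when one category dominates. The sketch below is one possible way to do it, assuming the true and predicted labels are stored as one set of category names per package; the function name `per_category_scores` and the toy data are hypothetical and are not results from the project.

```python
def per_category_scores(y_true, y_pred, categories):
    """Compute precision, recall and F1 for each category in a multilabel setting.

    y_true and y_pred are lists of sets of category names, one set per package.
    """
    scores = {}
    for cat in categories:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if cat in t and cat in p)      # true positives
        fp = sum(1 for t, p in pairs if cat not in t and cat in p)  # false positives
        fn = sum(1 for t, p in pairs if cat in t and cat not in p)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        scores[cat] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy data for illustration only: a classifier that always predicts "Stats".
y_true = [{"Stats"}, {"Stats", "Plots"}, {"CDISC"}]
y_pred = [{"Stats"}, {"Stats"}, {"Stats"}]

scores = per_category_scores(y_true, y_pred, ["Plots", "Stats", "CDISC"])
for category, metrics in scores.items():
    print(category, metrics)
```

In this toy case the always-"Stats" classifier gets a high score on Stats but zero recall on Plots and CDISC, which is the kind of failure that accuracy alone can hide on an imbalanced dataset.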
