Openpharma: ML for search bar and data categorization

The objective of openpharma is to provide a neutral home for open source software related to the pharmaceutical industry that is not tied to one company or institution. http://openpharma.pharmaverse.org/

📨 For any questions, feel free to reach out to me at this email address: mathieu.cayssol@gmail.com

0. General overview

[Figure: Global pipeline]

You are in the front-end repository of openpharma. The global project includes 3 repositories:

1. Search bar pipeline

[Figure: Search bar pipeline]

2. Package categorization

a. Scope

We divided our list of packages into 5 main categories: Plots, Tables, Stats, CDISC and Utilities. The classification uses the title and the description of each package, which are cleaned with the spaCy library. It relies on binary matching between the keyword list of each category and the package's title/description: a package either matches a category or it does not, and it can match several categories.

[Figure: Package categorisation scope]
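A minimal sketch of the keyword-matching step, assuming one keyword set per category. The keyword sets below are hypothetical placeholders, not the lists used in openpharma, and the spaCy cleaning step is replaced by a simple lowercase/regex tokenizer so the example runs without downloading a spaCy model.

```python
from __future__ import annotations

import re

# Hypothetical keyword sets: the real openpharma lists are not reproduced here.
CATEGORY_KEYWORDS = {
    "Plots": {"plot", "plots", "graph", "visualization", "ggplot2"},
    "Tables": {"table", "tables", "listing", "listings"},
    "Stats": {"statistical", "regression", "survival", "model", "bayesian"},
    "CDISC": {"cdisc", "sdtm", "adam"},
    "Utilities": {"utility", "utilities", "helper", "tools"},
}


def clean(text: str) -> set[str]:
    """Lowercase and split into alphanumeric tokens (simple stand-in for spaCy cleaning)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def categorize(title: str, description: str) -> list[str]:
    """Return every category whose keyword set shares at least one token with the package text."""
    tokens = clean(title) | clean(description)
    return [cat for cat, keywords in CATEGORY_KEYWORDS.items() if tokens & keywords]


print(categorize("pkg", "Survival plot and summary tables"))  # ['Plots', 'Tables', 'Stats']
```

Because each category is tested independently, one package can receive several labels (or none), which fits the multi-label setup of the test set.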

b. Performance measurement

We measure performance on a test dataset of 115 examples: 10 Plots, 8 Tables, 88 Stats, 2 CDISC and 15 Utilities. These counts sum to 123 rather than 115 because the task is a multi-label classification: a package can belong to several categories. The following figure shows the accuracy. Note: because the dataset is strongly imbalanced, accuracy alone can be misleading; precision, recall and F1-score give better insight.

[Figure: Package categorisation - Performance]
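As a minimal sketch (not the project's evaluation code), per-category accuracy, precision, recall and F1 can be computed from one set of true labels and one set of predicted labels per package. The demo reuses the CDISC count from the test set above (2 positives out of 115) with a hypothetical classifier that never predicts CDISC: its accuracy is about 0.983 even though it finds none of the CDISC packages.

```python
from __future__ import annotations


def per_category_metrics(y_true: list[set[str]], y_pred: list[set[str]],
                         categories: list[str]) -> dict[str, dict[str, float]]:
    """Accuracy, precision, recall and F1 for each category of a multi-label task.

    y_true / y_pred hold one set of category names per package.
    Ratios with a zero denominator are reported as 0.0 (a convention chosen here).
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    results = {}
    for cat in categories:
        # Confusion counts for this category, treated as its own binary problem.
        tp = sum(cat in t and cat in p for t, p in pairs)
        fp = sum(cat not in t and cat in p for t, p in pairs)
        fn = sum(cat in t and cat not in p for t, p in pairs)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[cat] = {
            "accuracy": (tp + tn) / n,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# Hypothetical baseline: 2 CDISC packages among 115, and a classifier that never
# predicts CDISC (other labels are omitted because only CDISC is scored here).
y_true = [{"CDISC"} for _ in range(2)] + [set() for _ in range(113)]
y_pred = [set() for _ in range(115)]
cdisc = per_category_metrics(y_true, y_pred, ["CDISC"])["CDISC"]
print(round(cdisc["accuracy"], 3), cdisc["recall"], cdisc["f1"])  # 0.983 0.0 0.0
```

Scoring each category as its own binary problem makes the imbalance visible: rare categories such as CDISC get near-perfect accuracy for free, while recall and F1 drop to zero when they are never detected.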