Using NLP techniques for answering questions about COVID-19 using Kaggle Dataset

pavalucas/NLP_techniques_COVID-19_Kaggle_Dataset

This is a summary of what was done; the full report is in NLP_COVID_Presentation.

Introduction and Scope

  • COVID-19 Open Research Dataset
    • ~13 GB
    • Over 135,000 scholarly articles
    • Including over 68,000 with full text
  • First goal: What do we know about COVID-19 symptoms?
  • Second goal: How can we cluster papers into coherent groups?

First goal

Word representation

Word clustering

  • Main idea: cluster words, each represented by its vector, using the k-means algorithm
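
The clustering idea above can be illustrated with a minimal sketch. It is a pure-Python k-means written only for illustration; the summary does not say which implementation or which word vectors the project used, so the function names (`kmeans`, `squared_distance`, `mean_vector`) and the tiny 2-D `word_vectors` are hypothetical stand-ins for real embeddings.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain k-means: returns (cluster label per vector, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [None] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged: no label changed
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignments, centroids


# Hypothetical 2-D word vectors (real embeddings have many more dimensions).
word_vectors = {
    "fever": [0.9, 0.1],
    "cough": [0.8, 0.2],
    "fatigue": [0.85, 0.15],
    "lung": [0.1, 0.9],
    "kidney": [0.2, 0.8],
    "heart": [0.15, 0.85],
}
words = list(word_vectors)
labels, _ = kmeans([word_vectors[w] for w in words], k=2)

# Group the words by their cluster label.
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(label, []).append(word)
```

With these toy vectors the two groups separate cleanly; on real embeddings the choice of `k` and the initialisation would matter much more.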

Word clouds

Symptoms:

images/word_cloud_symptoms.png

Organs:

images/word_cloud_organs.png

Medications:

images/word_cloud_medications.png

Second goal

  1. Create a feature vector for each paper using a bag-of-words (BOW) model
  2. Cluster vectors into coherent groups
  3. Visualize clusters in a 2D plot
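
Step 1 of the list above can be sketched as follows. This is a minimal standard-library illustration of a bag-of-words model, not the project's actual code; the tokenizer, the helper names (`tokenize`, `build_vocabulary`, `bow_vector`), and the three sample `papers` are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a column index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}


def bow_vector(text, vocabulary):
    """Count how often each vocabulary token occurs in the text."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        index = vocabulary.get(tok)
        if index is not None:  # ignore tokens outside the vocabulary
            vector[index] = n
    return vector


# Hypothetical stand-ins for paper texts.
papers = [
    "Fever and cough are common symptoms",
    "Cough and fever in patients",
    "Vaccine trial results",
]
vocabulary = build_vocabulary(papers)
features = [bow_vector(paper, vocabulary) for paper in papers]
```

Each row of `features` is one paper's count vector. Those rows would then be the input to the clustering in step 2 and to a 2D projection such as t-SNE in step 3.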

t-SNE with no labels:

images/tsne_no_label.jpg

t-SNE with k-means labels:

images/tsne_label.jpg

Interactive plot

images/interactive_plot_1.png
