A Python-based implementation of an NLP search engine that returns results for natural-language queries over the Covid-19 Dataset.
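Install the following Python 3 packages in your Python environment:
- spaCy (with the 'en_core_web_sm' model)
- SUTime (see its installation instructions)
- csv
- sqlite3
- pickle
- re

csv, sqlite3, pickle, and re are part of the Python standard library and need no separate installation.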
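Before opening the notebooks, it can help to confirm that everything in the list above is importable. The following is a minimal sketch that uses only the standard library; the import names `spacy`, `en_core_web_sm`, and `sutime` are assumptions about how the third-party packages are exposed in your environment.

```python
from importlib.util import find_spec

# Assumed import names for the third-party packages listed above.
THIRD_PARTY = ["spacy", "en_core_web_sm", "sutime"]
# These modules ship with Python itself.
STANDARD_LIBRARY = ["csv", "sqlite3", "pickle", "re"]


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(THIRD_PARTY + STANDARD_LIBRARY).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any name reported as MISSING needs to be installed (or, for the spaCy model, downloaded) before the notebooks will run.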
Download the Covid-19 Dataset, which contains CSV files with Covid-19 statistics for various countries. Only 5 CSV files are required to run this code:
- worldwide-aggregate.csv
- us_simplified.csv
- reference.csv
- time-series-19-covid-combined.csv
- countries-aggregated.csv
In nlp-search-engine.ipynb, the second code block contains the necessary paths and flags. Change them accordingly:
- query_file_path - Path to your input query file
- create_database_flag - Set to True to create the database from the CSV files (required on the first run)
- dataset_path - Path to the folder containing your Covid-19 dataset CSV files
- database_path - File path where the generated sqlite3 database will be stored
- parsed_parameter_save_path - File path where the parameters generated for the given queries will be stored
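To illustrate what the database-creation step involves, here is a minimal sketch that loads the five required CSV files into a SQLite database using only csv and sqlite3. It is not the notebook's actual code: the function names are hypothetical, and the rule of naming each table after its file with hyphens replaced by underscores is an assumption inferred from the table name countries_aggregated in the generated SQL.

```python
import csv
import sqlite3
from pathlib import Path

REQUIRED_FILES = [
    "worldwide-aggregate.csv",
    "us_simplified.csv",
    "reference.csv",
    "time-series-19-covid-combined.csv",
    "countries-aggregated.csv",
]


def load_csv_into_table(conn, csv_file):
    """Create one table per CSV file (assumed naming: file stem with '-' -> '_')."""
    table = Path(csv_file).stem.replace("-", "_")
    with open(csv_file, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        # Columns are left untyped here, so every value is stored as text.
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            (row for row in reader if row),  # skip blank lines
        )
    return table


def create_database(dataset_dir, db_file):
    """Load every required CSV file into db_file and return the table names."""
    dataset_dir = Path(dataset_dir)
    missing = [name for name in REQUIRED_FILES if not (dataset_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing dataset files: {missing}")
    conn = sqlite3.connect(db_file)
    try:
        tables = [load_csv_into_table(conn, dataset_dir / name) for name in REQUIRED_FILES]
        conn.commit()
    finally:
        conn.close()
    return tables
```

In this sketch, `create_database(dataset_path, database_path)` would be called once, which corresponds to running with create_database_flag set to True. A fuller version would declare numeric column types instead of storing everything as text.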
In query-search.ipynb, the second code block contains the necessary paths and flags. Change them accordingly:
- database_path - File path where you stored the previously generated sqlite3 database
- parsed_parameter_save_path - File path where you stored the previously generated parameters for the given queries
- print_sql_queries - Set to True to output the executed SQL query along with the result
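The three settings above can be pictured with the following minimal sketch. The paths are placeholder values, `load_parameters` and `run_query` are hypothetical helper names, and the structure of the pickled object is whatever nlp-search-engine.ipynb saved; none of this is taken from the notebook itself.

```python
import pickle
import sqlite3

# Placeholder values for the settings described above; adjust to your setup.
database_path = "covid19.db"
parsed_parameter_save_path = "parsed_parameters.pkl"
print_sql_queries = True


def load_parameters(path):
    """Read back the object that the first notebook pickled.

    Only unpickle files you created yourself; pickle can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


def run_query(conn, sql, print_sql=False):
    """Execute one SQL statement and return all rows, optionally echoing the SQL."""
    if print_sql:
        print(sql)
    return conn.execute(sql).fetchall()
```

With these helpers, a session would open `sqlite3.connect(database_path)`, load the saved parameters, and pass each generated statement to `run_query(conn, sql, print_sql_queries)`.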
- Open nlp-search-engine.ipynb and query-search.ipynb in Jupyter Notebook.
- Execute all cells of nlp-search-engine.ipynb to extract parameters from the queries provided in possible-questions.txt.
- Then execute all cells of query-search.ipynb to generate the final result.
Input Query - Which country saw highest number of death in the month of April?
Extracted Parameters -
{
'query': 'Which country saw highest number of death in the month of April?',
'Place': {'no_match': [], 'states': [], 'countries': []},
'Time Duration': {'begin': '2020-04-01', 'end': '2020-04-31'},
'Case Type': 'death',
'Function Type': 'maximum',
'Operation Type': 'country'
}
Generated SQL - SELECT Country FROM (SELECT Country, (MAX(sum)-MIN(sum)) as cases FROM (SELECT Date, Country, SUM(Deaths) as sum FROM countries_aggregated WHERE Date BETWEEN '2020-04-01' AND '2020-04-31' GROUP BY Date, Country) GROUP BY Country) WHERE cases = (SELECT MAX(cases) from (SELECT Country, (MAX(sum)-MIN(sum)) as cases FROM (SELECT Date, Country, SUM(Deaths) as sum FROM countries_aggregated WHERE Date BETWEEN '2020-04-01' AND '2020-04-31' GROUP BY Date, Country) GROUP BY Country));
Final Answer - US
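The generated SQL above takes MAX(sum)-MIN(sum) per country, which only makes sense if the Deaths column holds running totals. Under that assumption, the difference between the largest and smallest value in the date range is the number of deaths that occurred within it. The sketch below runs the same statement against a tiny in-memory table; the rows and country names are made up for illustration, and the three-column schema is a simplification.

```python
import sqlite3

# The generated SQL from the example above.
GENERATED_SQL = (
    "SELECT Country FROM (SELECT Country, (MAX(sum)-MIN(sum)) as cases FROM "
    "(SELECT Date, Country, SUM(Deaths) as sum FROM countries_aggregated "
    "WHERE Date BETWEEN '2020-04-01' AND '2020-04-31' GROUP BY Date, Country) "
    "GROUP BY Country) WHERE cases = (SELECT MAX(cases) from (SELECT Country, "
    "(MAX(sum)-MIN(sum)) as cases FROM (SELECT Date, Country, SUM(Deaths) as sum "
    "FROM countries_aggregated WHERE Date BETWEEN '2020-04-01' AND '2020-04-31' "
    "GROUP BY Date, Country) GROUP BY Country));"
)

# Made-up cumulative death counts. CountryB has the larger total,
# but CountryA has the larger increase during April.
TOY_ROWS = [
    ("2020-04-01", "CountryA", 10),
    ("2020-04-30", "CountryA", 50),
    ("2020-04-01", "CountryB", 100),
    ("2020-04-30", "CountryB", 120),
    ("2020-05-15", "CountryB", 500),  # outside the range, ignored by the filter
]


def answer(rows):
    """Load rows into an in-memory table and run the generated SQL on them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE countries_aggregated (Date TEXT, Country TEXT, Deaths INTEGER)"
        )
        conn.executemany("INSERT INTO countries_aggregated VALUES (?, ?, ?)", rows)
        return conn.execute(GENERATED_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(answer(TOY_ROWS))
```

On this toy data the query selects CountryA (an increase of 40 in April) over CountryB (an increase of 20), even though CountryB has the higher cumulative count.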
Input Query - total number of new cases found in Greece between april to september?
Extracted Parameters -
{
'query': 'total number of new cases found in Greece between april to september?',
'Place': {'no_match': [], 'states': [], 'countries': ['greece']},
'Time Duration': {'begin': '2020-04-01', 'end': '2020-09-31'},
'Case Type': 'confirm',
'Function Type': 'sum',
'Operation Type': 'cases'
}
Generated SQL - SELECT Confirmed FROM countries_aggregated WHERE Country = 'Greece' AND Date BETWEEN '2020-04-01' AND '2020-09-31';
Final Answer - 17060
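Both examples use an end date of the 31st ('2020-04-31', '2020-09-31') for months that have only 30 days. Assuming the Date column holds ISO-formatted text, as the quoted literals suggest, SQLite's BETWEEN compares the values as strings, so such a bound still selects exactly the rows of that month. The sketch below demonstrates this on made-up dates. `last_day_of_month` is a hypothetical helper showing how a real calendar end date could be computed; it is not taken from the notebooks.

```python
import calendar
import sqlite3

SAMPLE_DATES = ["2020-03-31", "2020-04-01", "2020-04-30", "2020-05-01"]


def dates_in_range(dates, begin, end):
    """Return the dates that SQLite's BETWEEN keeps when they are stored as text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (Date TEXT)")
        conn.executemany("INSERT INTO t VALUES (?)", [(d,) for d in dates])
        rows = conn.execute(
            "SELECT Date FROM t WHERE Date BETWEEN ? AND ? ORDER BY Date",
            (begin, end),
        )
        return [row[0] for row in rows]
    finally:
        conn.close()


def last_day_of_month(year, month):
    """Return the real last calendar day of a month as an ISO date string."""
    day = calendar.monthrange(year, month)[1]
    return f"{year:04d}-{month:02d}-{day:02d}"


if __name__ == "__main__":
    print(dates_in_range(SAMPLE_DATES, "2020-04-01", "2020-04-31"))
    print(last_day_of_month(2020, 4))
```

The string comparison keeps 2020-04-30 and rejects 2020-05-01, so the query results are unaffected. A real end date matters only if the value is later parsed as a date.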