miniproject: viral epidemics and drugs
Pruthiv Rajan K
Urja Biswas
Submitted by - Urja Biswas
A viral epidemic poses a threat to human life, and the last resort for tackling it, on which the entire human race relies, is the availability of drugs. With the variety of drugs available to us (for example antiviral drugs, palliative drugs, antibacterial drugs and so on), it has become difficult to keep track of the role of each drug. While some drugs are proving to be effective antivirals, others only treat the symptoms. Hence, this miniproject provides the tools needed to give the public complete information in one place.
- To create a dictionary of drugs for diseases, together with their local names in different countries.
- To differentiate drugs that act directly against the pathogen from drugs that act only on symptoms.
- getpapers to obtain research papers.
- ami for sectioning papers.
- ami/SPARQL for the creation of dictionaries.
- nodejs, nvm, cmd, maven, jdk and git are the backbone of the above-mentioned tools; without them the commands cannot be executed.
- Python, R and related software used for data analysis.
The corpus comprises the top 950 articles retrieved from EuroPMC, a platform that provides free access to millions of articles related to biomedical science. getpapers is used to search for and download these articles. The software is very quick (approx. 10 minutes for the whole corpus), whereas downloading the articles individually could have taken many hours. In this project the articles are downloaded in order to study drugs related to viral epidemics.
Another advantage of this project is that we do not have to read every paper to understand its gist: with the help of ami section, the papers are sectioned, so one can browse all of them and select the ones of interest.
For this purpose, the following command will be executed:
getpapers -q "antiviral drugs" -o dir_corpus950 -x -p -k 1000
The command getpapers initiates the process and -q refers to the query to be searched. The query is entered in quotation marks, as in "antiviral drugs". The next element is -o, which refers to the output directory; the parameter that follows it is the name of the directory, dir_corpus950 in our case. Then -x and -p request the XML and PDF files for each result, and -k 1000 limits the search to at most 1,000 results. After successful completion of the command, our Corpus-950 is ready.
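The flag-by-flag description above can be sketched in Python as a small helper that assembles the same argument list. This is only an illustration: the function name build_getpapers_command and its parameters are hypothetical, and it uses only the flags already shown in the command (-q, -o, -x, -p, -k).

```python
import shlex


def build_getpapers_command(query, output_dir, limit, xml=True, pdf=True):
    """Assemble a getpapers argument list from the flags described above."""
    args = ["getpapers", "-q", query, "-o", output_dir]
    if xml:
        args.append("-x")  # request the XML files
    if pdf:
        args.append("-p")  # request the PDF files
    args += ["-k", str(limit)]  # upper limit on the number of results
    return args


command = build_getpapers_command("antiviral drugs", "dir_corpus950", 1000)
# Print a shell-quoted version for inspection; nothing is executed here.
print(shlex.join(command))
```

If getpapers is installed and on the PATH, the list could be passed to subprocess.run(command, check=True); keeping the arguments in a list avoids quoting problems with multi-word queries.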
Sectioning the downloaded files creates a tree structure that helps in exploring the content of each file. Sectioning is done using the section function of ami.
The command executed will be:
ami -p dir_corpus950 section
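Once sectioning has run, the resulting tree can be explored with a few lines of Python. The sketch below makes no assumption about the directory names ami produces; it simply lists every directory under the corpus folder, down to a chosen depth, together with the number of files it holds. The helper name summarize_tree is hypothetical.

```python
from pathlib import Path


def summarize_tree(root, max_depth=2):
    """Return (relative_path, file_count) for each directory up to max_depth."""
    root = Path(root)
    if not root.is_dir():
        return []  # nothing to summarize if the corpus folder is missing
    summary = []
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        relative = directory.relative_to(root)
        if len(relative.parts) <= max_depth:
            n_files = sum(1 for f in directory.iterdir() if f.is_file())
            summary.append((relative.as_posix(), n_files))
    return summary


for path, n_files in summarize_tree("dir_corpus950"):
    print(f"{path}: {n_files} file(s)")
```

Raising max_depth shows more of the section hierarchy for each paper; lowering it gives a quick count of how many papers were downloaded.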
Wikidata is an open knowledge base that stores information semantically. Software applications can therefore help us relate the various properties of similar drugs and use them constructively. So we will create a dictionary in an attempt to simplify the 950 downloaded articles and relate each drug to its usage. This would require the application of 'machine learning'.
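As a rough illustration of how Wikidata could feed such a dictionary, the Python sketch below builds a SPARQL query string that asks for items of a given class together with their labels in several languages. It is not the query used in this project. The function name build_drug_label_query is hypothetical, the default class Q12140 is assumed to be Wikidata's "medication" item and should be checked before use, and the language codes en, hi and ta are chosen only as examples.

```python
def build_drug_label_query(class_qid="Q12140", languages=("en", "hi", "ta"), limit=100):
    """Build a SPARQL query for items of a class with labels in several languages."""
    select_vars = " ".join(f"?label_{lang}" for lang in languages)
    # One OPTIONAL block per language, so an item is kept even if a label is missing.
    optionals = "\n".join(
        f"  OPTIONAL {{ ?item rdfs:label ?label_{lang} . "
        f'FILTER(LANG(?label_{lang}) = "{lang}") }}'
        for lang in languages
    )
    return (
        f"SELECT ?item {select_vars} WHERE {{\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"  # P31 is "instance of"
        f"{optionals}\n"
        f"}}\nLIMIT {limit}"
    )


print(build_drug_label_query())
```

The printed query can be pasted into the Wikidata Query Service, where the wd:, wdt: and rdfs: prefixes are predefined.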
When all the files are downloaded and successfully stored in our dir_corpus950 directory, our next task is to generate the dictionary using ami tools. We will use the following command:
ami -p dir_corpus950 search --dictionary drugs
ami invokes the ami tool and -p sets the project path to dir_corpus950. Then search --dictionary drugs searches the corpus for the drugs listed in the dictionary and creates an HTML file presenting the data in tabular form.
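Conceptually, a dictionary search counts how often each dictionary term occurs in a text. The Python sketch below shows that idea on a made-up sentence; it is not how ami is implemented, and the function name count_terms, the sample text and the term list are all illustrative assumptions.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each dictionary term."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts


# Made-up sample sentence, used only to exercise the function.
sample = "Ribavirin inhibits replication. Oseltamivir and ribavirin were compared."
print(count_terms(sample, ["ribavirin", "oseltamivir", "zanamivir"]).most_common())
```

Terms that never occur are left out of the result, so the output lists only the drugs actually mentioned, most frequent first.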
- To differentiate false positives from true positives, i.e. to find papers that are related to "viral epidemic and drugs" and remove unnecessary ones.
- To find relationships between drugs; this may help in suggesting an alternative drug for an epidemic.
- To segregate drugs that act against the virus from drugs that only suppress the symptoms.
- To maintain a simplified drug dictionary that is open to the public.
- The goal is to find out "the drugs which are used to treat viral diseases and their local names in different countries".
- Many drugs have been reported to "treat only the symptoms caused by a viral infection or disease rather than the viral infection itself". On that account, we study the purpose of the drugs and their action.
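One possible way to record the distinction described above, together with local names, is a small data structure like the Python sketch below. The names DrugAction, DrugEntry and split_by_action are hypothetical and are not part of ami or of this project's dictionaries; the two entries are illustrative examples only.

```python
from dataclasses import dataclass, field
from enum import Enum


class DrugAction(Enum):
    ANTIVIRAL = "acts on the virus"
    SYMPTOMATIC = "relieves symptoms only"


@dataclass
class DrugEntry:
    inn: str                     # International Nonproprietary Name
    action: DrugAction
    local_names: dict = field(default_factory=dict)  # country code -> local name


def split_by_action(entries):
    """Group drug names by whether they target the virus or only the symptoms."""
    groups = {action: [] for action in DrugAction}
    for entry in entries:
        groups[entry.action].append(entry.inn)
    return groups


# Illustrative entries only.
entries = [
    DrugEntry("oseltamivir", DrugAction.ANTIVIRAL),
    DrugEntry("paracetamol", DrugAction.SYMPTOMATIC, {"US": "acetaminophen"}),
]
for action, names in split_by_action(entries).items():
    print(f"{action.value}: {names}")
```

Storing the action as an enumeration rather than free text keeps the two categories unambiguous when the dictionary grows.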
- Creating dictionaries
- Binary classification
- Sectioning
- Finding local drugs in countries
- Drug action
- A communal corpus called epidemic50noCov of 50 articles on viral epidemics was created.
- Expanding our search, we created a new corpus consisting of 950 papers.
- Using software tools, we create a dictionary for our corpus.
- A test drug dictionary was created using a list of 10 antiviral drugs.
- The list of FDA-approved drugs has been updated.
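To make the idea of turning a plain list of drug names into a dictionary file concrete, here is a minimal Python sketch that writes a list as XML. The element and attribute names (dictionary, entry, term, name) are assumptions modelled loosely on a simple term list and may not match the exact format that ami produces; in this project the real dictionaries are created with amidict.

```python
import xml.etree.ElementTree as ET


def build_dictionary_xml(title, terms):
    """Serialize a list of terms as a simple XML dictionary (assumed layout)."""
    root = ET.Element("dictionary", {"title": title})
    for term in terms:
        cleaned = term.strip()
        if cleaned:  # skip blank lines in the input list
            ET.SubElement(root, "entry", {"term": cleaned.lower(), "name": cleaned})
    return ET.tostring(root, encoding="unicode")


print(build_dictionary_xml("drug", ["Ribavirin", "Oseltamivir"]))
```

Lower-casing the term attribute while keeping the original spelling in name is a design choice that makes case-insensitive matching easier later on.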
- ami for dictionaries and sectioning.
- ami/SPARQL for the creation of dictionaries.
- Python, R and related software for data analysis.
To commit the corpus via git we use the following commands:
C:\Users\admin\openVirus\miniproject\drug>git status
C:\Users\admin\openVirus\miniproject\drug>dir
C:\Users\admin\openVirus\miniproject\drug>git add *
C:\Users\admin\openVirus\miniproject\drug>git status
C:\Users\admin\openVirus\miniproject\drug>git commit -am "first commit all corpus"
C:\Users\admin\openVirus\miniproject\drug>git pull
C:\Users\admin\openVirus\miniproject\drug>git push
Running git push will prompt you to log in to your GitHub account. Successful execution of the above commands commits the files and pushes them to the repository.
The pharmaceutical drugs are listed here under their INN (International Nonproprietary Name) rather than their biological/natural name.
STARTED :
- Multilingual (English, हिन्दी, தமிழ்) dictionary using SPARQL.
- Binary classification using NLP.
FINISHED :
- Use the communal corpus of 50 articles on viral epidemics.
- Manual classification of communal corpus of 50 articles on viral epidemics.
- Creating a corpus of 250 articles on antiviral drugs.
- Manual classification of the corpus of 250 articles on antiviral drugs.
- ami search and section have been run on the corpus of 250 articles on antiviral drugs.
- The corpus and its ami search and section results were committed to git.
- Updating ami.
- Created an FDA-approved drug dictionary with amidict -v --dictionary drug --directory drug --input drug.txt create --informat list --outformats xml,html. Results: https://github.com/petermr/openVirus/blob/master/dictionaries/FDA%20Drug/drug.xml
- Created a dictionary using a SPARQL Wikidata query. Reference: https://github.com/petermr/openVirus/wiki/Dictionary:-Drugs
- SPARQL Wikidata query results: https://github.com/petermr/openVirus/blob/master/dictionaries/drug/drugs.sparql.xml
- The drugs with the highest occurrence in the corpus are ribavirin and oseltamivir. Ribavirin is mostly mentioned in connection with activity, inhibitor, antiviral, HCV, herpes and hepatitis. Oseltamivir is mentioned as an antiviral drug and a pandemic drug, and in connection with replication and virus-host interaction.
NOT STARTED :
- Smoke test.
BLOCKED : Creating table from corpus for NLP to do binary classification.