miniproject: viral epidemics and non-pharmaceutical interventions
Zeyang Charles Li
NPIs are actions, apart from getting vaccinated and taking medicine, that people and communities can take to help slow the spread of illnesses like pandemic influenza (flu). NPIs are also known as community mitigation strategies [1]. Common NPIs include face masks, social distancing, and quarantine.
This miniproject aims to establish, from the literature, whether the reported NPIs have an effect on controlling viral epidemics.
- Conduct manual binary classification on the communal corpus `Epidemic50noCov` and create a spreadsheet. STARTED
- Create a dictionary specific to this miniproject, starting from Wikidata/Wikipedia (https://en.wikipedia.org/wiki/Non-pharmaceutical_interventions), and build the dictionary with `amidict`. STARTED
- Re-run the query with the project-specific dictionary and retrieve a new corpus of 950 papers with `getpapers` (originally planned with `ami search`).
- Section papers to extract paragraphs mostly related to NPIs. NOT STARTED
- `amidict` will be used for creating dictionaries. The `ami` installation failed due to errors outside my control, so SPARQL was adopted instead. Current step: merging multiple SPARQL queries.
- `getpapers` for retrieving papers into new corpora. `getpapers` retrieval initially failed because of a proxy server; after changing the proxy settings, `getpapers` ran and a reduced corpus (k=580) was created. Current step: attempting to reduce the corpus size / work locally.
- `KNIME` for data flow
- `R` for data analysis and visualisation
`getpapers` ran after the proxy change and a VPN reset; an initial corpus (k=580) was created.
Obtained results from multiple SPARQL queries. Attempted merging them all using the UNION feature, then attempted removing the redundant terms with DISTINCT.
BLOCK: errors while merging the "instance of" and "main subject" queries.
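For reference, the merge pattern looks roughly like this; this is a sketch only, with illustrative QIDs rather than the project's actual queries (the real ones paired an "instance of" (wdt:P31) arm with a "main subject" (wdt:P921) arm):

```bash
# Sketch: combine two query arms with UNION and de-duplicate with DISTINCT,
# run against the public Wikidata endpoint. The QIDs are illustrative
# placeholders, not the classes actually used in this miniproject.
curl -G 'https://query.wikidata.org/sparql' \
  -H 'Accept: text/csv' \
  --data-urlencode query='
SELECT DISTINCT ?term ?termLabel WHERE {
  { ?term wdt:P31 wd:Q3241045 . }   # arm 1: instances of one class
  UNION
  { ?term wdt:P31 wd:Q12136 . }     # arm 2: instances of another class
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100'
```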
Successfully cloned the `ami-jars` repository (finally). For cloning a big repo over a poor internet connection (low download speed):
- First increase the postBuffer value (note the config key is `http.postBuffer`, even for HTTPS remotes): `git config --global http.postBuffer 157286400`
- Then turn off repository compression: `git config --global core.compression 0`
- Then partially download a chunk of the repo using `--depth 1`. This helps reduce the connection time with the remote host and the risk of fatal clone failures: `git clone --depth 1 https://github.com/petermr/ami-jars.git`
- Once the first part is cloned, finish the download: `git fetch --unshallow`
- Once the unshallow task finishes, retype `git fetch --unshallow` and you should see `fatal: --unshallow on a complete repository does not make sense`, which confirms the clone is complete.
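Putting the whole recipe together (with the `http.postBuffer` key as git actually spells it):

```bash
# Shallow-clone recipe for a large repo over a slow connection.
git config --global http.postBuffer 157286400   # enlarge the HTTP post buffer
git config --global core.compression 0          # disable repo compression
git clone --depth 1 https://github.com/petermr/ami-jars.git
cd ami-jars
git fetch --unshallow                           # fetch the remaining history
```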
- Change PATH for `ami`
Binary classification has been done on the k=580 corpus, but with too many false positives (19/20). This could be caused by not refining the search terms in the `getpapers` query: `getpapers -q "Viral Epidemics and Non-pharmaceutical Interventions" -k 950 -x -o NPIcorpus2`
BLOCK:
- Re-ran SPARQL and changed all queries to "instance of", but many terms were lost: terms included in "main subject" are not present in "instance of".
- Significant noise in the Wikidata "main subject" terms, and discrepancies between Wikidata and Wikipedia.
Created a new corpus (CorpusNPI2) of 760 articles and started binary classification on both viral epidemics and NPIs.
Attempted altering PATH for `ami`, but my PATH looks a bit tangled:

```
echo $PATH
/Users/charlesli/.nvm/versions/node/v7.10.1/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/charlesli/Desktop/apache-maven-3.6.3/bin
```
Curated an NPI dictionary without SPARQL.
Successfully installed `ami3-2020.08.09_09.54.10` and changed PATH by editing the shell profile.
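The edit amounts to appending ami's bin directory rather than overwriting PATH; a minimal sketch, assuming a bash profile and an install location on the Desktop (both are assumptions):

```bash
# Append, don't overwrite, so the existing nvm / Maven / git entries survive.
# The install path below is an assumption; adjust to the real location.
echo 'export PATH="$PATH:$HOME/Desktop/ami3-2020.08.09_09.54.10/bin"' >> ~/.bash_profile
source ~/.bash_profile
command -v ami    # should now print the path to the ami binary
```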
Created a third corpus using the query terms "viral epidemics" and "non-pharmaceutical" to minimise the noise brought in by "interventions" (k=464).
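The exact command isn't recorded above; a plausible reconstruction, mirroring the earlier invocation (the output name is a guess):

```bash
# Hypothetical third-corpus query; -x fetches fulltext XML, -o names the output directory.
getpapers -q '"viral epidemics" AND "non-pharmaceutical"' -k 464 -x -o NPIcorpus3
```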
Downloaded Docker and Jupyter Notebook.
Attempted creating an XML dictionary with `amidict` containing terms from the curated dictionary.
Ran smoke tests on Docker and Jupyter Notebook.
Classified 40 papers from CorpusNPI3; the results were better (24 positive / 40). Attempted to commit to GitHub.
Created a dictionary using `amidict` and committed it to GitHub: https://github.com/petermr/openVirus/blob/master/dictionaries/NPIdict1.xml
BLOCK:
- multi-word entries in the ami dictionary (solved)
- synonyms for terms
Installed Anaconda Navigator and ran Jupyter Notebook.
Created a .csv file containing the dictionary terms.
Created a new dictionary, NPIdict2, with phrases as terms.
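One way the .csv-to-XML conversion could be scripted (a sketch, assuming a one-column terms.csv; the file names are assumptions and no XML escaping is done):

```bash
# Wrap each CSV line in an ami-style <entry> element inside a <dictionary> root.
{
  echo '<dictionary title="NPIdict2">'
  while IFS= read -r term; do
    printf '  <entry term="%s" name="%s"/>\n' "$term" "$term"
  done < terms.csv
  echo '</dictionary>'
} > NPIdict2.xml
```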
Attempted validation on NPIdict2; the dictionary was reported as NULL yet with 38 entries, using this syntax: `amidict --dictionary NPIdict2 --directory /Users/charlesli/Desktop/NPIdict2 display --validate`
Could it be the display command?
BLOCK:
- validation
Attempted to debug the validation.
Ran `ami section` on the corpus (individually).
Deleted all empty paper directories.
Sectioned all 437 papers in NPIcorpus2 using `ami section`.
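As I understand the ami3 syntax (a sketch; check `ami --help` for the exact flags), the sectioning step looks like:

```bash
# -p points ami at the corpus directory (CProject); `section` splits each
# paper's fulltext.xml into sections on disk.
ami -p NPIcorpus2 section
```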
Ran `ami search` on NPIcorpus2.
BLOCK: errors during `ami search`: "cannot read stopward stream" and "SXXP0003 Error reported by XML parser: Content is not allowed in prolog. java.lang.RuntimeException: cannot transform NPIcorpus2/PMC5959063/fulltext.xml".
No updates due to illness.
Ran `ami search` on the 437 papers, but the results only showed counts for every word (not specific to my dictionary).
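Plain word counts suggest the dictionary was never picked up; assuming the ami3 tutorial syntax (the path is illustrative), the dictionary probably needs to be passed explicitly:

```bash
# Sketch: point ami search at the project dictionary rather than relying on defaults.
ami -p NPIcorpus2 search --dictionary /Users/charlesli/Desktop/NPIdict2/NPIdict2.xml
```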
- Change PATH for `ami` without altering the existing PATH entries for Maven and git. DONE
- Rerun `getpapers` and download a second (third by 12/08) corpus, then manually classify. STARTED
- Resolve SPARQL merge issues and query noise. DONE
- Add properties (Wikidata ID etc.) and synonyms to the curated dictionary. NOT STARTED
- Commit dictionary to GitHub. DONE
- Convert the .csv file to XML. DONE
- Validate the new dictionary. STARTED
- Commit corpus.
[1] https://www.cdc.gov/nonpharmaceutical-interventions/index.html