
Semantic analysis of the literature on essential oils from different parts of plants with reported medicinal activity


Strategy for thesis writing and result analysis

  • Which parts of the plant are associated with the essential oil?
  • Which plants, and which of their parts, yield the largest amounts of essential oil?
  • To study the medicinal activity associated with essential oils from different parts of plants.
  • To study the compounds present in different parts of plants.

Introduction

  • Need to be done

Literature review

  • Plants have long been used as a source of nutrients as well as traditional treatments for health issues, as evidenced by Hindu, Arab, Chinese, Egyptian, Roman and Greek literature (Hubert et al. 2017). Plants contain essential oils (EOs), which account for much of their medicinal activity (Bhavaniramya et al.). EOs are concentrated volatile chemicals found in specific cells or parts of plants; they protect the plant from predators and pests while also attracting pollinators. Essential oils have been used for medical and wellness purposes in many civilisations for thousands of years. They have recently gained attention as a safe, natural and economical treatment for a range of health conditions because of their antidepressant, purifying, detoxifying, antiviral and antibacterial characteristics (Ambrosio et al.). These aromatic oily liquids are mainly extracted from different parts of plants such as barks, seeds, peels, flowers, leaves, cavities, secretory cells and channels. They are also found in large amounts in oil sacs or oil glands in the different layers of the cuticle and the peel of the fruit, mainly the flavedo (Mahato et al.). The biosynthesis of these aromatic substances occurs in the leaves, where most of them are located and remain until flowering. Essential oils migrate into the flowers during flowering, and some of the oil is consumed during fertilisation. After fertilisation the oil accumulates in fruits and seeds, or migrates to leaves, bark and roots (Bakari et al.).

  • Still working on it

Tools

getpapers

  • getpapers is a tool written by Rik Smith-Unna for downloading research papers with a single command. It can search for metadata, full texts (as PDF or XML) and supplementary materials, mainly from EuropePMC, along with arXiv and Crossref. getpapers is a convenient way to obtain large numbers of papers quickly for reading or bibliometric research.
  • The following link gives a step-by-step tutorial for installing getpapers on your local system: https://github.com/petermr/tigr2ess/blob/master/installation/windows/INSTALLATION.md
  • Open a command line and enter getpapers to validate the installation.
C:\Users\vasan>getpapers

  Usage: getpapers [options]

  Options:

    -h, --help                output usage information
    -V, --version             output the version number
    -q, --query <query>       search query (required)
    -o, --outdir <path>       output directory (required - will be created if not found)
    --api <name>              API to search [eupmc, crossref, ieee, arxiv] (default: eupmc)
    -x, --xml                 download fulltext XMLs if available
    -p, --pdf                 download fulltext PDFs if available
    -s, --supp                download supplementary files if available
    -t, --minedterms          download text-mined terms if available
    -l, --loglevel <level>    amount of information to log (silent, verbose, info*, data, warn, error, or debug)
    -a, --all                 search all papers, not just open access
    -n, --noexecute           report how many results match the query, but don't actually download anything
    -f, --logfile <filename>  save log to specified file in output directory as well as printing to terminal
    -k, --limit <int>         limit the number of hits and downloads
    --filter <filter object>  filter by key value pair, passed straight to the crossref api only
    -r, --restart             restart file downloads after failure

getpapers -q "<query>" -o <output directory> -x -p -k <number of papers required>
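
As a hypothetical example (the query, output directory and limit are placeholders, not commands taken from this project), the following would download up to 50 open-access papers as XML and PDF:

getpapers -q "essential oil" -o essential_oil_papers -x -p -k 50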

pygetpapers

  • pygetpapers is a Python version of getpapers. It makes requests to open-access scientific text repositories, analyses the hits, and downloads the articles without further interaction. The tool was created by Ayush Garg and is available on GitHub.
  • The following link gives a step-by-step tutorial for installing pygetpapers on your local system: https://github.com/petermr/pygetpapers/blob/main/README.md#6-installation
  • To verify that the installation was successful, run the command pygetpapers from the command line.
C:\Users\vasan>pygetpapers
usage: pygetpapers [-h] [-v] [-q QUERY] [-o OUTPUT] [-x] [-p] [-s] [--references REFERENCES] [-n]
                   [--citations CITATIONS] [-l LOGLEVEL] [-f LOGFILE] [-k LIMIT] [-r RESTART] [-u UPDATE]
                   [--onlyquery] [-c] [--synonym]

Welcome to Pygetpapers version 0.0.3.1. -h or --help for help

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         output the version number
  -q QUERY, --query QUERY
                        query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
                        escape special characters within the quotes, use backslash. The query to be quoted in either
                        single or double quotes.
  -o OUTPUT, --output OUTPUT
                        output directory (Default: current working directory)
  -x, --xml             download fulltext XMLs if available
  -p, --pdf             download fulltext PDFs if available
  -s, --supp            download supplementary files if available
  --references REFERENCES
                        Download references if available. Requires source for references
                        (AGR,CBA,CTX,ETH,HIR,MED,PAT,PMC,PPR).
  -n, --noexecute       report how many results match the query, but don't actually download anything
  --citations CITATIONS
                        Download citations if available. Requires source for citations
                        (AGR,CBA,CTX,ETH,HIR,MED,PAT,PMC,PPR).
  -l LOGLEVEL, --loglevel LOGLEVEL
                        Provide logging level. Example --log warning <<info,warning,debug,error,critical>>,
                        default='info'
  -f LOGFILE, --logfile LOGFILE
                        save log to specified file in output directory as well as printing to terminal
  -k LIMIT, --limit LIMIT
                        maximum number of hits (default: 100)
  -r RESTART, --restart RESTART
                        Reads the json and makes the xml files. Takes the path to the json as the input
  -u UPDATE, --update UPDATE
                        Updates the corpus by downloading new papers. Takes the path of metadata json file of the
                        orignal corpus as the input. Requires -k or --limit (If not provided, default will be used)
                        and -q or --query (must be provided) to be given. Takes the path to the json as the input.
  --onlyquery           Saves json file containing the result of the query in storage. The json file can be given to
                        --restart to download the papers later.
  -c, --makecsv         Stores the per-document metadata as csv. Works only with --api method.
  --synonym             Results contain synonyms as well.

pygetpapers -q "<query>" -o <output directory> -x -p -k <number of papers required> -c
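
As a hypothetical example (again, the query, output directory and limit are placeholders), the following would download up to 100 papers with XML, PDF and a CSV of the per-document metadata:

pygetpapers -q "essential oil" -o essential_oil_papers -x -p -k 100 -c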

ami

  • ami is a simple tool written in Java for managing research papers. It can be used to download, annotate and analyse text, tables and diagrams, and to create dictionaries.

  • Papers can be divided into front, body, back and other groups using ami section. Sectioning the downloaded files provides a tree structure, allowing the information in each file to be navigated more easily: ami -p <cproject> section

  • ami search analyses and searches for the keywords of a dictionary in the project repository, producing a frequency table of the terms and a histogram for the corpus (see the example commands after this list): ami -p <cproject> search --dictionary <path>
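
As a hypothetical example of these two commands run on the plantpart corpus created in the Methodology section (the project name and dictionary path here are assumptions, not commands copied from the project):

ami -p plantpart section
ami -p plantpart search --dictionary plant_part.xml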

  • amidict is used to convert the SPARQL endpoint results into the dictionary format.

amidict -vv --dictionary <name of dictionary> --directory <path of the directory folder> --input <SPARQL endpoint output file> create --informat wikisparqlxml --sparqlmap wikidataURL=item,name=itemLabel,term=itemLabel --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*))

pyami/ami_gui.py

  • pyami is a search interface for reading and analysing documents, analogous to ami. pyami displays the search-frequency values visually, and queries can be composed by selecting different dictionaries, a mini-corpus and sections. pyami can be used for analysing papers as well as downloading them via pygetpapers.

ami gui diagram

Wikidata

  • Wikidata is a free and open repository of information that can be edited by both humans and machines. Wikidata includes a number of characteristics that are essential for a scientific knowledgebase, including references and qualifiers for individual claims, a simple live-updating mechanism, and the ability to audit every modification via public revision histories. Wikidata collaborates with a number of other organisations in a range of fields. Wikidata's design is built on widely used web standards for expressing its content, such as XML, JSON and RDF.


Methodology

Creation of dictionary plant_part

Dictionary

  • Dictionaries are collections of terms, complemented with information such as descriptions, origins and, above all, links to other terminological resources, especially Wikidata. Each entry contains the following fields (see the example entry after this list):
  • ID: a serialised identifier for the entry.
  • Name: a human-readable string that describes the concept.
  • Term: a specific string that identifies the topic.
  • Wikidata ID: a unique identifier for each normalised dictionary term, linking to Wikidata.org, a free and open knowledge base that both people and machines may access and update.
  • Description: A brief description of the plant part that has been identified in that row.
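
A single entry in the resulting XML dictionary might look like the sketch below. The values are illustrative (flower, Wikidata item Q506, one of the IDs used in the query further down), and the attribute names simply mirror the fields listed above; the real dictionary produced by amidict also carries the language-specific labels and descriptions.

<dictionary title="plant_part">
  <entry id="0" name="flower" term="flower" wikidataID="Q506" description="reproductive structure in flowering plants"/>
</dictionary>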

Steps for creation of dictionary:

  • Wikidata IDs were collected from the existing dictionary created by Emanuel Faria, which contains 285 entries.
  • Previous dictionary link : https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlantPart/eoPlantPart.xml
  • A SPARQL query can be created through the Wikidata Query Service: open https://www.wikidata.org/wiki/Wikidata:Main_Page and click on 'Query Service'. This takes you to the Wikidata Query Service page, where the SPARQL query for the plant_part dictionary can be constructed.
  • The Wikidata IDs were used as values in the SPARQL query that created the dictionary. The query covers English, Hindi, Tamil, Spanish, French, German, Chinese and Urdu, and returns the item, label, itemLabel and itemAltLabel.
## Selecting the preferred label
SELECT * WHERE {
  VALUES ?item {
    wd:Q10289985 wd:Q103129 wd:Q10437539 wd:Q107216 wd:Q107412 wd:Q1113448 wd:Q11162356 wd:Q1120914 wd:Q1125215 wd:Q1138632 wd:Q1155708 wd:Q11895190 wd:Q1192354 wd:Q12057965 wd:Q122811
    wd:Q1271979 wd:Q1277215 wd:Q134267 wd:Q1347099 wd:Q1351263 wd:Q1364 wd:Q1421859 wd:Q1425870 wd:Q1427245 wd:Q145205 wd:Q14524280 wd:Q1474699 wd:Q148436 wd:Q14849087
    wd:Q148515 wd:Q148600 wd:Q1493115 wd:Q149316 wd:Q1546595 wd:Q158583 wd:Q158967 wd:Q16128920 wd:Q16535076 wd:Q171187 wd:Q1713537 wd:Q18088308 wd:Q18250160
    wd:Q183319 wd:Q1840192 wd:Q184208 wd:Q184453 wd:Q185138 wd:Q188748 wd:Q1889013 wd:Q191546 wd:Q191556 wd:Q192576 wd:Q193472 wd:Q1995772 wd:Q2004067 wd:Q201851
    wd:Q207123 wd:Q207495 wd:Q216635 wd:Q217753 wd:Q220869 wd:Q224107 wd:Q22710 wd:Q2322325 wd:Q2331384 wd:Q2365301 wd:Q241368 wd:Q259028 wd:Q2746099 wd:Q27505399
    wd:Q27506529 wd:Q279513 wd:Q287 wd:Q2923673 wd:Q2933965 wd:Q304216 wd:Q30513971 wd:Q30765614 wd:Q3087886 wd:Q3089146 wd:Q3129307 wd:Q3312287 wd:Q33971 wd:Q3791538
    wd:Q380138 wd:Q3894544 wd:Q40763 wd:Q41500 wd:Q46723512 wd:Q489628 wd:Q497512 wd:Q504930 wd:Q506 wd:Q512249 wd:Q572097 wd:Q577430 wd:Q59243260 wd:Q60777361
    wd:Q609336 wd:Q62779 wd:Q643352 wd:Q65089222 wd:Q655824 wd:Q661390 wd:Q66571835 wd:Q687699 wd:Q70062083 wd:Q7079661 wd:Q7201653 wd:Q729496 wd:Q756954 wd:Q789802
    wd:Q794374 wd:Q796482 wd:Q79932 wd:Q87484743 wd:Q87485317 wd:Q87485325 wd:Q87485505 wd:Q87485933 wd:Q87485935 wd:Q87486641 wd:Q87486726 wd:Q87486867 wd:Q87486986
    wd:Q87487349 wd:Q87487358 wd:Q87487640 wd:Q87487715 wd:Q87488025 wd:Q87498057 wd:Q87499047 wd:Q87499514 wd:Q87500059 wd:Q87500134 wd:Q87501280 wd:Q87501358 wd:Q87502457
    wd:Q87502461 wd:Q87503474 wd:Q87589402 wd:Q87590992 wd:Q87591010 wd:Q87591031 wd:Q87591194 wd:Q87591759 wd:Q87592050 wd:Q87592301 wd:Q87592700 wd:Q87592986 wd:Q87592994
    wd:Q87593008 wd:Q87594836 wd:Q87606844 wd:Q87608261 wd:Q87608372 wd:Q87608694 wd:Q87608919 wd:Q87609332 wd:Q87609728 wd:Q87609996 wd:Q87610596 wd:Q87610912 wd:Q87612280
    wd:Q87612659 wd:Q87612806 wd:Q87612861 wd:Q87612938 wd:Q87613121 wd:Q87613201 wd:Q87614021 wd:Q87622342 wd:Q87622668 wd:Q87622862 wd:Q87623367 wd:Q876445 wd:Q87648435
    wd:Q87648445 wd:Q87648476 wd:Q87648478 wd:Q87648498 wd:Q87648517 wd:Q87648548 wd:Q87648554 wd:Q87648558 wd:Q87648580 wd:Q87648628 wd:Q87648640 wd:Q87648664 wd:Q87648669
    wd:Q87648681 wd:Q87648700 wd:Q87648799 wd:Q87648863 wd:Q87649544 wd:Q882214 wd:Q88224 wd:Q887231 wd:Q913294 wd:Q927202 wd:Q987774
}
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?item rdfs:label ?itemLabel;
      skos:altLabel ?itemAltLabel;
      schema:description ?itemDescription.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "hi".
    ?item skos:altLabel ?hindialtlabel;
      rdfs:label ?hindiLabel;
      schema:description ?hindi.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ta".
    ?item skos:altLabel ?tamilaltlabel;
      rdfs:label ?tamilLabel;
      schema:description ?tamil.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "es".
    ?item skos:altLabel ?esaltlabel;
      rdfs:label ?esLabel;
      schema:description ?es.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "fr".
    ?item skos:altLabel ?fraltlabel;
      rdfs:label ?frLabel;
      schema:description ?fr.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "de".
    ?item skos:altLabel ?dealtlabel;
      rdfs:label ?deLabel;
      schema:description ?de.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "zh".
    ?item skos:altLabel ?zhaltlabel;
      rdfs:label ?zhLabel;
      schema:description ?zh.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ur".
    ?item skos:altLabel ?uraltlabel;
      rdfs:label ?urLabel;
      schema:description ?ur.
  }
  OPTIONAL { ?wikipedia schema:about ?item; schema:isPartOf <https://en.wikipedia.org/> }
  OPTIONAL { ?hiwikipedia schema:about ?item; schema:isPartOf <https://hi.wikipedia.org/> }
  OPTIONAL { ?tawikipedia schema:about ?item; schema:isPartOf <https://ta.wikipedia.org/> }
  OPTIONAL { ?eswikipedia schema:about ?item; schema:isPartOf <https://es.wikipedia.org/> }
  OPTIONAL { ?frwikipedia schema:about ?item; schema:isPartOf <https://fr.wikipedia.org/> }
  OPTIONAL { ?dewikipedia schema:about ?item; schema:isPartOf <https://de.wikipedia.org/> }
  OPTIONAL { ?zhwikipedia schema:about ?item; schema:isPartOf <https://zh.wikipedia.org/> }
  OPTIONAL { ?urwikipedia schema:about ?item; schema:isPartOf <https://ur.wikipedia.org/> }
}
  • The above query returns a description, a Wikidata ID, synonyms, terms and the Wikipedia URL, as well as descriptions and synonyms in several languages.
  • After obtaining the results, the SPARQL endpoint output was downloaded as a file.
  • The amidict command was then used at the command prompt to convert the SPARQL endpoint output to the dictionary's standard XML format.
amidict -vv --dictionary plant_part --directory plant_part --input sparql create --informat wikisparqlxml --sparqlmap wikidataURL=item,wikipediaPage=wikipedia,name=itemLabel,term=itemLabel,Description=itemDescription,Hindi=hindiLabel,Hindi_description=hindi,Hindi_altLabel=hindialtLabel,Tamil=tamilLabel,Tamil_description=tamil,Tamil_altLabel=tamilaltLabel,Spanish=esLabel,Spanish_description=es,Spanish_altLabel=esaltLabel,French=frLabel,French_description=fr,French_altLabel=fraltLabel,German=deLabel,German_description=de,German_altLabel=dealtLabel,Chinese=zhLabel,Chinese_altLabel=zhaltLabel,Chinese_description=zh,Urdu=urLabel,Urdu_altLabel=uraltLabel,Urdu_description=ur --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*)) --synonyms=itemAltLabel

Screenshot of the plant_part dictionary

Creation of minicorpus

  • The program getpapers was used to create a mini-corpus of open scientific literature on "plant parts and essential oil" from EuropePMC. Note: pygetpapers can also be used to construct the corpus.

getpapers -q "(plant parts) AND (essential oil)" -o plantpart -x -p -k 100

  • -q denotes the query.
  • -o refers to the output directory, which in this case is plantpart.
  • -x -p adds XML and PDF files to the download.
  • -k 100 restricts the search to 100 files.
  • A corpus of 100 papers was created once the command completed successfully (see the sketch of the output layout after this list).
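
After a successful run, the output directory typically looks roughly like the sketch below (based on the usual getpapers layout; the per-paper directory names shown are illustrative PMC identifiers, not files from this project):

plantpart/
    eupmc_results.json        metadata for all downloaded papers
    PMC1234567/
        fulltext.xml
        fulltext.pdf
    PMC7654321/
        fulltext.xml
        fulltext.pdf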

minicorpus images
