Installation of pygetpapers
Clone the repository: https://github.com/ayush4921/pygetpapers
You've configured pip if you get the following output:
Give a single command at the command prompt:
python -m pip install git+https://github.com/petermr/pygetpapers
-m pip install --upgrade pip'
You've successfully configured if you get the following output:
Note: The warnings in the above output can be ignored
PMR> Good. Have you tested that it works?
- Python pre-installed
Install pygetpapers by running this command on the command line: python -m pip install git+https://github.com/petermr/pygetpapers
PMR> This is wrong (use --help)
C:\Users\vasan>pygetpapers --help
usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-p FROMPICKLE] [-m] [-j] [-c] [-u UPDATE]
[--api | --webscraping] [--onlyresearcharticles | --onlypreprints | --onlyreviews]
Welcome to Pygetpapers. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-q QUERY, --query QUERY
Add the query you want to search for. Enclose the query in quotes.
-k LIMIT, --limit LIMIT
Add the number of papers you want. Default =100
-o OUTPUT, --output OUTPUT
Add the output directory url. Default is the current working directory
-v, --onlyquery Only makes the query and stores the result.
-p FROMPICKLE, --frompickle FROMPICKLE
Reads the picke and makes the xml files. Takes the path to the pickle as the input
-m, --makepdf Also makes pdf files for the papers. Works only with --api method.
-j, --makejson Also makes json files for the papers. Works only with --api method.
-c, --makecsv Also makes csv files for the papers. Works only with --api method.
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Requires -k or --limit and -q or --query to be
given. Takes the path to the pickle as the input
--api Get papers using the official EuropePMC api
--webscraping Get papers using the scraping EuropePMC. Also supports getting only research papers, preprints
or review papers.
--onlyresearcharticles
Get only research papers (Only works with --webscraping)
--onlypreprints Get only preprints (Only works with --webscraping)
--onlyreviews Get only review papers (Only works with --webscraping)
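The bracketed groups in the usage line, such as [--api | --webscraping], are the notation Python's argparse uses for mutually exclusive options: at most one option from each group may be given. The following is a minimal sketch of that behaviour, not the actual pygetpapers source. The program name pygetpapers-sketch, the function build_parser, and the integer type for --limit are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the usage line shown above."""
    parser = argparse.ArgumentParser(prog="pygetpapers-sketch")
    parser.add_argument("-q", "--query", help="query string, enclosed in quotes")
    # Assumption: the limit is parsed as an integer; the help text only gives the default.
    parser.add_argument("-k", "--limit", type=int, default=100,
                        help="number of papers wanted (default: 100)")

    # [--api | --webscraping]: at most one of the two may be given.
    method = parser.add_mutually_exclusive_group()
    method.add_argument("--api", action="store_true")
    method.add_argument("--webscraping", action="store_true")

    # [--onlyresearcharticles | --onlypreprints | --onlyreviews]
    kind = parser.add_mutually_exclusive_group()
    kind.add_argument("--onlyresearcharticles", action="store_true")
    kind.add_argument("--onlypreprints", action="store_true")
    kind.add_argument("--onlyreviews", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-q", "Plant Parts", "--webscraping", "--onlyreviews"]
    )
    print(args.query, args.limit, args.webscraping, args.onlyreviews)
```

Giving two options from the same group, for example --api together with --webscraping, makes argparse print an error and exit instead of running.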
Clone the repository https://github.com/ayush4921/pygetpapers by giving the git clone command in cmd.
You will see:
C:\Users\HP PC>pygetpapers --help
usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-f FROMPICKLE] [-p] [-j] [-c] [-u UPDATE]
Welcome to Pygetpapers. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-q QUERY, --query QUERY
query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
escape special characters within the quotes, use backslash. The query to be quoted in either
single or double quotes.
-k LIMIT, --limit LIMIT
maximum number of hits (default: 100)
-o OUTPUT, --output OUTPUT
output directory (Default: current working directory)
-v, --onlyquery Saves pickle file containing the result of the query in storage. The pickle file can be given
to --frompickle to download the papers later.
-f FROMPICKLE, --frompickle FROMPICKLE
Reads the picke and makes the xml files. Takes the path to the pickle as the input
-p, --downloadpdf Downloads full text pdf files for the papers. Works only with --api method.
-j, --makejson Stores the per-document metadata as json. Works only with --api method.
-c, --makecsv Stores the per-document metadata as csv. Works only with --api method.
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Requires -k or --limit (If not provided, default
will be used) and -q or --query (must be provided) to be given. Takes the path to the pickle
as the input.
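Because the short flags differ between versions of the help text on this page (for example, -p means FROMPICKLE in one listing and --downloadpdf in another), a script that calls pygetpapers may be safer using the long option names. The sketch below assembles such a command as a list of arguments. It assumes the long names printed in the help output directly above; the function build_command and its parameter names are hypothetical.

```python
from typing import List, Optional


def build_command(query: str, limit: int = 100, output: Optional[str] = None,
                  download_pdf: bool = False, make_json: bool = False,
                  make_csv: bool = False) -> List[str]:
    """Assemble a pygetpapers argument list from the long option names."""
    command = ["pygetpapers", "--query", query, "--limit", str(limit)]
    if output is not None:
        command += ["--output", output]
    if download_pdf:
        command.append("--downloadpdf")
    if make_json:
        command.append("--makejson")
    if make_csv:
        command.append("--makecsv")
    return command


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run to execute it.
    print(build_command("Medicinal Activities", limit=50,
                        output="output_test", download_pdf=True))
```

Passing the list to subprocess.run, rather than joining it into one string, keeps a query containing spaces as a single argument without extra quoting.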
- Python installed on the local computer.
- Opened the command line and changed the working directory.
- Cloned the repository to the local computer using the git clone command: git clone https://github.com/petermr/pygetpapers
- This single command will install pygetpapers: pip3 install git+https://github.com/ayush4921/pygetpapers.git
- Run pygetpapers --help
C:\Users\DELL\Radhu>pygetpapers --help
usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-f FROMPICKLE] [-p] [-j] [-c] [-u UPDATE]
Welcome to Pygetpapers. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-q QUERY, --query QUERY
query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
escape special characters within the quotes, use backslash. The query to be quoted in either
single or double quotes.
-k LIMIT, --limit LIMIT
maximum number of hits (default: 100)
-o OUTPUT, --output OUTPUT
output directory (Default: current working directory)
-v, --onlyquery Saves pickle file containing the result of the query in storage. The pickle file can be given
to --frompickle to download the papers later.
-f FROMPICKLE, --frompickle FROMPICKLE
Reads the picke and makes the xml files. Takes the path to the pickle as the input
-p, --downloadpdf Downloads full text pdf files for the papers. Works only with --api method.
-j, --makejson Stores the per-document metadata as json. Works only with --api method.
-c, --makecsv Stores the per-document metadata as csv. Works only with --api method.
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Requires -k or --limit (If not provided, default
will be used) and -q or --query (must be provided) to be given. Takes the path to the pickle
as the input.
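Seeing the help text above is the check that the installation worked. The same check can be scripted; the following is a rough sketch using only the Python standard library. The function names find_command and help_works are hypothetical, and the test for the word "usage:" is an assumption based on the output shown on this page.

```python
import shutil
import subprocess
from typing import Optional


def find_command(name: str = "pygetpapers") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def help_works(name: str = "pygetpapers") -> bool:
    """Run `<name> --help` and report whether it exits cleanly with a usage line."""
    path = find_command(name)
    if path is None:
        return False
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    return result.returncode == 0 and "usage:" in result.stdout


if __name__ == "__main__":
    if help_works():
        print("pygetpapers is installed and responds to --help")
    else:
        print("pygetpapers was not found on PATH, or --help failed")
```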
Example query:
pygetpapers -q "Medicinal Activities" -k 50 -p -o output_test
This query created "output_test" in the current directory, which is the output path we gave with -o.
A folder was created within "output_test", where the XML files of the 50 papers were downloaded. The corresponding pickle files were also created for each paper.
- A file with the PMC ID, HTML link, PDF link, title, and author info was created in papers.
No errors.
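To confirm that a run like the one above downloaded what was expected, the files in the output directory can be counted by extension. This is a small sketch that makes no assumption about how pygetpapers names its folders or files; it simply walks the whole directory. The function name count_by_extension is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(output_dir: str) -> Counter:
    """Count files under output_dir (recursively) by lowercase extension."""
    counts: Counter = Counter()
    for path in Path(output_dir).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # "output_test" is the directory name used in the example query.
    for extension, number in sorted(count_by_extension("output_test").items()):
        print(extension or "(no extension)", number)
```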