Installation of pygetpapers
- Clone the repository: https://github.com/ayush4921/pygetpapers
Check that pip is configured by running pip at the command prompt. pip is configured if you get the following output:
Usage:
pip <command> [options]
Commands:
install Install packages.
download Download packages.
uninstall Uninstall packages.
freeze Output installed packages in requirements format.
list List installed packages.
show Show information about installed packages.
check Verify installed packages have compatible dependencies.
config Manage local and global configuration.
search Search PyPI for packages.
cache Inspect and manage pip's wheel cache.
wheel Build wheels from your requirements.
hash Compute hashes of package archives.
completion A helper command used for command completion.
debug Show information useful for debugging.
help Show help for commands.
General Options:
-h, --help Show help.
--isolated Run pip in an isolated mode, ignoring environment variables and user configuration.
-v, --verbose Give more output. Option is additive, and can be used up to 3 times.
-V, --version Show version and exit.
-q, --quiet Give less output. Option is additive, and can be used up to 3 times (corresponding to
WARNING, ERROR, and CRITICAL logging levels).
--log <path> Path to a verbose appending log.
--no-input Disable prompting for input.
--proxy <proxy> Specify a proxy in the form [user:passwd@]proxy.server:port.
--retries <retries> Maximum number of retries each connection should attempt (default 5 times).
--timeout <sec> Set the socket timeout (default 15 seconds).
--exists-action <action> Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup,
(a)bort.
--trusted-host <hostname> Mark this host or host:port pair as trusted, even though it does not have valid or any
HTTPS.
--cert <path> Path to alternate CA bundle.
--client-cert <path> Path to SSL client certificate, a single file containing the private key and the
certificate in PEM format.
--cache-dir <dir> Store the cache data in <dir>.
--no-cache-dir Disable the cache.
--disable-pip-version-check
Don't periodically check PyPI to determine whether a new version of pip is available for
download. Implied with --no-index.
--no-color Suppress colored output
--no-python-version-warning
Silence deprecation warnings for upcoming unsupported Pythons.
--use-feature <feature> Enable new functionality, that may be backward incompatible.
--use-deprecated <feature> Enable deprecated functionality, that will be removed in the future.
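You can also run this check from a script. The sketch below is an illustrative helper and is not part of pygetpapers. It asks the current Python interpreter for its pip version by running python -m pip --version. The function name pip_version is hypothetical.

```python
import subprocess
import sys
from typing import Optional


def pip_version() -> Optional[str]:
    """Return pip's version line for this interpreter, or None if pip is missing."""
    # Equivalent to typing `python -m pip --version` at the command prompt.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    print(pip_version() or "pip is not available for this Python interpreter")
```

If pip_version() returns None, pip is not installed for the interpreter you are using, and the install command below will fail.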
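Then install pygetpapers with a single command at the command prompt:
python -m pip install git+https://github.com/petermr/pygetpapers
If pip reports that it is out of date, you can upgrade it at the command prompt with:
python -m pip install --upgrade pip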
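If you prefer to drive the installation from Python, the sketch below builds the same pip commands as argument lists for the current interpreter. It assumes the petermr/pygetpapers repository URL from the command above. It uses git+https:// instead of git+git:// because GitHub no longer serves the unauthenticated git:// protocol. The helper names are hypothetical.

```python
import sys
from typing import List

# Repository URL from the command above, using https:// instead of git://.
REPO_URL = "git+https://github.com/petermr/pygetpapers"


def install_command(repo_url: str = REPO_URL) -> List[str]:
    """Build the argv for `python -m pip install <repo_url>` with this interpreter."""
    return [sys.executable, "-m", "pip", "install", repo_url]


def upgrade_pip_command() -> List[str]:
    """Build the argv for `python -m pip install --upgrade pip`."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "pip"]


if __name__ == "__main__":
    # Print the commands rather than running them.
    print(" ".join(upgrade_pip_command()))
    print(" ".join(install_command()))
```

To run a command, pass the list to subprocess.run(cmd, check=True). The install step needs git and network access.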
The installation succeeded if you get output like the following:
Defaulting to user installation because normal site-packages is not writeable
Collecting git+git://github.com/petermr/pygetpapers
Cloning git://github.com/petermr/pygetpapers to c:\users\talha\appdata\local\temp\pip-req-build-436dy2hn
Collecting requests
Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
|████████████████████████████████| 61 kB 163 kB/s
Collecting pandas_read_xml
Downloading pandas_read_xml-0.0.9-py3-none-any.whl (6.2 kB)
Collecting pandas
Downloading pandas-1.2.3-cp39-cp39-win_amd64.whl (9.3 MB)
|████████████████████████████████| 9.3 MB 469 kB/s
Collecting lxml
Downloading lxml-4.6.2-cp39-cp39-win_amd64.whl (3.5 MB)
|████████████████████████████████| 3.5 MB 187 kB/s
Collecting chromedriver_autoinstaller
Downloading chromedriver_autoinstaller-0.2.2-py3-none-any.whl (5.9 kB)
Collecting xmltodict
Downloading xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
Collecting selenium
Downloading selenium-3.141.0-py2.py3-none-any.whl (904 kB)
|████████████████████████████████| 904 kB 123 kB/s
Collecting certifi>=2017.4.17
Downloading certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
|████████████████████████████████| 147 kB 204 kB/s
Collecting idna<3,>=2.5
Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
|████████████████████████████████| 58 kB 273 kB/s
Collecting urllib3<1.27,>=1.21.1
Downloading urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
|████████████████████████████████| 153 kB 312 kB/s
Collecting chardet<5,>=3.0.2
Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
|████████████████████████████████| 178 kB 297 kB/s
Collecting pyarrow
Downloading pyarrow-3.0.0-cp39-cp39-win_amd64.whl (12.6 MB)
|████████████████████████████████| 12.6 MB 285 kB/s
Collecting zipfile36
Downloading zipfile36-0.1.3-py3-none-any.whl (20 kB)
Collecting distlib
Downloading distlib-0.3.1-py2.py3-none-any.whl (335 kB)
|████████████████████████████████| 335 kB 80 kB/s
Collecting pytz>=2017.3
Downloading pytz-2021.1-py2.py3-none-any.whl (510 kB)
|████████████████████████████████| 510 kB 344 kB/s
Collecting python-dateutil>=2.7.3
Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
|████████████████████████████████| 227 kB 364 kB/s
Collecting numpy>=1.16.5
Downloading numpy-1.20.1-cp39-cp39-win_amd64.whl (13.7 MB)
|████████████████████████████████| 13.7 MB 344 kB/s
Collecting six>=1.5
Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Using legacy 'setup.py install' for pygetpapers, since package 'wheel' is not installed.
Installing collected packages: certifi, idna, urllib3, chardet, requests, numpy, pyarrow, xmltodict, zipfile36, pytz, six, python-dateutil, pandas, distlib, pandas-read-xml, lxml, chromedriver-autoinstaller, selenium, pygetpapers
WARNING: The script chardetect.exe is installed in 'C:\Users\talha\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script f2py.exe is installed in 'C:\Users\talha\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script plasma_store.exe is installed in 'C:\Users\talha\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script chromedriver-path.exe is installed in 'C:\Users\talha\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Running setup.py install for pygetpapers ... done
Successfully installed certifi-2020.12.5 chardet-4.0.0 chromedriver-autoinstaller-0.2.2 distlib-0.3.1 idna-2.10 lxml-4.6.2 numpy-1.20.1 pandas-1.2.3 pandas-read-xml-0.0.9 pyarrow-3.0.0 pygetpapers-0.0.1 python-dateutil-2.8.1 pytz-2021.1 requests-2.25.1 selenium-3.141.0 six-1.15.0 urllib3-1.26.4 xmltodict-0.12.0 zipfile36-0.1.3
WARNING: You are using pip version 20.2.3; however, version 21.0.1 is available.
You should consider upgrading via the 'C:\Program Files\Python39\python.exe -m pip install --upgrade pip' command.
Note: The PATH and pip-version warnings in the output above can be ignored.
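The PATH warnings mean that scripts installed for the current user (chardetect.exe, f2py.exe and others in this log) were placed in a Scripts directory that the shell does not search. If the pygetpapers command is later reported as "not found", this is one possible cause. The sketch below is not part of pygetpapers. It reports that directory and whether it is on PATH. The function names are hypothetical. sysconfig.get_preferred_scheme exists only on Python 3.10+, so older versions fall back to the nt_user/posix_user schemes.

```python
import os
import sysconfig


def user_scripts_dir() -> str:
    """Directory where `pip install --user` places console scripts for this Python."""
    get_scheme = getattr(sysconfig, "get_preferred_scheme", None)  # Python 3.10+
    if get_scheme is not None:
        scheme = get_scheme("user")
    else:
        scheme = "nt_user" if os.name == "nt" else "posix_user"
    return sysconfig.get_path("scripts", scheme)


def is_on_path(directory: str) -> bool:
    """True if `directory` appears in the PATH environment variable."""
    target = os.path.normcase(os.path.normpath(directory))
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if entry and os.path.normcase(os.path.normpath(entry)) == target:
            return True
    return False


if __name__ == "__main__":
    scripts = user_scripts_dir()
    status = "is" if is_on_path(scripts) else "is NOT"
    print(f"{scripts} {status} on PATH")
```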
PMR> Good. Have you tested that it works?
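One way to test that the installation works is to run pygetpapers --help and check that it prints a usage line. The sketch below does this from Python. It assumes the pygetpapers console script is on PATH. The helper name is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def pygetpapers_help() -> Optional[str]:
    """Return the text printed by `pygetpapers --help`, or None if it cannot run."""
    exe = shutil.which("pygetpapers")  # None when the script is not on PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "--help"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout


if __name__ == "__main__":
    text = pygetpapers_help()
    if text and text.startswith("usage: pygetpapers"):
        print("pygetpapers is installed and runs")
    else:
        print("pygetpapers did not run; check the installation and PATH")
```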
- Python pre-installed.
- Install pygetpapers by running this command at the command line: python -m pip install git+https://github.com/petermr/pygetpapers
C:\Users\vasan>pip3 install git+git://github.com/ayush4921/pygetpapers.git
Collecting git+git://github.com/ayush4921/pygetpapers.git
Cloning git://github.com/ayush4921/pygetpapers.git to c:\users\vasan\appdata\local\temp\pip-req-build-4lhdpcvz
Collecting requests
Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
|████████████████████████████████| 61 kB 170 kB/s
Requirement already satisfied: pandas_read_xml in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (0.2.2)
Collecting xmltodict
Using cached xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
Requirement already satisfied: selenium in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (3.12.0)
Collecting chardet<5,>=3.0.2
Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
|████████████████████████████████| 178 kB 285 kB/s
Collecting urllib3<1.27,>=1.21.1
Downloading urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
|████████████████████████████████| 153 kB 819 kB/s
Collecting certifi>=2017.4.17
Using cached certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
Collecting idna<3,>=2.5
Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
|████████████████████████████████| 58 kB 896 kB/s
Collecting zipfile36
Using cached zipfile36-0.1.3-py3-none-any.whl (20 kB)
Collecting distlib
Using cached distlib-0.3.1-py2.py3-none-any.whl (335 kB)
Requirement already satisfied: pyarrow in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas_read_xml->pipreqs==0.0.1) (3.0.0)
Collecting numpy>=1.16.5
Using cached numpy-1.20.1-cp39-cp39-win_amd64.whl (13.7 MB)
Requirement already satisfied: pytz>=2017.3 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas->pipreqs==0.0.1) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas->pipreqs==0.0.1) (2.8.1)
Requirement already satisfied: six>=1.5 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from python-dateutil>=2.7.3->pandas->pipreqs==0.0.1) (1.15.0)
Using legacy 'setup.py install' for pipreqs, since package 'wheel' is not installed.
Installing collected packages: chardet, urllib3, certifi, idna, requests, xmltodict, pipreqs, zipfile36, distlib, numpy
Running setup.py install for pipreqs ... done
Successfully installed certifi-2020.12.5 chardet-4.0.0 distlib-0.3.1 idna-2.10 numpy-1.20.1 pipreqs-0.0.1 requests-2.25.1 urllib3-1.26.4 xmltodict-0.12.0 zipfile36-0.1.3
WARNING: You are using pip version 20.2.3; however, version 21.0.1 is available.
You should consider upgrading via the 'c:\users\vasan\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.
PMR> This is wrong (use --help).
C:\Users\vasan>pygetpapers --help
usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-p FROMPICKLE] [-m] [-j] [-c] [-u UPDATE]
[--api | --webscraping] [--onlyresearcharticles | --onlypreprints | --onlyreviews]
Welcome to Pygetpapers. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-q QUERY, --query QUERY
Add the query you want to search for. Enclose the query in quotes.
-k LIMIT, --limit LIMIT
Add the number of papers you want. Default =100
-o OUTPUT, --output OUTPUT
Add the output directory url. Default is the current working directory
-v, --onlyquery Only makes the query and stores the result.
-p FROMPICKLE, --frompickle FROMPICKLE
Reads the picke and makes the xml files. Takes the path to the pickle as the input
-m, --makepdf Also makes pdf files for the papers. Works only with --api method.
-j, --makejson Also makes json files for the papers. Works only with --api method.
-c, --makecsv Also makes csv files for the papers. Works only with --api method.
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Requires -k or --limit and -q or --query to be
given. Takes the path to the pickle as the input
--api Get papers using the official EuropePMC api
--webscraping Get papers using the scraping EuropePMC. Also supports getting only research papers, preprints
or review papers.
--onlyresearcharticles
Get only research papers (Only works with --webscraping)
--onlypreprints Get only preprints (Only works with --webscraping)
--onlyreviews Get only review papers (Only works with --webscraping)
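The usage line above contains two sets of mutually exclusive choices: [--api | --webscraping] and [--onlyresearcharticles | --onlypreprints | --onlyreviews]. The option descriptions also contain "works only with" rules. The sketch below shows one way to express such rules with Python's argparse. It is a hypothetical reconstruction of a subset of the options for illustration, not pygetpapers' actual source.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the pygetpapers usage line above."""
    parser = argparse.ArgumentParser(prog="pygetpapers")
    parser.add_argument("-q", "--query")
    parser.add_argument("-k", "--limit", type=int, default=100)
    parser.add_argument("-m", "--makepdf", action="store_true")
    parser.add_argument("-j", "--makejson", action="store_true")
    parser.add_argument("-c", "--makecsv", action="store_true")

    # [--api | --webscraping]: at most one retrieval method.
    method = parser.add_mutually_exclusive_group()
    method.add_argument("--api", action="store_true")
    method.add_argument("--webscraping", action="store_true")

    # [--onlyresearcharticles | --onlypreprints | --onlyreviews]: at most one kind.
    kind = parser.add_mutually_exclusive_group()
    kind.add_argument("--onlyresearcharticles", action="store_true")
    kind.add_argument("--onlypreprints", action="store_true")
    kind.add_argument("--onlyreviews", action="store_true")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv, then enforce the 'works only with' rules from the help text."""
    parser = build_parser()
    args = parser.parse_args(argv)
    wants_files = args.makepdf or args.makejson or args.makecsv
    if wants_files and not args.api:
        parser.error("-m, -j and -c work only with --api")
    wants_kind = args.onlyresearcharticles or args.onlypreprints or args.onlyreviews
    if wants_kind and not args.webscraping:
        parser.error("--onlyresearcharticles/--onlypreprints/--onlyreviews need --webscraping")
    return args
```

argparse rejects the mutually exclusive pairs itself. The cross-option rules need an explicit check after parsing, and parser.error exits with status 2.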
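Clone the repository https://github.com/ayush4921/pygetpapers by running git clone at the command prompt (cmd).
Running pip at the command prompt should print its usage text, the same as in the first set of notes above.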
C:\Users\HP PC>python -m pip install git+git://github.com/petermr/pygetpapers
Collecting git+git://github.com/petermr/pygetpapers
Cloning git://github.com/petermr/pygetpapers to c:\users\hp pc\appdata\local\temp\pip-req-build-6zl7oaza
Requirement already satisfied: requests in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (2.20.0)
Requirement already satisfied: pandas_read_xml in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (0.2.2)
Requirement already satisfied: xmltodict in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (0.12.0)
Requirement already satisfied: selenium in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (3.12.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.1) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.1) (2020.12.5)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.1) (1.24.3)
Requirement already satisfied: idna<2.8,>=2.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.1) (2.7)
Requirement already satisfied: zipfile36 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.1) (0.1.3)
Requirement already satisfied: pyarrow in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.1) (3.0.0)
Requirement already satisfied: distlib in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.1) (0.3.1)
Requirement already satisfied: numpy>=1.16.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.1) (1.20.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.1) (2.8.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.1) (2021.1)
Requirement already satisfied: six>=1.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.7.3->pandas->pygetpapers==0.0.1) (1.15.0)
Using legacy 'setup.py install' for pygetpapers, since package 'wheel' is not installed.
Installing collected packages: pygetpapers
Running setup.py install for pygetpapers ... done
Successfully installed pygetpapers-0.0.1
WARNING: You are using pip version 20.2.3; however, version 21.0.1 is available.
You should consider upgrading via the 'C:\Users\HP PC\AppData\Local\Programs\Python\Python39\python.exe -m pip install --upgrade pip' command.
C:\Users\HP PC>pygetpapers --help
usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-f FROMPICKLE] [-p] [-j] [-c] [-u UPDATE]
Welcome to Pygetpapers. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-q QUERY, --query QUERY
query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
escape special characters within the quotes, use backslash. The query to be quoted in either
single or double quotes.
-k LIMIT, --limit LIMIT
maximum number of hits (default: 100)
-o OUTPUT, --output OUTPUT
output directory (Default: current working directory)
-v, --onlyquery Saves pickle file containing the result of the query in storage. The pickle file can be given
to --frompickle to download the papers later.
-f FROMPICKLE, --frompickle FROMPICKLE
Reads the picke and makes the xml files. Takes the path to the pickle as the input
-p, --downloadpdf Downloads full text pdf files for the papers. Works only with --api method.
-j, --makejson Stores the per-document metadata as json. Works only with --api method.
-c, --makecsv Stores the per-document metadata as csv. Works only with --api method.
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Requires -k or --limit (If not provided, default
will be used) and -q or --query (must be provided) to be given. Takes the path to the pickle
as the input.
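The help text says the query must be quoted on the command line. From a script, it is simpler to pass an argument list, because the query then reaches pygetpapers as a single argument with no shell quoting needed. The sketch below builds such a list using only the flags shown in the help output above. The function name and its defaults are assumptions for illustration.

```python
from typing import List, Optional


def pygetpapers_command(
    query: str,
    limit: int = 100,
    output: Optional[str] = None,
    download_pdf: bool = False,
    make_json: bool = False,
    make_csv: bool = False,
) -> List[str]:
    """Build argv for a pygetpapers query using flags from `pygetpapers --help`."""
    argv = ["pygetpapers", "-q", query, "-k", str(limit)]
    if output is not None:
        argv += ["-o", output]  # output directory (default: current working directory)
    if download_pdf:
        argv.append("-p")  # --downloadpdf
    if make_json:
        argv.append("-j")  # --makejson
    if make_csv:
        argv.append("-c")  # --makecsv
    return argv


if __name__ == "__main__":
    # No shell is involved, so the space in the query needs no extra quoting.
    cmd = pygetpapers_command("Medicinal Activities", limit=50, output="output_test",
                              download_pdf=True)
    print(cmd)
```

To execute the query, pass the list to subprocess.run(cmd, check=True).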
- Install Python on the local computer.
- Open the command line and change the working directory using cd.
- Clone the repository to the local computer using the git clone command: git clone https://github.com/petermr/pygetpapers
- Install pygetpapers with this single command: pip3 install git+https://github.com/ayush4921/pygetpapers.git
C:\Users\DELL\Radhu>pip3 install git+git://github.com/ayush4921/pygetpapers.git
Collecting git+git://github.com/ayush4921/pygetpapers.git
Cloning git://github.com/ayush4921/pygetpapers.git to c:\users\dell\appdata\local\temp\pip-req-build-3c6tx2zj
Running command git clone -q git://github.com/ayush4921/pygetpapers.git 'C:\Users\DELL\AppData\Local\Temp\pip-req-build-3c6tx2zj'
Requirement already satisfied: requests in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (2.20.0)
Requirement already satisfied: pandas_read_xml in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (0.2.2)
Requirement already satisfied: xmltodict in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (0.12.0)
Requirement already satisfied: selenium in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (3.12.0)
Requirement already satisfied: numpy>=1.16.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pipreqs==0.0.1) (1.20.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pipreqs==0.0.1) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pipreqs==0.0.1) (2.8.1)
Requirement already satisfied: six>=1.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.7.3->pandas->pipreqs==0.0.1) (1.15.0)
Requirement already satisfied: distlib in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pipreqs==0.0.1) (0.3.1)
Requirement already satisfied: zipfile36 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pipreqs==0.0.1) (0.1.3)
Requirement already satisfied: pyarrow in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pipreqs==0.0.1) (3.0.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pipreqs==0.0.1) (2020.12.5)
Requirement already satisfied: idna<2.8,>=2.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pipreqs==0.0.1) (2.7)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pipreqs==0.0.1) (1.24.3)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pipreqs==0.0.1) (3.0.4)
- Run pygetpapers --help:
C:\Users\DELL\Radhu>pygetpapers --help
usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-f FROMPICKLE] [-p] [-j] [-c] [-u UPDATE]
Welcome to Pygetpapers. -h or --help for help
optional arguments:
-h, --help show this help message and exit
-q QUERY, --query QUERY
query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
escape special characters within the quotes, use backslash. The query to be quoted in either
single or double quotes.
-k LIMIT, --limit LIMIT
maximum number of hits (default: 100)
-o OUTPUT, --output OUTPUT
output directory (Default: current working directory)
-v, --onlyquery Saves pickle file containing the result of the query in storage. The pickle file can be given
to --frompickle to download the papers later.
-f FROMPICKLE, --frompickle FROMPICKLE
Reads the picke and makes the xml files. Takes the path to the pickle as the input
-p, --downloadpdf Downloads full text pdf files for the papers. Works only with --api method.
-j, --makejson Stores the per-document metadata as json. Works only with --api method.
-c, --makecsv Stores the per-document metadata as csv. Works only with --api method.
-u UPDATE, --update UPDATE
Updates the corpus by downloading new papers. Requires -k or --limit (If not provided, default
will be used) and -q or --query (must be provided) to be given. Takes the path to the pickle
as the input.
- Example query:
pygetpapers -q "Medicinal Activities" -k 50 -p -o output_test
- This query created the "output_test" directory in the current working directory (the output path given with -o).
- A folder papers was created within "output_test", and both the .xml and .pdf files of the 50 papers were downloaded into it. The corresponding pickle files were also created for each paper.
- A .csv file with the PMC id, HTML link, pdf link, title, and author info was created in papers.
- No errors were reported.
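To check the results of a run like this one, you can count the files it produced by type. The sketch below walks an output directory and counts files by extension. It assumes only that results are written somewhere under the -o directory (here output_test). It does not rely on the exact per-paper folder layout, which may differ between pygetpapers versions. The function name is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root: str) -> Counter:
    """Count files under `root` by lower-cased extension such as '.xml' or '.pdf'."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    root = Path("output_test")
    if not root.is_dir():
        print(f"{root} does not exist; run the example query first")
    else:
        for ext, n in sorted(count_by_extension(str(root)).items()):
            print(f"{ext or '(no extension)'}: {n}")
```

For the example query above, the counts for .xml and .pdf should each be 50 if every paper had both files available.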