
Installation of pygetpapers

Vasant Kumar edited this page Mar 18, 2021 · 12 revisions

How to install pygetpapers on Windows?

Tester 1: Talha Hasan

OS: Windows 10

Date: 03/17/2021

Prerequisites

Open Command Prompt.

Check for pip by running the command pip.

pip is configured if you get the following output:

Usage:
  pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  cache                       Inspect and manage pip's wheel cache.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  debug                       Show information useful for debugging.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output. Option is additive, and can be used up to 3 times (corresponding to
                              WARNING, ERROR, and CRITICAL logging levels).
  --log <path>                Path to a verbose appending log.
  --no-input                  Disable prompting for input.
  --proxy <proxy>             Specify a proxy in the form [user:passwd@]proxy.server:port.
  --retries <retries>         Maximum number of retries each connection should attempt (default 5 times).
  --timeout <sec>             Set the socket timeout (default 15 seconds).
  --exists-action <action>    Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup,
                              (a)bort.
  --trusted-host <hostname>   Mark this host or host:port pair as trusted, even though it does not have valid or any
                              HTTPS.
  --cert <path>               Path to alternate CA bundle.
  --client-cert <path>        Path to SSL client certificate, a single file containing the private key and the
                              certificate in PEM format.
  --cache-dir <dir>           Store the cache data in <dir>.
  --no-cache-dir              Disable the cache.
  --disable-pip-version-check
                              Don't periodically check PyPI to determine whether a new version of pip is available for
                              download. Implied with --no-index.
  --no-color                  Suppress colored output
  --no-python-version-warning
                              Silence deprecation warnings for upcoming unsupported Pythons.
  --use-feature <feature>     Enable new functionality, that may be backward incompatible.
  --use-deprecated <feature>  Enable deprecated functionality, that will be removed in the future.
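
The same check can be scripted. The following is a minimal Python sketch, not part of pygetpapers; the function name pip_available is hypothetical. It asks the running interpreter for pip's version via python -m pip --version and reports whether that call succeeded.

```python
import subprocess
import sys


def pip_available() -> bool:
    """Return True if `python -m pip --version` succeeds for this interpreter."""
    try:
        result = subprocess.run(
            [sys.executable, "-m", "pip", "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The interpreter path could not be executed at all.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("pip is available" if pip_available() else "pip was not found")
```

Running pip through sys.executable checks the pip that belongs to the Python in use, which may differ from whichever pip happens to be first on PATH.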

Download pygetpapers

Run the single command python -m pip install git+git://github.com/petermr/pygetpapers at the command prompt (pip itself can be upgraded with python -m pip install --upgrade pip). The installation succeeded if you get the following output:

Defaulting to user installation because normal site-packages is not writeable
Collecting git+git://github.com/petermr/pygetpapers
  Cloning git://github.com/petermr/pygetpapers to c:\users\talha\appdata\local\temp\pip-req-build-436dy2hn
Collecting requests
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 163 kB/s
Collecting pandas_read_xml
  Downloading pandas_read_xml-0.0.9-py3-none-any.whl (6.2 kB)
Collecting pandas
  Downloading pandas-1.2.3-cp39-cp39-win_amd64.whl (9.3 MB)
     |████████████████████████████████| 9.3 MB 469 kB/s
Collecting lxml
  Downloading lxml-4.6.2-cp39-cp39-win_amd64.whl (3.5 MB)
     |████████████████████████████████| 3.5 MB 187 kB/s
Collecting chromedriver_autoinstaller
  Downloading chromedriver_autoinstaller-0.2.2-py3-none-any.whl (5.9 kB)
Collecting xmltodict
  Downloading xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
Collecting selenium
  Downloading selenium-3.141.0-py2.py3-none-any.whl (904 kB)
     |████████████████████████████████| 904 kB 123 kB/s
Collecting certifi>=2017.4.17
  Downloading certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
     |████████████████████████████████| 147 kB 204 kB/s
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 273 kB/s
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
     |████████████████████████████████| 153 kB 312 kB/s
Collecting chardet<5,>=3.0.2
  Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 297 kB/s
Collecting pyarrow
  Downloading pyarrow-3.0.0-cp39-cp39-win_amd64.whl (12.6 MB)
     |████████████████████████████████| 12.6 MB 285 kB/s
Collecting zipfile36
  Downloading zipfile36-0.1.3-py3-none-any.whl (20 kB)
Collecting distlib
  Downloading distlib-0.3.1-py2.py3-none-any.whl (335 kB)
     |████████████████████████████████| 335 kB 80 kB/s
Collecting pytz>=2017.3
  Downloading pytz-2021.1-py2.py3-none-any.whl (510 kB)
     |████████████████████████████████| 510 kB 344 kB/s
Collecting python-dateutil>=2.7.3
  Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
     |████████████████████████████████| 227 kB 364 kB/s
Collecting numpy>=1.16.5
  Downloading numpy-1.20.1-cp39-cp39-win_amd64.whl (13.7 MB)
     |████████████████████████████████| 13.7 MB 344 kB/s
Collecting six>=1.5
  Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Using legacy 'setup.py install' for pygetpapers, since package 'wheel' is not installed.
Installing collected packages: certifi, idna, urllib3, chardet, requests, numpy, pyarrow, xmltodict, zipfile36, pytz, six, python-dateutil, pandas, distlib, pandas-read-xml, lxml, chromedriver-autoinstaller, selenium, pygetpapers
  WARNING: The script chardetect.exe is installed in 'C:\Users\talha\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script f2py.exe is installed in 'C:\Users\talha\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script plasma_store.exe is installed in 'C:\Users\talha\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script chromedriver-path.exe is installed in 'C:\Users\talha\AppData\Roaming\Python\Python39\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
    Running setup.py install for pygetpapers ... done
Successfully installed certifi-2020.12.5 chardet-4.0.0 chromedriver-autoinstaller-0.2.2 distlib-0.3.1 idna-2.10 lxml-4.6.2 numpy-1.20.1 pandas-1.2.3 pandas-read-xml-0.0.9 pyarrow-3.0.0 pygetpapers-0.0.1 python-dateutil-2.8.1 pytz-2021.1 requests-2.25.1 selenium-3.141.0 six-1.15.0 urllib3-1.26.4 xmltodict-0.12.0 zipfile36-0.1.3
WARNING: You are using pip version 20.2.3; however, version 21.0.1 is available.
You should consider upgrading via the 'C:\Program Files\Python39\python.exe -m pip install --upgrade pip' command.

Note: The warnings in the above output can be ignored.
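
If the pygetpapers command is later reported as not found, the Scripts directory named in the PATH warnings is a likely place to look. The following is a minimal sketch, assuming only the standard library; is_on_path is a hypothetical helper name. It checks whether a directory appears among the entries of a PATH string.

```python
import os


def is_on_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the entries in `path_value`."""

    def normalize(p: str) -> str:
        # normpath drops trailing separators; normcase folds case and
        # slashes on Windows and leaves POSIX paths unchanged.
        return os.path.normcase(os.path.normpath(p))

    target = normalize(directory)
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(normalize(entry) == target for entry in entries)


if __name__ == "__main__":
    # Directory copied from the pip warning in the log above.
    scripts_dir = r"C:\Users\talha\AppData\Roaming\Python\Python39\Scripts"
    print(is_on_path(scripts_dir, os.environ.get("PATH", "")))
```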

PMR> Good. Have you tested that it works?

Tester 2: Vasant Kumar

OS: Windows 10

Date: 03/17/2021

Prerequisites

  • Python pre-installed

Clone repository

Install pygetpapers by running the following command on the command line: python -m pip install git+git://github.com/petermr/pygetpapers

Result

C:\Users\vasan>pip3 install git+git://github.com/ayush4921/pygetpapers.git
Collecting git+git://github.com/ayush4921/pygetpapers.git
  Cloning git://github.com/ayush4921/pygetpapers.git to c:\users\vasan\appdata\local\temp\pip-req-build-4lhdpcvz
Collecting requests
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 170 kB/s
Requirement already satisfied: pandas_read_xml in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (0.2.2)
Collecting xmltodict
  Using cached xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
Requirement already satisfied: selenium in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pipreqs==0.0.1) (3.12.0)
Collecting chardet<5,>=3.0.2
  Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 285 kB/s
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
     |████████████████████████████████| 153 kB 819 kB/s
Collecting certifi>=2017.4.17
  Using cached certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 896 kB/s
Collecting zipfile36
  Using cached zipfile36-0.1.3-py3-none-any.whl (20 kB)
Collecting distlib
  Using cached distlib-0.3.1-py2.py3-none-any.whl (335 kB)
Requirement already satisfied: pyarrow in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas_read_xml->pipreqs==0.0.1) (3.0.0)
Collecting numpy>=1.16.5
  Using cached numpy-1.20.1-cp39-cp39-win_amd64.whl (13.7 MB)
Requirement already satisfied: pytz>=2017.3 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas->pipreqs==0.0.1) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from pandas->pipreqs==0.0.1) (2.8.1)
Requirement already satisfied: six>=1.5 in c:\users\vasan\appdata\roaming\python\python39\site-packages (from python-dateutil>=2.7.3->pandas->pipreqs==0.0.1) (1.15.0)
Using legacy 'setup.py install' for pipreqs, since package 'wheel' is not installed.
Installing collected packages: chardet, urllib3, certifi, idna, requests, xmltodict, pipreqs, zipfile36, distlib, numpy
    Running setup.py install for pipreqs ... done
Successfully installed certifi-2020.12.5 chardet-4.0.0 distlib-0.3.1 idna-2.10 numpy-1.20.1 pipreqs-0.0.1 requests-2.25.1 urllib3-1.26.4 xmltodict-0.12.0 zipfile36-0.1.3
WARNING: You are using pip version 20.2.3; however, version 21.0.1 is available.
You should consider upgrading via the 'c:\users\vasan\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.

Check your installation by running pygetpapers --help on the command line.

PMR> This is wrong (use --help)

C:\Users\vasan>pygetpapers --help
usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-p FROMPICKLE] [-m] [-j] [-c] [-u UPDATE]
                   [--api | --webscraping] [--onlyresearcharticles | --onlypreprints | --onlyreviews]

Welcome to Pygetpapers. -h or --help for help

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        Add the query you want to search for. Enclose the query in quotes.
  -k LIMIT, --limit LIMIT
                        Add the number of papers you want. Default =100
  -o OUTPUT, --output OUTPUT
                        Add the output directory url. Default is the current working directory
  -v, --onlyquery       Only makes the query and stores the result.
  -p FROMPICKLE, --frompickle FROMPICKLE
                        Reads the picke and makes the xml files. Takes the path to the pickle as the input
  -m, --makepdf         Also makes pdf files for the papers. Works only with --api method.
  -j, --makejson        Also makes json files for the papers. Works only with --api method.
  -c, --makecsv         Also makes csv files for the papers. Works only with --api method.
  -u UPDATE, --update UPDATE
                        Updates the corpus by downloading new papers. Requires -k or --limit and -q or --query to be
                        given. Takes the path to the pickle as the input
  --api                 Get papers using the official EuropePMC api
  --webscraping         Get papers using the scraping EuropePMC. Also supports getting only research papers, preprints
                        or review papers.
  --onlyresearcharticles
                        Get only research papers (Only works with --webscraping)
  --onlypreprints       Get only preprints (Only works with --webscraping)
  --onlyreviews         Get only review papers (Only works with --webscraping)
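
The installation check can also be automated. The following is a minimal sketch using only the Python standard library; find_command and help_runs are hypothetical helper names. It looks the command up on PATH and then confirms that --help exits cleanly.

```python
import shutil
import subprocess
from typing import Optional


def find_command(name: str) -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def help_runs(name: str) -> bool:
    """Return True if `<name> --help` can be started and exits with status 0."""
    executable = find_command(name)
    if executable is None:
        return False
    try:
        result = subprocess.run(
            [executable, "--help"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return False
    return result.returncode == 0
```

For example, help_runs("pygetpapers") should return True once the install has succeeded and the Scripts directory is on PATH.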

Tester 3: Kanishka Parashar

OS: Windows 10

Date: 03/17/2021

Open Command Prompt.

Clone the repository https://github.com/ayush4921/pygetpapers by running the git clone command in cmd.

Check for pip by running the command 'pip'.

You will see:

Usage:
  pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  cache                       Inspect and manage pip's wheel cache.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  debug                       Show information useful for debugging.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output. Option is additive, and can be used up to 3 times (corresponding to
                              WARNING, ERROR, and CRITICAL logging levels).
  --log <path>                Path to a verbose appending log.
  --no-input                  Disable prompting for input.
  --proxy <proxy>             Specify a proxy in the form [user:passwd@]proxy.server:port.
  --retries <retries>         Maximum number of retries each connection should attempt (default 5 times).
  --timeout <sec>             Set the socket timeout (default 15 seconds).
  --exists-action <action>    Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup,
                              (a)bort.
  --trusted-host <hostname>   Mark this host or host:port pair as trusted, even though it does not have valid or any
                              HTTPS.
  --cert <path>               Path to alternate CA bundle.
  --client-cert <path>        Path to SSL client certificate, a single file containing the private key and the
                              certificate in PEM format.
  --cache-dir <dir>           Store the cache data in <dir>.
  --no-cache-dir              Disable the cache.
  --disable-pip-version-check
                              Don't periodically check PyPI to determine whether a new version of pip is available for
                              download. Implied with --no-index.
  --no-color                  Suppress colored output
  --no-python-version-warning
                              Silence deprecation warnings for upcoming unsupported Pythons.
  --use-feature <feature>     Enable new functionality, that may be backward incompatible.
  --use-deprecated <feature>  Enable deprecated functionality, that will be removed in the future.

Download and install pygetpapers with the command: python -m pip install git+git://github.com/petermr/pygetpapers

This will appear:

C:\Users\HP PC>python -m pip install git+git://github.com/petermr/pygetpapers
Collecting git+git://github.com/petermr/pygetpapers
  Cloning git://github.com/petermr/pygetpapers to c:\users\hp pc\appdata\local\temp\pip-req-build-6zl7oaza
Requirement already satisfied: requests in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (2.20.0)
Requirement already satisfied: pandas_read_xml in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (0.2.2)
Requirement already satisfied: xmltodict in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (0.12.0)
Requirement already satisfied: selenium in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pygetpapers==0.0.1) (3.12.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.1) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.1) (2020.12.5)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.1) (1.24.3)
Requirement already satisfied: idna<2.8,>=2.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from requests->pygetpapers==0.0.1) (2.7)
Requirement already satisfied: zipfile36 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.1) (0.1.3)
Requirement already satisfied: pyarrow in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.1) (3.0.0)
Requirement already satisfied: distlib in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pygetpapers==0.0.1) (0.3.1)
Requirement already satisfied: numpy>=1.16.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.1) (1.20.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.1) (2.8.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from pandas->pygetpapers==0.0.1) (2021.1)
Requirement already satisfied: six>=1.5 in c:\users\hp pc\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.7.3->pandas->pygetpapers==0.0.1) (1.15.0)
Using legacy 'setup.py install' for pygetpapers, since package 'wheel' is not installed.
Installing collected packages: pygetpapers
    Running setup.py install for pygetpapers ... done
Successfully installed pygetpapers-0.0.1
WARNING: You are using pip version 20.2.3; however, version 21.0.1 is available.
You should consider upgrading via the 'C:\Users\HP PC\AppData\Local\Programs\Python\Python39\python.exe -m pip install --upgrade pip' command.

Check that pygetpapers is installed by running the command 'pygetpapers --help' on the command line.

C:\Users\HP PC>pygetpapers --help

usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-f FROMPICKLE] [-p] [-j] [-c] [-u UPDATE]

Welcome to Pygetpapers. -h or --help for help

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
                        escape special characters within the quotes, use backslash. The query to be quoted in either
                        single or double quotes.
  -k LIMIT, --limit LIMIT
                        maximum number of hits (default: 100)
  -o OUTPUT, --output OUTPUT
                        output directory (Default: current working directory)
  -v, --onlyquery       Saves pickle file containing the result of the query in storage. The pickle file can be given
                        to --frompickle to download the papers later.
  -f FROMPICKLE, --frompickle FROMPICKLE
                        Reads the picke and makes the xml files. Takes the path to the pickle as the input
  -p, --downloadpdf     Downloads full text pdf files for the papers. Works only with --api method.
  -j, --makejson        Stores the per-document metadata as json. Works only with --api method.
  -c, --makecsv         Stores the per-document metadata as csv. Works only with --api method.
  -u UPDATE, --update UPDATE
                        Updates the corpus by downloading new papers. Requires -k or --limit (If not provided, default
                        will be used) and -q or --query (must be provided) to be given. Takes the path to the pickle
                        as the input.

Tester 4: Radhu Ladani

OS: Windows 10

Prerequisites

  • Python installed on the local computer.
  • Opened the command line and changed the working directory using cd.
  • Cloned the repository to the local computer using the git clone command: git clone https://github.com/petermr/pygetpapers
  • This single command installs pygetpapers: pip3 install git+git://github.com/ayush4921/pygetpapers.git
C:\Users\DELL\Radhu>pip3 install git+git://github.com/ayush4921/pygetpapers.git
Collecting git+git://github.com/ayush4921/pygetpapers.git
  Cloning git://github.com/ayush4921/pygetpapers.git to c:\users\dell\appdata\local\temp\pip-req-build-3c6tx2zj
  Running command git clone -q git://github.com/ayush4921/pygetpapers.git 'C:\Users\DELL\AppData\Local\Temp\pip-req-build-3c6tx2zj'
Requirement already satisfied: requests in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (2.20.0)
Requirement already satisfied: pandas_read_xml in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (0.0.9)
Requirement already satisfied: pandas in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (1.2.0)
Requirement already satisfied: lxml in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (4.6.2)
Requirement already satisfied: chromedriver_autoinstaller in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (0.2.2)
Requirement already satisfied: xmltodict in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (0.12.0)
Requirement already satisfied: selenium in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pipreqs==0.0.1) (3.12.0)
Requirement already satisfied: numpy>=1.16.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pipreqs==0.0.1) (1.20.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pipreqs==0.0.1) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas->pipreqs==0.0.1) (2.8.1)
Requirement already satisfied: six>=1.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from python-dateutil>=2.7.3->pandas->pipreqs==0.0.1) (1.15.0)
Requirement already satisfied: distlib in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pipreqs==0.0.1) (0.3.1)
Requirement already satisfied: zipfile36 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pipreqs==0.0.1) (0.1.3)
Requirement already satisfied: pyarrow in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from pandas_read_xml->pipreqs==0.0.1) (3.0.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pipreqs==0.0.1) (2020.12.5)
Requirement already satisfied: idna<2.8,>=2.5 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pipreqs==0.0.1) (2.7)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pipreqs==0.0.1) (1.24.3)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\dell\appdata\local\programs\python\python39\lib\site-packages (from requests->pipreqs==0.0.1) (3.0.4)

Running pygetpapers on the cmd

  • Run pygetpapers --help
Output
C:\Users\DELL\Radhu>pygetpapers --help
usage: pygetpapers [-h] [-q QUERY] [-k LIMIT] [-o OUTPUT] [-v] [-f FROMPICKLE] [-p] [-j] [-c] [-u UPDATE]

Welcome to Pygetpapers. -h or --help for help

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        query string transmitted to repository API. Eg. 'Artificial Intelligence' or 'Plant Parts'. To
                        escape special characters within the quotes, use backslash. The query to be quoted in either
                        single or double quotes.
  -k LIMIT, --limit LIMIT
                        maximum number of hits (default: 100)
  -o OUTPUT, --output OUTPUT
                        output directory (Default: current working directory)
  -v, --onlyquery       Saves pickle file containing the result of the query in storage. The pickle file can be given
                        to --frompickle to download the papers later.
  -f FROMPICKLE, --frompickle FROMPICKLE
                        Reads the picke and makes the xml files. Takes the path to the pickle as the input
  -p, --downloadpdf     Downloads full text pdf files for the papers. Works only with --api method.
  -j, --makejson        Stores the per-document metadata as json. Works only with --api method.
  -c, --makecsv         Stores the per-document metadata as csv. Works only with --api method.
  -u UPDATE, --update UPDATE
                        Updates the corpus by downloading new papers. Requires -k or --limit (If not provided, default
                        will be used) and -q or --query (must be provided) to be given. Takes the path to the pickle
                        as the input.
  • Example query: pygetpapers -q "Medicinal Activities" -k 50 -p -o output_test

  • This query created the directory "output_test" in the current working directory, as named by the -o option.

  • A folder named papers was created within "output_test", where both the .xml and .pdf files of 50 papers were downloaded. The corresponding pickle files were also created for each paper.

  • A .csv file with the PMC ID, HTML link, PDF link, title, and author info was created in papers.

  • No errors occurred.
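
The example query can be assembled programmatically. The following is a minimal sketch; build_command is a hypothetical helper, and the flags (-q, -k, -o, -p) are taken from the help text in this tester's output. The earlier help text under Tester 2 used -m for PDFs and -p for --frompickle, so the flags may differ between versions.

```python
from typing import List


def build_command(
    query: str, limit: int, output_dir: str, download_pdf: bool = False
) -> List[str]:
    """Assemble a pygetpapers argument list from the flags in the help text."""
    command = ["pygetpapers", "-q", query, "-k", str(limit), "-o", output_dir]
    if download_pdf:
        # -p/--downloadpdf in the help output shown for this tester.
        command.append("-p")
    return command


if __name__ == "__main__":
    # Prints the argument list for the example query; nothing is executed here.
    print(build_command("Medicinal Activities", 50, "output_test", download_pdf=True))
```

The list can be passed to subprocess.run(command, check=True). Passing a list rather than a single string avoids quoting problems with the space in the query.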
