FAQs

Frequently Asked Questions for `pygetpapers`

What can `pygetpapers` do?

Search repository or publisher sites for scholarly articles.
Iteratively improve queries using dictionaries and previous searches.
Provide a unified system to cover many different sites.
Integrate with downstream content-mining and analysis.

Can `pygetpapers` search repository "FOO"?

pygetpapers is modular and designed for RESTful APIs. It has modules for Europe PMC (EPMC, full text), the preprint servers arXiv, bioRxiv, medRxiv and rxivist, and the metadata server Crossref.
If you are familiar with a repository's content and its manual search, it is relatively easy to add code for a new RESTful repository. Note that the socio-legal aspects (copyright, server load, etc.) are often critical.
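As a rough illustration of what "adding code for a new RESTful repository" involves, the sketch below builds a search URL and pages through results up to a limit. It is not pygetpapers' actual module interface: the endpoint `https://example.org/api/search`, the parameter names `query`, `pageSize` and `page`, and the helpers `build_search_url` and `iter_results` are all hypothetical, and a real repository will define its own.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, page_size=25, page=1):
    """Build a search URL for a hypothetical REST endpoint.

    The parameter names are assumptions; check the repository's API docs.
    """
    params = {"query": query, "pageSize": page_size, "page": page}
    return f"{base_url}?{urlencode(params)}"


def iter_results(fetch_page, query, limit, page_size=25):
    """Yield at most `limit` records, requesting one page at a time.

    `fetch_page(query, page, page_size)` is supplied by the caller and
    returns a list of records (an empty list when there are no more).
    """
    page, seen = 1, 0
    while seen < limit:
        records = fetch_page(query, page, page_size)
        if not records:
            return
        for record in records:
            if seen >= limit:
                return
            yield record
            seen += 1
        page += 1
```

Passing `fetch_page` in as a function keeps the paging logic testable without network access, and gives one place to add delays or rate limits so the remote server is not overloaded.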

Where is my data stored?

pygetpapers stores all data (fulltexts, metadata, analyses, etc.) on your machine, wherever you choose.

Do I have to know Python?

No. Currently you have to install Python, but there are simple, tested commands for this. Later we may package everything as a Docker image or as Jupyter Notebooks.

Does `pygetpapers` store a record of my searches?

Not by default. There is an optional log file which stores the query and records downloads. We are working on integrating pygetpapers into Jupyter Notebooks so that complex workflows can be re-run.

Can `pygetpapers` be run as a server?

It is not currently packaged as a server (although this shouldn't be difficult), but we are exploring cloud solutions such as Binder or Google Colab.

What resource problems does `pygetpapers` have?

pygetpapers is generally embarrassingly parallel. The main resources are bandwidth and remote server capacity. Several jobs can be run simultaneously, e.g. by dividing a search into publication-date slices. The main concern is not to overload the remote server and cause a denial of service, so be careful.
Downloaded files can be quite large (e.g. 20+ MB PDFs), so 10_000 files might take 50 GB.
Malformed queries could cause problems in getpapers; it is not yet known whether this is also true for pygetpapers.
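The date-slicing and storage points above can be sketched in a few lines of Python. This is a minimal illustration, not part of pygetpapers: `date_slices` and `estimate_storage_gb` are hypothetical helpers, and how each slice is passed to a search job is left to the caller.

```python
from datetime import date, timedelta


def date_slices(start, end, days):
    """Yield contiguous (slice_start, slice_end) pairs covering start..end inclusive."""
    one_day = timedelta(days=1)
    current = start
    while current <= end:
        # Each slice spans at most `days` days and never runs past `end`.
        slice_end = min(current + timedelta(days=days) - one_day, end)
        yield current, slice_end
        current = slice_end + one_day


def estimate_storage_gb(n_files, avg_mb_per_file):
    """Rough disk estimate in GB, taking 1 GB as 1000 MB."""
    return n_files * avg_mb_per_file / 1000


slices = list(date_slices(date(2020, 1, 1), date(2020, 3, 31), 31))
for slice_start, slice_end in slices:
    print(slice_start.isoformat(), slice_end.isoformat())

print(estimate_storage_gb(10_000, 5))   # average of 5 MB per file
print(estimate_storage_gb(10_000, 20))  # every file a 20 MB PDF
```

The 50 GB figure for 10_000 files corresponds to an average of about 5 MB per file; if every file were a 20 MB PDF the total would be closer to 200 GB. When running slices as separate jobs, keep the number of simultaneous jobs small so the remote server is not overloaded.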
