
The repository for the team La Chouffe of the Open Science course a.a. 2021/2022



open-sci/2021-2022-la-chouffe-code


2021-2022-la-chouffe-code

The repository for the team La Chouffe of the Open Science course a.a. 2021/2022

Information about the Project

Data Management Plan

Protocol introducing the methodology

Software developed

Data Gathered

Article Presenting the Research

  • Davide Brembilla, Chiara Catizone & Giulia Venditti. (2022). Availability of Article Metadata from Open Journals – A case study in DOAJ and Crossref. https://doi.org/10.5281/zenodo.6570290

Slides supporting the presentation

Software requirements

Tested on Python > 3.9.

  • requests==2.27.1
  • requests-cache==0.9.4
  • tqdm==4.62.3
  • backoff==2.0.1
  • pandas==1.4.2

You can install these with pip install -r requirements.txt
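Before launching anything, you may want to confirm that the interpreter and the installed packages match the versions pinned above. The following is a minimal sketch that uses only the standard library's importlib.metadata. It assumes that the pins copied into PINS match requirements.txt and that Python 3.9 is the minimum version. The helper names parse_pins and check_environment are hypothetical and are not part of the repository.

```python
import sys
from importlib import metadata

# Pins copied from the list above; assumed (not verified) to match requirements.txt.
PINS = """\
requests==2.27.1
requests-cache==0.9.4
tqdm==4.62.3
backoff==2.0.1
pandas==1.4.2
"""


def parse_pins(text):
    """Map each 'name==version' line to {name: version}; spaces around '==' are allowed."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_environment(pins, minimum_python=(3, 9)):
    """Return a list of mismatches; an empty list means everything matches."""
    problems = []
    # Assumption: 3.9 is treated as the minimum supported Python version.
    if sys.version_info[:2] < minimum_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, minimum_python))
        problems.append(f"Python {found} is older than {wanted}")
    for name, wanted_version in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if installed != wanted_version:
            problems.append(f"{name}: installed {installed}, pinned {wanted_version}")
    return problems
```

After pip install -r requirements.txt, check_environment(parse_pins(PINS)) should return an empty list if everything matches.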

Launching the software

To use this software from the command line, you first need to download both the journals' and the articles' dumps from DOAJ.
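As far as we know, the DOAJ public data dumps are distributed as compressed archives. The sketch below assumes that each dump is a .tar.gz file containing JSON batch files, each holding one JSON array of records. Check this assumption against the files you actually download. The sketch unpacks an archive and counts its records as a quick sanity check before running the cleaners. The helper names extract_dump and count_records are hypothetical.

```python
import json
import tarfile
from pathlib import Path


def extract_dump(archive, target_dir):
    """Extract a .tar.gz dump into target_dir and return its JSON files, sorted by path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Interpreters with extraction filters: reject unsafe members (absolute paths, links outside target).
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return sorted(target.rglob("*.json"))


def count_records(json_files):
    """Count records, assuming (unverified) that each batch file holds a single JSON array."""
    total = 0
    for path in json_files:
        with open(path, encoding="utf8") as handle:
            total += len(json.load(handle))
    return total
```

For example, count_records(extract_dump("articles.tar.gz", "path/to/articles/dump")) unpacks a hypothetical articles.tar.gz and reports how many article records it contains.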

Specifications of the computer on which the estimated running times (ETA) below were measured:

  • Laptop Lenovo Ideapad 5
  • Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz (4 cores, 8 threads)
  • 8 GB RAM
  • Windows 10 64 bit
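The ETA values depend on the hardware, so you may want to record the same details on your own machine before comparing timings. The small standard-library sketch below does this. The function name machine_specs is hypothetical. Note that os.cpu_count() reports logical processors, so on the i7-1065G7 above it would typically return 8. Total RAM is not exposed portably by the standard library, so the sketch leaves it out.

```python
import os
import platform


def machine_specs():
    """Collect basic CPU, OS and Python details to report alongside measured run times."""
    return {
        # platform.processor() can be empty on some systems; fall back to the machine type.
        "processor": platform.processor() or platform.machine(),
        "logical_cpus": os.cpu_count(),  # logical processors (threads), may be None
        "system": f"{platform.system()} {platform.release()} ({platform.architecture()[0]})",
        "python": platform.python_version(),
    }
```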

The final dump was created by running the following commands in this order:

py -m batches_cleaner "path/to/articles/dump"
ETA: 30 s

py -m main "cleaned"
ETA: about 1 h per batch (ca. 78 h in our case)

py -m stats "temp/completed"
ETA: 5 min

py -m journal_cleaner "path/to/journal/dump"
ETA: 1 min

py -m populator "stats"
ETA: about 1 h 30 min
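The five commands can also be driven from a single Python script. The sketch below is an illustration and is not part of the repository. It reuses the module names and arguments listed above; the paths are placeholders to replace with your own. Each step runs with the current interpreter through subprocess, so a failing step stops the chain, and the script adds up the ETAs measured on the laptop above. It takes the main step as 78 h and the populator step as 1 h 30 min. The names STEPS, build_commands, total_eta and run_pipeline are hypothetical.

```python
import subprocess
import sys
from datetime import timedelta

# (module, argument, measured ETA) in the order given above; paths are placeholders.
STEPS = [
    ("batches_cleaner", "path/to/articles/dump", timedelta(seconds=30)),
    ("main", "cleaned", timedelta(hours=78)),  # about 1 h per batch, ca. 78 batches
    ("stats", "temp/completed", timedelta(minutes=5)),
    ("journal_cleaner", "path/to/journal/dump", timedelta(minutes=1)),
    ("populator", "stats", timedelta(hours=1, minutes=30)),  # assumed reading of the ETA
]


def build_commands(steps, python=sys.executable):
    """Translate each step into an argv list equivalent to `py -m module argument`."""
    return [[python, "-m", module, argument] for module, argument, _ in steps]


def total_eta(steps):
    """Sum the per-step ETAs into a single timedelta."""
    return sum((eta for _, _, eta in steps), timedelta())


def run_pipeline(steps):
    """Run the steps one after another; check=True raises on the first failing step."""
    for command in build_commands(steps):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling run_pipeline(STEPS) from the repository root would execute the steps in order. total_eta(STEPS) evaluates to timedelta(hours=79, minutes=36, seconds=30), printed as 3 days, 7:36:30. This total excludes the 10-minute pickle step.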

Finally, the pickle file was created from the Python interactive shell, opened with py:

>>> import pandas as pd
>>> from stats import get_all_in_dir
>>> files = get_all_in_dir('results', 'csv')
>>> df = pd.concat([pd.read_csv(file, encoding='utf8') for file in files])
>>> df.to_pickle('result.pkl')

ETA: 10 min
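If you want to run the pickle step without the repository's stats helper, the sketch below is self-contained. It guesses that get_all_in_dir('results', 'csv') simply returns the CSV files directly inside results, without recursing into subfolders. The function name csvs_to_pickle is hypothetical. Unlike the interactive session, it passes ignore_index=True. This gives the combined DataFrame a fresh 0..n-1 index instead of repeating each file's own row numbers.

```python
from pathlib import Path

import pandas as pd


def csvs_to_pickle(results_dir, pickle_path):
    """Concatenate every CSV directly inside results_dir and pickle the result.

    Assumption: this mirrors get_all_in_dir(results_dir, 'csv') from the
    repository's stats module, taken here to be a non-recursive listing.
    """
    files = sorted(Path(results_dir).glob("*.csv"))
    if not files:
        raise FileNotFoundError(f"no CSV files found in {results_dir}")
    frames = [pd.read_csv(path, encoding="utf8") for path in files]
    # ignore_index=True renumbers rows 0..n-1 across all files.
    df = pd.concat(frames, ignore_index=True)
    df.to_pickle(pickle_path)
    return df
```

The resulting file can later be loaded back with pd.read_pickle('result.pkl').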
