2021-2022-la-chouffe-code

The repository of the La Chouffe team for the Open Science course, academic year 2021/2022.

Information about the Project

Data Management Plan

Giulia Venditti, Chiara Catizone, & Davide Brembilla. (2022). La Chouffe - Data Management Plan (0.0.3). Zenodo. https://doi.org/10.5281/zenodo.6570286

Protocol introducing the methodology

Davide Brembilla, Chiara Catizone, & Giulia Venditti. (2022). PROTOCOL – Availability of Open Access Metadata from Open Journals – A case study in DOAJ and Crossref V.4. Protocol. protocols.io. https://doi.org/10.17504/protocols.io.kxygxz7ywv8j/v4

Software developed

GiuliaVenditti, dbrembilla, ChiaraCati, & Silvio Peroni. (2022). open-sci/2021-2022-la-chouffe-code: v.0.0.1 (prerelease). Zenodo. https://doi.org/10.5281/zenodo.6857310

Data Gathered

Davide Brembilla, Chiara Catizone, & Giulia Venditti. (2022). La Chouffe Dataset (0.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6562909

Article Presenting the Research

Davide Brembilla, Chiara Catizone, & Giulia Venditti. (2022). Availability of Article Metadata from Open Journals – A case study in DOAJ and Crossref. Zenodo. https://doi.org/10.5281/zenodo.6570290

Slides supporting the presentation

Chiara Catizone, Davide Brembilla, & Giulia Venditti. (2022, May 25). Presentation La Chouffe team. Zenodo. https://doi.org/10.5281/zenodo.6579263

Software requirements

Tested on Python > 3.9.

requests==2.27.1
requests-cache==0.9.4
tqdm==4.62.3
backoff==2.0.1
pandas==1.4.2

You can install them with:

    pip install -r requirements.txt
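To double-check that your environment matches the pinned versions above, a small check along these lines may help. It is a minimal sketch, assuming `requirements.txt` only contains simple `name==version` pins as listed; it uses the standard library's `importlib.metadata`. The functions `parse_pins` and `check_pins` are hypothetical helpers, not part of the repository.

```python
"""Compare installed package versions with the pins in requirements.txt (sketch)."""
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_pins(text: str) -> dict[str, str]:
    """Return {package: pinned_version} for every 'name==version' line.

    Whitespace around '==' is tolerated (e.g. 'pandas == 1.4.2').
    """
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "==" not in line:
            continue  # this sketch only checks exact pins
        name, _, ver = line.partition("==")
        pins[name.strip()] = ver.strip()
    return pins


def check_pins(pins: dict[str, str]) -> list[str]:
    """Return human-readable mismatches between the pins and the environment."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    pins = parse_pins(Path("requirements.txt").read_text(encoding="utf8"))
    for problem in check_pins(pins):
        print(problem)
```

An empty output means every pinned package is installed at exactly the pinned version.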

Launching the software

The software is run from the command line. Before launching it, download both the journals' dump and the articles' dump from the DOAJ.
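The downloaded dumps are compressed archives that must be unpacked before their folder can be passed to `batches_cleaner` or `journal_cleaner`. The sketch below does this with the standard library's `tarfile`; it assumes the dump is a gzip-compressed tarball of JSON batch files (check the format of the dump you actually download), and `extract_dump` as well as the archive name are hypothetical.

```python
"""Unpack a downloaded DOAJ dump archive and list the JSON files it contains (sketch)."""
from __future__ import annotations

import tarfile
from pathlib import Path


def extract_dump(archive: str | Path, dest: str | Path) -> list[Path]:
    """Extract a .tar.gz dump into `dest` and return the JSON files found.

    Assumes the archive is a gzip-compressed tarball of JSON files.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse unsafe member paths.
            tar.extractall(dest, filter="data")
        else:
            # Older Python versions: only extract archives from a trusted source.
            tar.extractall(dest)
    # Batch files may sit in a sub-folder of the archive, so search recursively.
    return sorted(dest.rglob("*.json"))


if __name__ == "__main__":
    files = extract_dump("doaj_article_data.tar.gz", "path/to/articles/dump")
    print(f"{len(files)} JSON files extracted")
```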

Specifications of the computer on which the estimated running times (ETA) below were measured:

Laptop Lenovo IdeaPad 5
Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz (4 cores, 8 threads)
8 GB RAM
Windows 10 64-bit

The final dump was created by running the following commands, in this order:

    py -m batches_cleaner "path/to/articles/dump"    (ETA: 30 s)

    py -m main "cleaned"    (ETA: about 1 h per batch; about 78 h in our case)

    py -m stats "temp/completed"    (ETA: 5 min)

    py -m journal_cleaner "path/to/journal/dump"    (ETA: 1 min)

    py -m populator "stats"    (ETA: 1 h 30 min)
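The five commands above can also be chained from a single Python script, so that the pipeline stops as soon as one step fails. This is a minimal sketch, not part of the repository: `build_commands` and `run_pipeline` are hypothetical names, the script is assumed to be run from the repository folder (so that `-m` can find the modules), and `sys.executable` stands in for the Windows `py` launcher.

```python
"""Run the five pipeline steps in order from one script (sketch)."""
from __future__ import annotations

import subprocess
import sys


def build_commands(articles_dump: str, journals_dump: str) -> list[list[str]]:
    """Return the command line of each step, in the order listed above.

    sys.executable replaces the Windows `py` launcher, so the same interpreter
    (with the installed requirements) is used on any operating system.
    """
    steps = [
        ("batches_cleaner", articles_dump),
        ("main", "cleaned"),
        ("stats", "temp/completed"),
        ("journal_cleaner", journals_dump),
        ("populator", "stats"),
    ]
    return [[sys.executable, "-m", module, arg] for module, arg in steps]


def run_pipeline(articles_dump: str, journals_dump: str, dry_run: bool = False) -> None:
    """Run every step, stopping at the first one that exits with an error."""
    for cmd in build_commands(articles_dump, journals_dump):
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # dry_run=True only prints the commands; set it to False to execute them.
    run_pipeline("path/to/articles/dump", "path/to/journal/dump", dry_run=True)
```

Given the ETA of the `main` step, running the steps separately, as done originally, makes it easier to resume after an interruption.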

Finally, the pickle file was created from the Python interactive shell (ETA: 10 min):

    py  # open the Python shell

    import pandas as pd
    from stats import get_all_in_dir

    # Collect every CSV file in the "results" folder and merge them into one DataFrame.
    csv_files = get_all_in_dir('results', 'csv')
    df = pd.concat([pd.read_csv(file, encoding='utf8') for file in csv_files])
    df.to_pickle('result.pkl')
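The same merge can be kept as a small script instead of being typed into the shell. The sketch below assumes that `get_all_in_dir(directory, extension)` returns the paths of the files with that extension in `directory`; `list_files` is a hypothetical stand-in for it, not the repository's implementation, and `merge_to_pickle` is likewise illustrative.

```python
"""Merge the per-batch CSV results into a single pickle (sketch)."""
from __future__ import annotations

from pathlib import Path


def list_files(directory: str | Path, extension: str) -> list[Path]:
    """Hypothetical stand-in for stats.get_all_in_dir: files ending in .<extension>."""
    return sorted(p for p in Path(directory).glob(f"*.{extension}") if p.is_file())


def merge_to_pickle(directory: str | Path = "results", output: str | Path = "result.pkl") -> int:
    """Concatenate every CSV in `directory`, save it as a pickle, return the row count."""
    import pandas as pd  # imported here so list_files also works without pandas

    frames = [pd.read_csv(path, encoding="utf8") for path in list_files(directory, "csv")]
    # ignore_index=True builds a fresh 0..n-1 index instead of repeating each file's index.
    df = pd.concat(frames, ignore_index=True)
    df.to_pickle(output)
    return len(df)


if __name__ == "__main__":
    rows = merge_to_pickle()
    print(f"Saved {rows} rows to result.pkl")
```

The resulting file can then be loaded for analysis with `pd.read_pickle('result.pkl')`. Note that `pd.concat` raises a `ValueError` if the folder contains no CSV files.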

Folders and files

images/
.DS_Store
.gitignore
LICENSE.md
README.md
answers.md
batches_cleaner.py
cleanJournalsDump.json
compressed_dataset.tar.gz
data_viz.ipynb
journal_cleaner.py
journalsDump.json
main.py
metadata.ttl
metadata_sw.ttl
multithread_cache.sqlite
multithread_populating.py
populator.py
requirements.txt
stats.py
