Research compendium for a dataset about corresponding author country affiliations indexed in the Web of Science 2014 - 2018
This repository provides a dataset about corresponding author country affiliations indexed in the Web of Science 2014 - 2018 using the in-house database from the German Competence Center for Bibliometrics.
To get started, read the Data Descriptor, which gives an overview of the dataset and the analytical work behind it.
This repository is organized as a research compendium. A research compendium bundles the data, code, and text associated with an analysis. The R Markdown files in the analysis/ directory document the data analysis, in particular how the Web of Science in-house database from the German Competence Center for Bibliometrics was interfaced, and contain the Data Descriptor. The data/ directory contains all aggregated data. Because of the proprietary nature of the Web of Science, neither raw data nor access to the in-house database can be shared.
The analysis/
directory contains the following reports written in R Markdown:
paper.Rmd - The Data Descriptor including two use cases. Link to the rendered report
Analytical steps for obtaining the data from the Web of Science in-house database maintained by the German Competence Center for Bibliometrics (WoS-KB), and for enriching the data, are also provided as R Markdown reports:
001_kb_rp_pretest.Rmd - Initial exploration of how corresponding authorships are represented in the WoS-KB. Link to the rendered report
002_kb_rp_coverage.Rmd - Coverage analysis of reprint authors in the WoS-KB for the period 2014 - 2018. Link to the rendered report
003_kb_fetch_ca.Rmd - Steps to obtain and compile the dataset of corresponding author country affiliations 2014 - 2018. Link to the rendered report
004_kb_fetch_publisher.Rmd - Steps to obtain and compile the dataset of global publisher output 2014 - 2018. Link to the rendered report
wos_cr_match.Rmd - Obtains the most frequent publisher and journal name per Web of Science-indexed journal from Crossref. Executing this script takes approx. 2-3 hours.
The data/
directory contains the resulting datasets stored as comma-separated value files.
rp_publisher_14_18.csv - Corresponding author country affiliations per publisher, journal, and open access publishing model, 2014 - 2018
journal_publisher_14_18.csv - Global publisher and journal output 2014 - 2018
cr_wos.csv - Matching table of publisher and journal information between Crossref and the Web of Science
20190818.ISSN-to-ISSN-L.txt - ISSN matching table from https://www.issn.org/understanding-the-issn/assignment-rules/the-issn-l-for-publications-on-multiple-media/
geocodes.csv - Country geocodes obtained from https://developers.google.com/public-data/docs/canonical/countries_csv
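As a quick orientation, the comma-separated files can be loaded into an R session. This is a minimal sketch assuming the readr package is installed; no column names or types are assumed here.

```r
# Load the aggregated datasets shipped in data/ (sketch; requires readr)
library(readr)
rp_publisher <- read_csv("data/rp_publisher_14_18.csv")
journal_publisher <- read_csv("data/journal_publisher_14_18.csv")
# Inspect the structure before working with the data
str(rp_publisher)
```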
This repository follows the concept of a research compendium that uses the R package structure to port data and code.
Because access to the data infrastructure of the German Competence Center for Bibliometrics (WoS-KB) is restricted, there are different levels of reproducibility. Everyone will be able to reproduce the analysis in the data descriptor, the main document of this research compendium written in R Markdown. Users with access to the WoS-KB data infrastructure will also be able to replicate the R code and SQL queries locally, or on the script server.
Clone the GitHub repository with all data and code.
git clone https://github.com/subugoe/oa2020cadata.git
Open an R session in the directory of this package and install the R package dependencies using a package snapshot from the date this package was built:
devtools::install_deps(repos = list(CRAN = 'http://mran.revolutionanalytics.com/snapshot/2019-09-08/'))
To replicate the data descriptor:
rmarkdown::render("analysis/paper.Rmd")
Using the holepunch package, the project was made Binder-ready. Binder allows you to execute the data descriptor in the cloud from your web browser.
If you have access to the German Competence Center for Bibliometrics (WoS-KB) data infrastructure, you can replicate how the data was obtained from the Web of Science. These steps are described in the R Markdown documents, starting with 00
in the analysis/
folder.
To get started, follow the steps described above to download the research compendium including the necessary R packages. Next, add your database login credentials to your .Renviron
file and save it. You can open your .Renviron
file from R with usethis::edit_r_environ()
.
kb_user="najko"
kb_pwd="12345"
Reload your R session.
The Oracle database driver needed to access the remote database is included in this repository.
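A connection sketch, assuming the RJDBC and DBI packages; the driver jar path, host, port, and service name below are placeholders, not this repository's actual configuration:

```r
# Hypothetical connection sketch: adjust the jar path to the driver shipped
# in this repository, and the connection string to the WoS-KB host settings.
library(DBI)
library(RJDBC)

drv <- JDBC(driverClass = "oracle.jdbc.OracleDriver",
            classPath = "path/to/ojdbc.jar")  # placeholder path
con <- dbConnect(drv,
                 "jdbc:oracle:thin:@//db-host:1521/service_name",  # placeholder
                 user = Sys.getenv("kb_user"),
                 password = Sys.getenv("kb_pwd"))
```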
To replicate the R Markdown documents, call rmarkdown::render()
.
When working on the script server, please note that you need to render the documents with knitr::knit()
, because pandoc is not available there.
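Both cases can be handled with a small helper; `render_report()` is a hypothetical wrapper, and `rmarkdown::pandoc_available()` checks whether pandoc is on the path:

```r
# Render with pandoc when available; fall back to knitr::knit() on the script server
render_report <- function(input) {
  if (rmarkdown::pandoc_available()) {
    rmarkdown::render(input)
  } else {
    knitr::knit(input)
  }
}

render_report("analysis/paper.Rmd")
```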
Using a Docker container with all source code and data needed to reproduce this research compendium would reduce the set-up effort described above. Unfortunately, access to the German Competence Center for Bibliometrics requires a VPN tunnel that, at least in my local setup, is not accessible from the Docker container. Furthermore, Docker is not available for users on the script server.
Re-used data terms:
This work uses Web of Science data by Clarivate Analytics provided by the German Competence Center for Bibliometrics for research purposes.
Crossref asserts no claims of ownership to individual items of bibliographic metadata and associated Digital Object Identifiers (DOIs) acquired through the use of the Crossref Free Services. Individual items of bibliographic metadata and associated DOIs may be cached and incorporated into the user's content and systems.
ISSN-Matching of Gold OA Journals (ISSN-GOLD-OA) 3.0 and Country Geocodes obtained from Google are made available under CC-BY.
The documentation and data descriptor including the figures are made available under CC-BY 4.0.
Source Code: MIT (Najko Jahn, 2019)
in alphabetical order
- Design and Conceptualization: Colleen Campbell, Kai Geschuhn, Najko Jahn
- Data Analysis: Najko Jahn
- Writing and Review: Colleen Campbell, Anne Hobert, Najko Jahn, Birgit Schmidt, Niels Taubert
This data analytics work has been developed using open tools. There are a number of ways you can help make it better:
- If you don’t understand something, please let me know and submit an issue.
- Feel free to add new features or fix bugs by sending a pull request.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
This work is supported by the Federal Ministry of Education and Research of Germany (BMBF) in the framework Quantitative Research on the Science Sector (Project: "OAUNI Entwicklung und Einflussfaktoren des Open-Access-Publizierens an Universitäten in Deutschland", Förderkennzeichen: 01PU17023A).
Najko Jahn, Data Analyst, SUB Göttingen. najko.jahn@sub.uni-goettingen.de