Objectives
All members (e.g., graduate students, postdocs, research associates) of the Bayne Lab are expected to use GitHub to share and archive code and project-specific files. The two objectives of using GitHub as a lab are:
- Project-specific archiving: for open science, collaboration, and reproducibility
- Example script sharing: to share code internally for commonly used applications so that students do not duplicate effort
Why use GitHub?
Source: “Piled Higher and Deeper” by Jorge Cham, http://www.phdcomics.com
The primary reason to use GitHub is version control, which lets you track and store changes to your files without losing the history of past changes. Version control is important not only when writing manuscripts but also as a way to develop analyses while retaining earlier versions and documenting decision points and workflow.
Using GitHub also teaches students best practices for creating reproducible analyses in a collaborative framework, a skill that is increasingly applicable across careers in science.
There are many online tutorials on how to get started using GitHub with R and R Projects.
Everyone is encouraged to use the Open Science Framework (OSF) to develop projects and to link each OSF project to GitHub.
OSF Video Tutorials
GitHub Structure Overview
The Bayne Lab GitHub organization will be the unifying organization for all GitHub code sharing.
- Each student or postdoc should have their own personal GitHub account.
- Repos should be owned by the account of the student or postdoc who is first author on that project.
- All students will be added as members of the Bayne Lab organization.
- All repos including Bayne Lab work should be mirrored to the ‘baynelab-research’ organization.
- Data will have to be stored elsewhere, with either open access or proprietary links written into the code.
Each repo should contain:
- A unifying project structure to facilitate easy reproducibility
- All scripts used to run the analyses for that project
- Links to project or task-specific data. This includes all synthesized or processed data, including:
  - Shared data that is housed elsewhere and can be accessed via code in your script.
  - Raw data that requires processing, including acoustic and most geospatial data.
  - Large datasets housed elsewhere and accessed via code in your script. See GitHub guidelines on file size here.
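To decide which files need to live outside the repo, it can help to scan the project folder before committing. The following is a minimal Python sketch, not a lab-mandated tool. The function name `find_large_files` is hypothetical, and the thresholds assume GitHub's documented behaviour of warning about files larger than 50 MiB and blocking files larger than 100 MiB; confirm the current limits against the GitHub file-size guidelines.

```python
from pathlib import Path

# Assumed GitHub limits: a warning above 50 MiB, a rejected push above 100 MiB.
WARN_BYTES = 50 * 1024 * 1024
BLOCK_BYTES = 100 * 1024 * 1024


def find_large_files(repo_dir, warn_bytes=WARN_BYTES, block_bytes=BLOCK_BYTES):
    """Return sorted (relative_path, size_in_bytes, level) tuples for large files.

    level is "blocked" for files above block_bytes and "warning" for files
    above warn_bytes; smaller files are not reported.
    """
    repo = Path(repo_dir)
    flagged = []
    for path in repo.rglob("*"):
        rel = path.relative_to(repo)
        # Skip Git's internal storage and anything that is not a regular file.
        if ".git" in rel.parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size > block_bytes:
            flagged.append((rel.as_posix(), size, "blocked"))
        elif size > warn_bytes:
            flagged.append((rel.as_posix(), size, "warning"))
    return sorted(flagged)
```

Calling `find_large_files(".")` from the project root lists candidates for external storage; anything it reports is better referenced by a link in the script than committed to the repo.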
The two objectives (example code sharing, project code archiving) have differing repo structure and code storage rules, detailed below.
Basic Repository Requirements
Read Me
Each repo must have a README.md that outlines:
- The purpose of the repository
- A description of the file structure
- Instructions for reproducing the analysis
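One way to start a README that covers these three items is to generate a skeleton and then fill it in. The Python sketch below is only an illustration: the section titles, prompts, and function names (`readme_skeleton`, `write_readme`) are assumptions, not a lab standard.

```python
from pathlib import Path

# Hypothetical section titles mirroring the three required items above.
SECTIONS = {
    "Purpose": "What question or task this repository addresses.",
    "File structure": "What each top-level folder and script contains.",
    "Reproducing the analysis": "Ordered steps, software versions, and where the data lives.",
}


def readme_skeleton(project_title):
    """Build Markdown text for a starter README with one heading per section."""
    lines = [f"# {project_title}", ""]
    for heading, prompt in SECTIONS.items():
        lines += [f"## {heading}", "", prompt, ""]
    return "\n".join(lines)


def write_readme(repo_dir, project_title):
    """Write README.md into repo_dir; return False if one already exists."""
    target = Path(repo_dir) / "README.md"
    if target.exists():
        return False  # never overwrite an existing README
    target.write_text(readme_skeleton(project_title), encoding="utf-8")
    return True
```

For example, `write_readme(".", "My thesis chapter 2")` creates the file once and leaves any later edits untouched on repeat calls.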