---
title: "Git & GitHub Tutorial for Scientists: \nIt's Not Only for Programmers"
author: "Micaela Chan, Ekarin Pongpipat, and Luke Moraglia"
site: bookdown::bookdown_site
documentclass: book
url: 'https\://gitbookdown.dallasdatascience.com/'
github-repo: mychan24/git_github_bookdown
---
# Preface {-}
## What is this Git/GitHub Tutorial about? {-}
The goal of this tutorial is to help scientists with no formal programming background **(1) start using [git](https://git-scm.com/) locally for [version control](https://en.wikipedia.org/wiki/Version_control) of their code**, and **(2) begin using [GitHub](https://github.com) to share that code and collaborate with others.**
Git for version control | GitHub for sharing your code
:-------------------------:|:-------------------------:
![](img/Git-Logo-2Color.png) | ![](img/GitHub_Logo.png)
Note: If we wait until our code is perfect before sharing it, we will most likely never share it.
## Why is git important for scientists? {-}
Git facilitates (1) **documentation**, and (2) **sharing/collaborating**. Both of these are important in science.
### I. Version-control for code = the lab notebook of experiments {-}
In scientific experiments, we are trained to record every parameter that we modify and test, as this ensures consistency within an experiment and facilitates reproducible results. While we often record the experimental part of our work meticulously, the preprocessing and analysis of data are often merely ‘assumed’ to be recorded in our scripts.
Here are a few scenarios in which we would wish for a detailed history of how the code was developed:
#### Scenario A: Things were working, now they are not! {-}
You have 5 people collaborating on a single script. That script now throws an error after someone changed a few lines over the weekend. They have no recollection of what was changed.
¯\\\_(._.)\_/¯
#### Scenario B: Why were changes made? {-}
You joined a new lab and were given a years-old pipeline/script to modify. The script is named `cool_script_version25000.sh` and has few comments. Whoever wrote it has quit academia, is traveling the world soul-searching, and won’t answer emails.
¯\\\_(._.)\_/¯
#### Scenario C: We made changes a long time ago… {-}
Collaborator BigName wants to know what preprocessing parameters were used in your manuscript from 5 years ago. You have since changed multiple parameters in your default processing pipeline.
¯\\\_(._.)\_/¯
![](https://phdcomics.com/comics/archive/phd101212s.gif "Final.doc")
Saving something as Final.doc is bad...but Final_script.R is just as bad (Image Credit: [PhD Comic](https://phdcomics.com/comics/archive.php?comicid=1531))
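As a rough illustration of how a recorded history helps in the three scenarios above, here is a minimal sketch in a throwaway repository. The file name `preprocess.sh`, the `smoothing_kernel` parameter, the author identity, and the commit messages are all made up for this example; only the git commands themselves (`git log`, `git blame`, `git show`) are standard.

```bash
# Hypothetical demo: file names, author, and messages below are invented.
set -e
repo="$(mktemp -d)"   # throwaway directory so no real project is touched
cd "$repo"
git init --quiet

# Identity used only inside this demo repository
git config user.name "Demo Scientist"
git config user.email "demo@example.com"

# First version of a made-up preprocessing script
echo "smoothing_kernel=6" > preprocess.sh
git add preprocess.sh
git commit --quiet -m "Add preprocessing script with 6 mm smoothing kernel"

# Later, a parameter is changed and the reason is recorded in the message
echo "smoothing_kernel=8" > preprocess.sh
git commit --quiet -am "Increase smoothing kernel to 8 mm after pilot analysis"

# Scenarios A and B: what changed, and why?
git log --oneline -- preprocess.sh

# Scenario A: who last touched each line, and in which commit?
git blame preprocess.sh

# Scenario C: what did the script look like at the very first commit?
first_commit="$(git rev-list --max-parents=0 HEAD)"
git show "$first_commit:preprocess.sh"
```

Commit hashes and dates differ on every run, so no output is shown here. The point is only that each question in the scenarios maps onto a single command once changes have been committed with meaningful messages.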
### II. Git and Remote Hosts (e.g., GitHub) make sharing/collaborating easier {-}
#### Share code {-}
Manuscripts, the traditional medium for sharing scientific results, are poorly suited to sharing scripts. For example, while you can describe the general process of how things were done, the actual 1000+ line scripts are usually not printed.
Remote hosts such as GitHub (or its alternatives) make sharing your code very easy: it’s just a link. People can choose to follow your code, and they get a notification if you update it.
#### Facilitates new collaboration {-}
If you find a way to speed up a toolbox, Git/GitHub help you suggest the change to the toolbox's author, even if you don’t know each other.
#### Encourages open source and open science {-}
[Open source](https://en.wikipedia.org/wiki/Open-source_software) is a term typically used in software development, where it means the source code is open to the public. People can freely look at how a program was written, make improvements to it, etc.
In science, the idea of open source is closely linked with [open science](https://en.wikipedia.org/wiki/Open_science), a movement to make data, samples, software, and all things related to a scientific finding as transparent and easy to access as possible. There are pros and cons to both open and closed systems, a topic that cannot be adequately covered here (for more info, check out the [Center for Open Science](https://cos.io/)).
However, given the common goal of wanting more people to understand, reproduce, and build on previous work, an **'open'** approach to your code is certainly a good place to start. Git/GitHub will help you share your code publicly. They also enable you to easily contribute to other projects, and to incorporate ideas and contributions from others in a systematic way (even strangers!).
## Acknowledgment {-}
This tutorial would not have been possible without [Software Carpentry's git novice course](https://swcarpentry.github.io/git-novice/). Check them out!
## Other helpful links {-}
* [GitHub Education](https://education.github.com/) provides additional features to students & researchers for free (e.g., Pro account with unlimited collaborators)!
* [Git the Simple Guide](https://rogerdudler.github.io/git-guide/) covers the essential commands for using git.
## How to read the book? {-}
All code/commands and their output appear in gray boxes. The difference is that the output always immediately follows the code/command and starts with `##`.
For example, suppose we want to run the command `ls` to see what files are in the current directory.
Below is the code:
```{bash}
ls
```
Above is the output