
Contribution Tutorial

Tobey Carman edited this page Sep 26, 2014 · 1 revision

Tutorial on contributing

The dvmdostem software is not distributed as a binary executable. This means that to run the program, you must download the source code and compile the software. In addition, many of the settings you may want to modify for running dvmdostem require re-compiling the code. This makes the distinction between "user" and "developer" a bit blurry. For this reason it is important that everyone learn how to interact with DVM-DOS-TEM as a developer, even if you do not intend to make a [public] modification to the codebase for quite a while.

The process of developing the DVM-DOS-TEM software involves managing the contributions from all the members of the Spatial Ecology Lab and keeping everyone up to date with the latest additions to the code. To improve overall efficiency and productivity of the Lab, it is important that everyone contribute to the same codebase. Over time this will significantly reduce the amount of redundant work that everyone does to perform their research. Contributions to the codebase can be to the scientific aspects of the model, or simply improving the documentation or structure of the codebase so that it is easier to work with. This tutorial will take you through the process of making a small contribution to the documentation. At the end you will:

  • Have made a meaningful addition to the DVM-DOS-TEM codebase and be prepared to contribute more in the future.
  • Know how to keep your copy of the software (source code) up to date with the "upstream" (Spatial Ecology Lab) copy of the software.

While the actual steps of this tutorial might only take an hour or so to complete, depending on your background with Linux and Git, you might expect to spend several days working through the entire exercise as you explore some topics in more detail.

Setting up your computing environment

The usual deployment target for DVM-DOS-TEM is a Linux cluster computer. While the code can be compiled and run on a Mac, and possibly even a Windows machine, you might save yourself headaches in the future by simply learning to use Linux. Linux makes a great scientific computing environment, so while daunting in the beginning, your effort will pay off in the long run. Take some time to do this - you are building yourself a good platform for all your future work.

Fortunately, with modern tools it is easy to set up a "virtual machine" that runs Linux, so you can use Linux "inside" whatever your normal computing environment is. There are several virtualization software packages available. You may have heard of VMware, Parallels, or VirtualBox.

If you choose to use a virtual machine, start by downloading and installing a virtualization package. VirtualBox is free and highly recommended. Next, choose a Linux distribution and install it as a "guest" system within your virtualization software. Fedora is a recommended distribution because it is in the same family of distributions as CentOS and Red Hat Linux, which are the distributions likely to be used in larger production environments. Alternatively, install the Linux distribution of your choice on a dedicated host computer.

Next, become familiar with the basics of operating your new Linux computer via the command line (Terminal).

Take some time to familiarize yourself with:

  • The difference between relative paths and absolute paths.
  • Hidden files, and a user's "dotfiles" that store personal settings (e.g. .bashrc).
  • The concept of "users" in a Linux environment, and the distinction between a normal user and the "root" user.
  • How command line help is typically presented (e.g. a generic prompt such as $ followed by a command).
  • How to use the man pages to find out more about different Linux commands.
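The items in this list can be explored directly in a terminal. The following is a minimal sketch, assuming a Bash shell; the directory and file names (scratch, .hidden-note) are hypothetical and exist only for illustration.

```shell
# Work in a throwaway directory so nothing in your home directory changes.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Absolute paths start at the filesystem root (/); relative paths start
# from the current directory, which `pwd` prints.
mkdir scratch
abs_path="$(pwd)/scratch"   # absolute path to the new directory
rel_path="./scratch"        # relative path to the same directory
ls -d "$abs_path" "$rel_path"

# Files whose names begin with a dot are hidden from a plain `ls`.
touch scratch/.hidden-note
plain_listing="$(ls scratch)"      # empty: the hidden file is not shown
full_listing="$(ls -A scratch)"    # -A includes hidden files
echo "plain: '$plain_listing'  with -A: '$full_listing'"

# The root user always has the numeric user ID 0; a normal user does not.
id -u

# `man ls` opens the manual page for ls (press q to quit).
```

Running man followed by any command name used here (mkdir, touch, id) is a good way to practice reading manual pages.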

Take some time to learn about the "package manager" and how to install and update software on your new Linux computer.

TODO: add some good intro linux ref. material

Version Control Tools

Getting familiar with Git, Github, and SEL's use of these tools.

Managing contributions from a wide group of people to a single codebase is a difficult task. To help with this, we use a version control system (VCS) called "Git". Git is absolutely fundamental to the process of working with DVM-DOS-TEM. You must understand the basics of this tool. We host the DVM-DOS-TEM codebase with a service called Github. You must understand the difference between Git and Github.

It is important to learn the fundamentals of Git for personal development before you can contribute to a group repository. Take a couple hours to read through the first few chapters of the "Git Book". Perhaps make some simple repositories so you can test the operations. At the outset, don't worry about pushing, pulling and remote repositories. Focus on the basics and make sure you understand the concepts of the "working directory", the "staging area", and the "repository".
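As one way to practice, the sketch below moves a single file through the working directory, the staging area, and the repository. It is only an illustration: the repository name, file name, and user details are hypothetical, and everything happens in a temporary directory.

```shell
# Create a throwaway repository (all names here are hypothetical).
repo="$(mktemp -d)/git-practice"
git init --quiet "$repo"
cd "$repo"

# Identify yourself for this repository only; commits need a name and email.
git config user.name "Practice User"
git config user.email "practice@example.com"

# 1. Working directory: the file exists on disk but Git is not tracking it.
echo "first line" > notes.txt
git status --short          # "?? notes.txt" (untracked)

# 2. Staging area: the change is queued for the next commit.
git add notes.txt
git status --short          # "A  notes.txt" (staged)

# 3. Repository: the change is recorded in the project history.
git commit --quiet -m "Add notes file"
git log --oneline           # lists the single commit
```

Try editing notes.txt again and running git status --short after each step to see the file move between the three areas.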

Read the following articles to become acquainted with Git:

TODO: find links to some good basic git info...

TODO: find something about branching models...

TODO: find basic info describing integration manager/director/lieutenants

TODO: find good description about difference between git and github.

TODO: something about markdown, and why we use markdown in this file...

TODO: add something about why it is important to keep personal / platform specific / absolute paths out of the codebase...

TODO: add link to this awesome tutorial: http://pcottle.github.io/learnGitBranching/

Once you have some basic familiarity with Git and Github, you should understand how and why you must use the following procedures to get your copy of DVM-DOS-TEM.

The following procedure will allow you to:

  • Stay up to date with other developments that happen to the codebase.
  • Add your own contribution to the codebase.
  • Keep the codebase on different computers that you use and keep each computer up to date with the other computers and the main SEL codebase.

So with that out of the way, let's get going.

Actually getting a copy of DVM-DOS-TEM

First:

  • You need to create a (free) account with Github.
  • You need to ask the administrator of the SEL Github account to add you to the SEL workgroup so that you can view the repositories.

Now sign into Github. Spend some time familiarizing yourself with the interface. Explore some repositories.

If you have been added to the SEL Github team, you should be able to see a variety of SEL owned repositories in your Github account. Pay particular attention to which repository you are viewing in the Github account (i.e. yours, vs. the SEL "fork" of the code). Pay attention to the URL in the address bar of your browser - it is usually quite clear and a good indication of what repository you are actually looking at.

Next, find the "sel-help" repository on Github and look for the "github-flowchart" PDF files.

Download and read these documents and decide how you plan to interact with the code. The PDFs are designed to be printed on two normal-sized sheets of paper and hung somewhere convenient for quick reference.

The remainder of this tutorial will take you through actually contributing to the source code (well, actually the documentation, which you will discover lives within the source code), so make sure you find and understand the path through the flowchart that supports making a contribution. The steps in this tutorial should follow that path on the flowchart.

For DVM-DOS-TEM we are using the "Integration Manager" workflow. This is also sometimes referred to as the "Fork and Pull" model. We do not have a dedicated person who serves as the Integration Manager. We all have the rights to merge code into the trunk, but the workflow encourages review of changes by the group before they are merged into the trunk.

TODO: find link to description of fork/pull and or integration manager.

If you have not done so already, create your own "fork" of the codebase. This operation happens on Github. Look for the button on the Github website. The result is that your Github account now contains a copy of the DVM-DOS-TEM codebase. You should be able to identify this based on several cues in the Github web interface, not the least of which is the URL in your browser's address bar.

Next, you need to get a copy of the code onto the local computer where you plan to work (likely your new virtual machine, but this could also be a dedicated Linux computer, or perhaps your account on one of the UAF cluster computers, aeshna or atlas).

  • Navigate to a location on your Linux computer where you would like to store the model.
  • On Github, navigate to your fork of the model.
  • On Github, find the clone address for the code. Either the HTTP or the SSH address will work, but in the long run it will be worth your while to set up SSH because it allows secure access without having to constantly type in your password. However, SSH can be configured later. For now, choose one of the methods.
  • Copy the clone address from Github and issue the cloning command on your Linux computer. This will create a directory named "dvm-dos-tem" and will grab all the information from Github (all the source code, all the history, some sample data, and a good chunk of documentation) and download it to your Linux computer. Because we bundle some sample test data with DVM-DOS-TEM, this is actually a fairly large download, so it might take a few seconds.
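The cloning step can be rehearsed without network access. In the sketch below, a local bare repository stands in for your fork on Github, and the variable fork_url and the directory names are hypothetical. For the real thing, only the git clone line is needed, with the clone address you copied from Github in place of "$fork_url".

```shell
# Stand-in for your fork on Github: a local bare repository with one commit.
sandbox="$(mktemp -d)"
fork_url="$sandbox/fork.git"
git init --quiet --bare "$fork_url"
git init --quiet "$sandbox/seed"
cd "$sandbox/seed"
git config user.name "Practice User"
git config user.email "practice@example.com"
echo "sample" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet "$fork_url" HEAD   # publish the current branch to the stand-in

# The actual cloning command: creates ./dvm-dos-tem with the full history.
cd "$sandbox"
git clone --quiet "$fork_url" dvm-dos-tem
cd dvm-dos-tem

# "origin" is the name Git gives to the repository you cloned from.
git remote -v
git log --oneline
```

After a real clone, git remote -v should list the address of your fork next to origin.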

Congratulations! Now you have DVM-DOS-TEM on your computer!

Now it is time to make a change and submit that change back to the "upstream" codebase for inclusion there.

NOTE: The upstream repository refers to the main, Spatial Ecology Lab repository. Your goal with this tutorial is to make modifications in your codebase and then request that your changes are included in the upstream repository.

An additional goal is to keep your repository up to date with the upstream repository, so that you have the most recent modifications that anyone else has made.

Making your modification...

For this tutorial, you will be making a small modification to improve the documentation of DVM-DOS-TEM.

TODO: add notes on setting up your personal computing environment for working with git.

Specifically:

  • adding to your bashrc file to add the current git branch to your prompt
  • helpful git config settings, (like adding color)
  • gitk, and git gui
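A minimal sketch of the first two items, assuming a Bash shell. The function name current_git_branch is hypothetical, and the prompt layout is only one of many possibilities.

```shell
# Turn on colored output for git commands (stored in ~/.gitconfig).
# Shown commented out so that running this sketch changes no settings:
# git config --global color.ui auto

# Print the current branch name, or nothing when outside a repository
# or when no branch is checked out (detached HEAD).
current_git_branch() {
    git symbolic-ref --short HEAD 2>/dev/null
}

# Lines like the following would go in ~/.bashrc. The single quotes matter:
# they delay the function call until each prompt is drawn.
PS1='\u@\h \W $(current_git_branch)\$ '
```

With this in place, the prompt shows the branch name whenever the current directory is inside a Git repository, which makes it harder to commit to the wrong branch by accident.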

First, notice that you are on the master branch. For the SEL maintained dvm-dos-tem, we follow a specific "branching model". The branching model helps keep a clear, understandable history for the project and gives specific semantic value to different points in the history tree.

TODO: add link to branching model section

According to the branching model, the master branch might not have the most recent code. We would like our modification to be based off the devel branch, and eventually to be pulled into the SEL repository's devel branch.

NOTE: might be good to add some details about git, remotes, origin, and upstream...

TODO: Determine how to manage development and master branches on individual forks.

TODO: add remote "upstream" to point toward github/ua-snap/dvm-dos-tem?

TODO: add some history about why sel's code is stored at github/ua-snap account...

To do this, first check out a local devel branch that is based off your fork's devel branch.

$ git checkout --track remotes/origin/devel

Now you have, locally, all the most recent code from your fork's devel branch. Next you need to make a change that you wish to be included in the upstream repo.

Although not strictly necessary, we will further isolate this change in a topic branch. For a small modification, such as the typo we are going to fix, this is a bit of extra complexity, but for a larger changeset, such as a new feature, or addition of a new scientific concept, using a separate topic branch is important.

$ git checkout -b improve-docs

Finally, we are ready to get down to work. Read the documentation (this tutorial, the main README.md, or comments in the code itself) and find an error. This should be easy! Your error can be as small as a typo, or as large as a paragraph that you re-work or decide needs to be added or deleted.

Make the modification using the text editor of your choice....

After you have modified and saved the file, use Git to get a concise summary of the changes you made (using git diff on the command line, or a graphical tool such as Git Gui).
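The sketch below shows what such a summary can look like. It uses a throwaway repository with a made-up README.md so that it can be run anywhere; in your clone of dvm-dos-tem only the last three commands are needed.

```shell
# Throwaway repository with one committed file (hypothetical names).
repo="$(mktemp -d)/diff-practice"
git init --quiet "$repo"
cd "$repo"
git config user.name "Practice User"
git config user.email "practice@example.com"
printf 'This line has a tyop.\n' > README.md
git add README.md
git commit --quiet -m "Add README"

# Fix the typo with your editor; here the file is simply rewritten.
printf 'This line has a typo.\n' > README.md

git status --short      # which files changed: " M README.md"
git diff --stat         # how many lines changed in each file
git diff                # the exact lines removed (-) and added (+)
```

git diff only shows changes that are not yet staged; after git add, use git diff --staged to review what will go into the commit.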

Now you need to commit your change. You can use Git Gui for this or the command line:

$ git add <path/to/the/file/you/just/changed>
$ git commit -m "Update documentation; my first contribution to dvm-dos-tem"

Now the modification you have made is a part of your local history. The next steps are to:

  1. Make sure you are up to date with all the developments that have happened in the upstream repository.
  2. Push your changes up to your fork.
  3. Request that your awesome modification is incorporated into the upstream version.

TODO: add some discussion about why you need to make sure you are up to date with the upstream repo (merging locally, handling merge conflicts locally)

Making sure you are up to date...

So to start, make sure you are up to date with the upstream repository's devel branch:

$ git fetch upstream
$ git merge upstream/devel
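The fetch and merge above assume that a remote named upstream has already been registered in your local repository, which cloning from your fork does not do on its own. The sketch below rehearses the whole sequence offline: a local bare repository stands in for the SEL repository, and the variable upstream_url and all directory names are hypothetical. In a real clone you would run git remote add upstream once, with the address of the ua-snap/dvm-dos-tem repository on Github, followed by the fetch and merge.

```shell
sandbox="$(mktemp -d)"
upstream_url="$sandbox/upstream.git"   # stand-in for the SEL repository address
git init --quiet --bare "$upstream_url"

# Your local repository, on a devel branch with one commit.
git init --quiet "$sandbox/work"
cd "$sandbox/work"
git config user.name "Practice User"
git config user.email "practice@example.com"
echo "line one" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git checkout --quiet -b devel

# Register the upstream remote (once per clone), then verify it.
git remote add upstream "$upstream_url"
git remote -v
git push --quiet upstream devel      # only needed to populate the stand-in

# Simulate a colleague's change landing in the upstream repository.
git clone --quiet --branch devel "$upstream_url" "$sandbox/colleague"
cd "$sandbox/colleague"
git config user.name "Colleague"
git config user.email "colleague@example.com"
echo "line two" >> README.md
git commit --quiet -am "Colleague's change"
git push --quiet origin devel

# Back in your repository: download upstream's history, then merge it.
cd "$sandbox/work"
git fetch --quiet upstream
git merge --quiet upstream/devel
git log --oneline
```

git pull upstream devel performs the same fetch and merge in a single step; running them separately lets you inspect what arrived before merging it.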

TODO: Address why "git fetch, git merge" vs just "git pull"?

Now if there are any merge conflicts, you will need to fix them. This is pretty unlikely, but possible. A merge conflict only occurs if someone else has changed the same lines of the same file as you since you checked out the file. In that case, you have to decide whose modification to accept; this process is known as resolving merge conflicts.
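To see what a conflict looks like before you meet one in real work, the sketch below deliberately creates and then resolves one in a throwaway repository. The file contents and commit messages are made up for illustration; improve-docs is the topic branch name used in this tutorial.

```shell
repo="$(mktemp -d)/conflict-practice"
git init --quiet "$repo"
cd "$repo"
git config user.name "Practice User"
git config user.email "practice@example.com"
echo "The modle runs on Linux." > README.md
git add README.md
git commit --quiet -m "Add README"
base_branch="$(git symbolic-ref --short HEAD)"   # name of the starting branch

# Your fix, on a topic branch.
git checkout --quiet -b improve-docs
echo "The model runs on Linux." > README.md
git commit --quiet -am "Fix typo"

# Someone else's different edit to the same line, on the base branch.
git checkout --quiet "$base_branch"
echo "The modle runs on Linux clusters." > README.md
git commit --quiet -am "Mention clusters"

# Merging now stops with a conflict; git merge exits with a non-zero status.
git checkout --quiet improve-docs
git merge "$base_branch" || echo "Merge stopped: conflict in README.md"
git status --short      # "UU README.md" marks the file as unmerged

# Resolve by editing the file so it contains the text you want (removing
# the <<<<<<<, =======, and >>>>>>> markers), then stage and commit.
echo "The model runs on Linux clusters." > README.md
git add README.md
git commit --quiet -m "Merge $base_branch into improve-docs"
```

Before resolving, open the conflicted file in an editor: Git writes both versions of the disputed lines into it, separated by the marker lines.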

After you have addressed (and fixed locally) any conflicts, it is time to push your changes up to your fork on Github.

Pushing your changes...

$ git push origin improve-docs

NOTE: Read about Git and remote repositories. You should have cloned from your fork, so the remote origin should point toward your fork, not the upstream repository!

Now if you go back to the Github web site, you should be able to view your modification. Try the "Network View" to see this. You should see a new commit dot with your changes labeled on your fork as improve-docs.

Requesting that your modification be included in the upstream

OK, finally, you are ready to have your modification incorporated into the upstream codebase. On the Github interface, find the button for "Pull Request". Make sure that you have the settings correct: you want to pull the new branch you just created, (improve-docs), into the upstream/devel branch.

OK, you are finished! The next step is that someone in the SEL group (the "Integration Manager") must accept the pull request. Because we all have rights to do this, you can actually do this yourself, but it is a good habit to get feedback from the group before merging code into the upstream repository. To merge the pull request, navigate to the ua-snap/dvm-dos-tem repository, navigate to the "Pull Requests" section, and find your request. Then look for the button to merge this request. This will create a new commit on the ua-snap/dvm-dos-tem/devel branch.

Final Thoughts

Congratulations. Now you are almost a software developer and are ready to start working with DVM-DOS-TEM. You should now understand:

  • The basics of setting up a computing environment for use with DVM-DOS-TEM.
  • The difference between Git and Github.
  • Why we are using Git and Github.
  • Why it is important to follow this procedure for getting DVM-DOS-TEM.
  • How to get a copy of DVM-DOS-TEM.
  • The basic workflow for making a modification to DVM-DOS-TEM and submitting it for inclusion "upstream".
  • How to keep your codebase up to date with the modifications that are happening in the "upstream" repository.