McMinds-Lab/analysis_templates


# Analysis templates

Standard lab protocols, recommendations, and a collection of scripts for basic data processing, analysis, and visualization, to be modified as necessary for different projects.

## Organization

### File structure (Mise en place)

This is something I'm actively optimizing and learning about, so my strategy may not be the gold standard, but it is extremely important to at least be thinking about!

Primary principles:

- Keep a **raw data** directory, where files are considered completely immutable for a given project or analysis. I waver between a system-wide raw data directory containing multiple project subdirectories and a raw data directory within each project directory. Either way, the point is that there should be a place where 'original' files belong, with READMEs explaining where the data came from. Ideally, this directory could be re-generated by unzipping a version archived somewhere such as Zenodo.
  - **Intermediate data**, such as processed versions of your data (e.g. filtered sequencing reads or 'cleaned' metadata files), should be generated from the raw files using scripts, which are contained in a separate:
- **Scripts** directory. This is ideally a project-specific Git repository that tracks changes and supports collaboration through GitHub, where an archived version can be created upon manuscript submission. The scripts should ideally be cross-platform compatible, use relative filepaths, and take command-line arguments, so that someone can simply download the raw data, clone the repository, specify local settings and the location of an output directory, and completely re-generate the results of your study by running the scripts as specified in a README.
  - Try to organize things into groups of seven or fewer (e.g. if you have 20 individual scripts, merge some of them, tie them together with seven wrappers, or organize them into seven subfolders).
  - Order scripts by starting their filenames with a two-digit number (e.g. `01_sequencing_qc.sh`).
  - Have each script place all of its outputs (including log files) in a folder named after the script, within the overall output directory specified by the user.
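The script conventions above can be sketched as a minimal shell skeleton. This is just an illustration, not a lab-mandated template: the script name, the `raw_data/` path, and the default output directory are hypothetical, and a real script would take the output directory purely from its command-line argument.

```shell
#!/bin/bash
set -euo pipefail

# The output directory would normally be required as a command-line
# argument; default to ./output here so the sketch runs standalone.
outdir="${1:-output}"

# Name the per-script output folder after the script itself. Hard-coded
# in this sketch; a real script can use "$(basename "$0" .sh)".
script_name="01_sequencing_qc"
mkdir -p "${outdir}/${script_name}"

# All outputs, including the log file, go into the script's own folder
# inside the user-specified output directory.
echo "ran ${script_name} on raw_data/" > "${outdir}/${script_name}/run.log"

echo "outputs in ${outdir}/${script_name}/"
```

Because every script follows the same pattern, a reader can match any result file back to the numbered script that produced it.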

## Resources

Here's a list of resources my students and I have found useful for learning the science and code:

### GitHub itself

- Not just for programmers: How GitHub can accelerate collaborative and reproducible research in ecology and evolution
- An entire paper written and edited directly on GitHub

### Stats

#### Experimental design in ecology

- General principles and great examples
- Why are you doing stats
- All basic stats use linear models

#### Linear models / GLMMs

- Permutation testing, bootstrapping, resampling
- Good analysis of how stats are related to causal inference

#### Mixed effects models and DAGs

- Inferring Multiple Causality: The Limitations of Path Analysis

### Coding

#### Shell and general HPC use

- My own explainer for setting up a computer
- HPC Best Practices

#### In R

- https://nyu-cdsc.github.io/learningr/assets/simulation.pdf
- https://intro2r.com/
- https://www.codecademy.com/learn/learn-r

#### In Stan

- https://betanalpha.github.io/

#### In Markdown

- https://www.markdownguide.org/basic-syntax/
- Add fancy equations to markdown
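As a small illustration of the "fancy equations" pointer above: GitHub-flavored markdown can render LaTeX math delimited by dollar signs. The particular formulas here are just placeholders.

```markdown
Inline math like $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ renders within a
sentence, while display math gets its own block:

$$
y_i \sim \mathcal{N}(\mu + \beta x_i, \sigma^2)
$$
```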
