Brewery Evolution

These are the scripts and data that were used to generate the figures presented in:

Genomic stability and adaptation of beer brewing yeasts during serial repitching in the brewery
Christopher R. L. Large, Noah A Hanson, Andreas Tsouris, Omar Abou Saada, Jirasin Koonthongkaew, Yoichi Toyokawa, Tom Schmidlin, Daniela A Moreno-Habel, Hal McConnellogue, Richard Preiss, Hiroshi Takagi, Joseph Schacherer, Maitreya J Dunham

doi: https://doi.org/10.1101/2020.06.26.166157

Data

(A) Plate Reader

Raw plate reader data from the experiments documented in the manuscript are available. They include data from the wort conditions as well as experiments not mentioned in the manuscript that used starting ethanol concentrations ranging from 0 to 10% (v/v).

The files are:

2018_05_09_PDB_EtOh_Formatted.csv
2018_06_05_TestForEthanolInWort.csv
2018_06_08_GrowthCurve_YEPD_WORT.csv
2018_06_13_TestForEthanolInWort.csv

(B) Settling Assay

The raw images associated with the settling assay experiments are available. The two replicates are split into two directories:

20180621
20180524

(C) Sensory Analysis

The reformatted responses from the sensory panel done at HomebrewCon 2018 are available at:

Beer_ScoreSheet_Reformatted.txt

Scripts

Please note that in many instances, the scripts outlined here are written for my computing cluster and will require some retooling if they are to be adapted for other uses. Please bear with my occasional hard coding of directories. Hopefully, they can serve as inspiration for further studies.

(A) Alignment and Variant Calling

runSeqAlignVariantCall_20200504.sh

The reads mentioned in the paper were aligned to the SacCer3 genome from SGD. They were then preprocessed with GATK and Picard tools.
Variant calls for the ancestor were generated using Samtools and Freebayes.
With the ancestor variant calls in hand, the evolved samples were called with LoFreq in paired mode and the variant calls from Samtools and Freebayes were compared to the ancestor.
The variants were filtered with GATK and annotated using: yeast_annotation_chris_edits_20170925.py
The variants found in the clones were compared using MakeOverlapMatrix.R to generate an overlap matrix.
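
The overlap matrix described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a port of MakeOverlapMatrix.R: the function name, the `(chromosome, position, alt_allele)` variant key, and the example calls are all hypothetical.

```python
from itertools import product


def overlap_matrix(variants_by_clone):
    """Count the variants shared by every pair of clones.

    variants_by_clone maps a clone name to a set of variant keys, here
    (chromosome, position, alt_allele) tuples (an assumed representation).
    The diagonal entries hold each clone's own variant count.
    """
    names = sorted(variants_by_clone)
    return {
        (a, b): len(variants_by_clone[a] & variants_by_clone[b])
        for a, b in product(names, repeat=2)
    }


# Hypothetical variant calls for three clones (not data from the study).
example_calls = {
    "clone1": {("chrI", 100, "A"), ("chrII", 250, "T")},
    "clone2": {("chrI", 100, "A"), ("chrIV", 900, "G")},
    "clone3": {("chrI", 100, "A"), ("chrII", 250, "T"), ("chrIV", 900, "G")},
}
matrix = overlap_matrix(example_calls)
```

Keying each variant on position and alternate allele means two clones only "share" a variant when they carry the same change at the same site; a looser key (position only) would be a different design choice.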

(B) Copy Number Analysis

runCNScripts_Populations_20200310.sh

The genome-wide depth of coverage of the alignments from part (A) was measured using GATK/2. Please note that there is a new tool in GATK/4 that serves the same function.
1000-bp sliding windows of coverage were generated with the IGVtools command line implementation. This was run twice to generate one file filtered at MAPQ > 30 and one filtered at MAPQ > 0.
These files were then compared using: wigNormalizedToAverageReadDepth_MapQ_ForPlot.py
Plotting of individual samples was done with: PlotCopyNumber_OneSample_20200106.R
Further plotting, comparing multiple samples for the copy number figure, was done with: Copy_Number_Plot_Text_20200619.R and Copy_Number_Population_AllChrom_20200609.R
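
As a rough illustration of the normalization step, the sketch below scales windowed depth to the genome-wide average and compares the two MAPQ-filtered tracks. It is not the code in wigNormalizedToAverageReadDepth_MapQ_ForPlot.py: the function names, the default `ploidy=2`, and the interpretation of the MAPQ comparison as a "fraction of confidently mapped reads" are assumptions made for the example.

```python
def normalize_to_average_depth(window_depths, ploidy=2):
    """Scale per-window depth so that the genome-wide mean equals `ploidy`.

    `ploidy=2` is an assumed default; brewing strains are often polyploid,
    so the appropriate value depends on the strain.
    """
    if not window_depths:
        raise ValueError("no coverage windows given")
    mean_depth = sum(window_depths) / len(window_depths)
    if mean_depth == 0:
        raise ValueError("mean depth is zero")
    return [ploidy * depth / mean_depth for depth in window_depths]


def confident_mapping_fraction(depths_mapq30, depths_mapq0):
    """Per-window ratio of MAPQ > 30 depth to MAPQ > 0 depth.

    Windows with no MAPQ > 0 coverage yield None. Low ratios point at
    windows dominated by ambiguously mapped reads (an assumed use of the
    two tracks).
    """
    return [
        high / low if low > 0 else None
        for high, low in zip(depths_mapq30, depths_mapq0)
    ]


# Hypothetical 1000-bp window depths (not data from the study).
depths_mapq0 = [40, 40, 80, 40]
depths_mapq30 = [40, 10, 80, 40]
copy_number = normalize_to_average_depth(depths_mapq0)
fraction = confident_mapping_fraction(depths_mapq30, depths_mapq0)
```

Normalizing to the mean is sensitive to large amplifications; a median-based scale would be a reasonable alternative if most of the genome is at the baseline copy number.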

(C) Allele Frequency

runVariantCall_GATK_Populations_20200310.sh

From the alignments generated in part (A), variants were called using GATK HaplotypeCaller.
The variants were filtered with GATK to include only the highest-confidence SNPs. The SNPs were then processed into a table format.
Using a Java program (VaraiantTableParse_GATK_BAF_20190206.jar), the table of SNPs was converted into a ratio of reference or alternate alleles.
The output files are then plotted using a variety of scripts: AlleleFrequency_Population_AllChrom_20200429.R, Allele_Frequency_Clones_Plot_2020428.R, and Allele_Frequency_Plot_20200619.R
The selection coefficient was measured using the outputs of the above scripts and the following script: Estimating_s_20200629.R
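
For readers who want a starting point, here is a minimal sketch of the two calculations this section describes: an alternate allele fraction from read depths, and a selection coefficient from allele frequencies over time. The README does not state which model Estimating_s_20200629.R uses, so the logistic model below (the slope of ln(p / (1 - p)) against generations) is a common textbook choice swapped in for illustration, and all names and numbers are hypothetical.

```python
import math


def alt_allele_frequency(ref_depth, alt_depth):
    """Fraction of reads at a site that support the alternate allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("site has no coverage")
    return alt_depth / total


def estimate_selection_coefficient(generations, frequencies):
    """Least-squares slope of ln(p / (1 - p)) against generations.

    Assumes a simple logistic sweep, under which the log odds of the
    allele frequency change linearly with slope s per generation.
    Frequencies must lie strictly between 0 and 1.
    """
    logits = [math.log(p / (1 - p)) for p in frequencies]
    n = len(generations)
    mean_t = sum(generations) / n
    mean_y = sum(logits) / n
    sxx = sum((t - mean_t) ** 2 for t in generations)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(generations, logits))
    return sxy / sxx


# Hypothetical time course generated from a known s, to exercise the fit.
true_s = 0.05
start_logit = math.log(0.01 / 0.99)
example_generations = [0, 20, 40, 60, 80]
example_frequencies = [
    1 / (1 + math.exp(-(start_logit + true_s * t))) for t in example_generations
]
estimated_s = estimate_selection_coefficient(example_generations, example_frequencies)
```

Real allele frequency data are noisy near 0 and 1, where the log odds blow up, so in practice time points close to fixation or loss are usually excluded or down-weighted.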

(D) Phylogenomics

Using FastqToBam_20190802.sh and a wrapper script that pulled arguments from a sample sheet to run the many samples, the reads were aligned to SacCer3 using a methodology outlined by GATK best practices.
Next, in MarkDupliates_And_VariantCall_20200428.sh, the alignments from different sequencing runs were merged during the Mark Duplicates stage (Picard tools), and variants were called using GATK/4 HaplotypeCaller in GVCF mode.
The GVCFs from all of the samples were then combined using GenotypeGVCFs_GenomicsDB_20200106.sh which used GenomicsDBImport and GenotypeGVCFs from GATK/4 to jointly call variants. The variants were then filtered using GATK/4.
Then, using MakeFastaForTree_20191221.sh, FASTA files for each sample were created with BCFtools. Each file holds two concatenated genomes: at heterozygous positions, the first genome carries the reference allele and the second carries the alternate allele.
A phylogenetic tree was then made with MakeTree_Small_20191218.sh.
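
The two-genome encoding can be illustrated with a toy example. The sketch below is not what MakeFastaForTree_20191221.sh runs (that script uses BCFtools); it is a small Python stand-in that handles SNPs only, and the function name, the `"het"`/`"hom"` genotype labels, and the example sequence are all hypothetical.

```python
def diploid_concatenated_sequence(reference, snps):
    """Build the concatenated two-genome sequence for one sample.

    `snps` maps a 0-based position to (alt_base, genotype), where genotype
    is "het" or "hom" (assumed labels). At heterozygous sites the first
    copy keeps the reference base and the second copy takes the alternate;
    at homozygous alternate sites both copies take the alternate.
    """
    first = list(reference)
    second = list(reference)
    for position, (alt_base, genotype) in snps.items():
        if genotype not in ("het", "hom"):
            raise ValueError(f"unknown genotype: {genotype}")
        second[position] = alt_base
        if genotype == "hom":
            first[position] = alt_base
    return "".join(first) + "".join(second)


# Toy reference and calls (not data from the study).
toy_reference = "ACGTAC"
toy_snps = {1: ("T", "het"), 4: ("G", "hom")}
toy_sequence = diploid_concatenated_sequence(toy_reference, toy_snps)
```

Encoding heterozygous sites this way lets a standard tree-building tool see heterozygosity as a difference between the two halves of the alignment, without needing ambiguity codes.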

(E) Growth Rate

The growth rate of the beer strains was calculated using the growthrates R library. The script implementing that is here: GrowthCurve_GrowthRates_20200619.R
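
For orientation, the sketch below shows one simple way to get a maximum growth rate from a plate reader curve: fit a straight line to ln(OD) in a sliding window and keep the steepest slope. This is in the spirit of the log-linear fitting that the growthrates library offers, but the README does not say which fitting method the script uses, so treat the approach, the function names, the five-point window, and the synthetic data as assumptions.

```python
import math


def _least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def max_growth_rate(times, ods, window=5):
    """Steepest slope of ln(OD) versus time over any `window` consecutive points."""
    if len(times) != len(ods):
        raise ValueError("times and ods must have the same length")
    if len(times) < window:
        raise ValueError("not enough points for the chosen window")
    log_od = [math.log(od) for od in ods]
    return max(
        _least_squares_slope(times[i:i + window], log_od[i:i + window])
        for i in range(len(times) - window + 1)
    )


# Synthetic curve: lag until t = 3 h, exponential growth at 0.4 per hour
# until t = 9 h, then a plateau (not data from the study).
example_times = list(range(12))
example_ods = [0.05 * math.exp(0.4 * (min(max(t, 3), 9) - 3)) for t in example_times]
example_rate = max_growth_rate(example_times, example_ods)
```

The window length trades noise tolerance against the risk of averaging over the lag or plateau phases; blank-subtracted OD values must be positive for the logarithm to be defined.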

(F) Sensory Analysis

Responses from HomebrewCon 2018 were aggregated using Excel, then reformatted into the data sheet listed above using ParseBeerScoreSheet_20190129.py. Those data were then processed using the script: FlavorProfillingAnalysis_20190223.R

(G) Settling Assay

Pictures documented in the above data section were first analyzed in Fiji using the following semi-automated script: Settling_Assay.ijm
The output files were further processed by: AnalyzeSettlingData_20180612.py

(H) Assembly Polishing

Assemblies were generated from ONT reads that were demultiplexed with Guppy and assembled with SMARTdenovo. Using runAssemblyPolish.sh, these assemblies served as the reference for ONT read mapping with Minimap2.
The assemblies were then polished with Racon and then Medaka.
Afterwards, the Illumina reads were aligned to the polished assembly and Pilon was run three times.
