Simulators of spatial genetic diversity: for non-programmers and researchers!

Quetzal-framework/quetzal-EGGS

quetzal-EGGS

Becheler License: GPL v3 Website becheler.github.io

Quetzal-EGGS is a suite of geospatial programs for simulating the neutral molecular diversity of populations. In their present version, the programs:

  1. read generic parameters from a configuration file and localized parameters from a georeferenced map
  2. simulate a demographic history (forward in time)
  3. simulate coalescence trees for independent genetic markers (backward in time)
  4. record outputs in a SQLite database so you can access them later for inference using Quetzal-CRUMBS

It differs from other available simulation resources (such as SPLATCHE, simcoal2, egglib, msprime or necsim) by offering original demographic (forward-in-time) models.

Because it is built on Quetzal-CoalTL, this project's true strength lies in the ease with which new demographic assumptions (continental dispersal kernels, oceanic dispersal barriers, niche, density dependence...) can be integrated into a new simulator.

  • 🔮 Looking forward, we expect to grow the list of existing programs with customized demographic models.
  • 📧 Interested? Thinking of a new model? Want to give some feedback or contribute? Don't be shy, contact me!
  • ⭐ You think this is a cool project? Drop a star on GitHub ☝️
  • 🐛 A bug? Oopsie daisy! I'll fix it asap if you email me or open an issue ☝️

Program | Demographic process
EGG1    | Stochastic oceanic dispersal (rafting events)
EGG2    | Pulses in continental matrix connectivity

Description

A demographic history is simulated forward in time in a heterogeneous landscape (the landscape patterns are given to the program options as a geographic raster layer).

The exact demographic model differs for each quetzal-EGG, and so does its parametrization.

At sampling time, the demographic process is stopped and the coordinates of the tip nodes (that is, the gene copies that have been sampled across the landscape) are read by the program from a CSV file.

A backward-in-time process then begins, tracing the ancestors of the sampled genes back through the demographic history. When the Most Recent Common Ancestor of the sample is found, the simulation stops and records the simulation parameters and simulated gene trees in a SQLite database.
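
To make the forward-then-backward logic concrete, here is a minimal toy sketch in Python. It is not the EGG1 or EGG2 model and does not use the Quetzal-CoalTL API: the one-dimensional landscape, the growth and migration rules, and every name and parameter (`simulate_forward`, `simulate_backward`, `k_max`, `growth`, `m`) are assumptions made up for illustration only.

```python
import random
from collections import defaultdict


def simulate_forward(suitability, generations, rng, k_max=20, growth=2, m=0.2, origin=0):
    """Toy forward demography on a 1D landscape (hypothetical model).

    Returns (sizes, flows): sizes[t][x] is the size of deme x at generation t,
    flows[t][dst][src] counts individuals in dst at t+1 that were born in src at t.
    """
    n = len(suitability)
    sizes = [[0] * n]
    sizes[0][origin] = k_max  # founding population
    flows = []
    for _ in range(generations):
        flow = [defaultdict(int) for _ in range(n)]
        for src, size in enumerate(sizes[-1]):
            capacity = int(k_max * suitability[src])
            # Growth by an integer factor, capped by the local carrying capacity.
            for _ in range(min(growth * size, capacity)):
                dst = src
                if rng.random() < m:  # move to a neighbour, reflecting at the edges
                    dst = min(n - 1, max(0, src + rng.choice((-1, 1))))
                flow[dst][src] += 1
        flows.append(flow)
        sizes.append([sum(f.values()) for f in flow])
    return sizes, flows


def simulate_backward(sizes, flows, tips, rng):
    """Trace sampled gene copies backward until they share a single ancestor.

    tips maps a tip label to the deme where it was sampled at the last generation.
    Returns the remaining lineages as Newick-like strings (one string if the
    MRCA was reached before the start of the recorded history).
    """
    for label, deme in tips.items():
        if sizes[-1][deme] == 0:
            raise ValueError(f"{label} was sampled in an empty deme")
    lineages = list(tips.items())
    for t in range(len(flows) - 1, -1, -1):
        if len(lineages) == 1:
            break  # MRCA found
        parents = defaultdict(list)
        for label, deme in lineages:
            sources = flows[t][deme]
            # Pick the deme of origin in proportion to the forward flows...
            src = rng.choices(list(sources), weights=list(sources.values()))[0]
            # ...then a parent uniformly within it; shared parents coalesce.
            parents[(src, rng.randrange(sizes[t][src]))].append(label)
        lineages = [
            (kids[0] if len(kids) == 1 else "(" + ",".join(kids) + ")", src)
            for (src, _), kids in parents.items()
        ]
    return [label for label, _ in lineages]


rng = random.Random(42)
sizes, flows = simulate_forward([1.0, 1.0, 0.5, 1.0, 1.0], generations=200, rng=rng)
occupied = [x for x, size in enumerate(sizes[-1]) if size > 0]
tips = {f"tip{i}": rng.choice(occupied) for i in range(4)}
print(simulate_backward(sizes, flows, tips, rng))
```

The point of the sketch is the division of labour: the forward pass records population sizes and migrant flows, and the backward pass only ever consults that recorded history. If the history is too short for the sample to coalesce, the toy returns several unconnected lineages rather than a single tree.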

Installation

Using Docker

If you want to try quetzal-EGGS, or just run some tests locally before going wild on a cluster, Docker is for you.

Docker is software that packages an application and its dependencies in a virtual container that can run on any Linux, Windows, or macOS computer. It is more efficient than a virtual machine, but less efficient than a system-wide installation.

  • Install Docker
  • Then, download our Docker image by typing docker pull arnaudbecheler/quetzal-eggs in a terminal
  • Enter a docker interactive session with docker run --name mycontainer -it arnaudbecheler/quetzal-eggs bash
  • Build and install the project to the /home/EGGS directory with:
git clone --recurse-submodules https://github.com/Becheler/quetzal-EGGS
cd quetzal-EGGS
mkdir Release
cd Release
cmake .. -DCMAKE_INSTALL_PREFIX="/home/EGGS"
cmake --build . --config Release --target install

You can now leave the build directory to see what is available:

cd /home/EGGS/
ls

You should see a binary and some default configuration files:

EGG1  quetzal_EGG1.config  sample.csv  suitability.tif

  • EGG1 is the program binary implementing model 1
  • quetzal_EGG1.config is the configuration file associated with model 1
  • sample.csv lists the coordinates of sampled nodes (tips)
  • suitability.tif is the default landscape

You can test the binary with:

./EGG1 --version

and access its help menu with:

./EGG1 --help

You can run a simulation with the default settings:

./EGG1 --config quetzal_EGG1.config --suitability suitability.tif --tips sample.csv

In another terminal, go to the destination folder of your choice and copy the simulation result with the following command:

docker cp mycontainer:/home/EGGS/test_pods.db .

You can finally upload the .db here to visualize the results. You can exit mycontainer by typing exit.
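
If you would rather take a first look at the database locally, the sketch below lists the tables in a SQLite file together with their row counts, using only Python's standard `sqlite3` module. The schema written by the EGGS programs is not described here, so the code assumes nothing about table or column names; `summarize_database` is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path


def summarize_database(path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open read-only so that a mistyped path does not create an empty database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            name: connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        connection.close()
```

Calling `summarize_database("test_pods.db")` from the folder where you copied the file should tell you which tables exist and how many rows each one holds.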

From source with the Great Lakes Slurm Cluster

If you are a user of the Great Lakes Slurm cluster at the University of Michigan, you can load all the required dependencies by typing:

module load cmake/3.17.3 gcc/8.2.0 gdal/3.0.1 boost/1.75.0

Then build the project with:

git clone --recurse-submodules https://github.com/Becheler/quetzal-EGGS
cd quetzal-EGGS
mkdir Release
cd Release
cmake .. -DCMAKE_CXX_COMPILER=$(which g++) -DCMAKE_C_COMPILER=$(which gcc)
cmake --build . --config Release

Dependencies

OS

The present project was tested with the following OS:

  • Distributor ID: Ubuntu
  • Description: Ubuntu 20.04.2 LTS
  • Release: 20.04
  • Codename: focal

Git

Git is included by default in Ubuntu's software repositories and can be installed via APT. Enter sudo apt install git in a terminal to download and install Git.

Check it has been properly installed with git --version.

CMake

CMake is the open-source, cross-platform toolbox we use to build, test, and package the software. Check if it is installed by typing cmake --version in a terminal. If it is not installed, please visit the installation pages.

C++ compiler

gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

Boost

Install Boost with: sudo apt-get install libboost-all-dev

GDAL

The Geospatial Data Abstraction Library (GDAL) is essential to represent a spatially explicit landscape! To install GDAL please visit: http://www.gdal.org/
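
As a quick sanity check before building, the following sketch reports which of the command-line tools named in this section cannot be found on your PATH. It only covers executables (git, cmake, gcc, g++); it cannot tell whether the Boost and GDAL development libraries are installed. `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("git", "cmake", "gcc", "g++")):
    """Return the tools from the given list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All command-line build tools were found.")
```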

Build

When all dependencies are met, you can simply build the project by running the following commands in a terminal:

git clone --recurse-submodules https://github.com/Becheler/quetzal-EGGS
cd quetzal-EGGS
mkdir Release
cd Release
cmake ..
cmake --build . --config Release

Tests output

If tests are failing and you want to see the reason for the failure, use instead:

CTEST_OUTPUT_ON_FAILURE=TRUE cmake --build .

Memory

To profile memory usage with Valgrind's Massif tool, build with debug information and run:

git clone --recurse-submodules https://github.com/Becheler/quetzal-EGGS
cd quetzal-EGGS
mkdir RelDeb
cd RelDeb
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
make
valgrind --tool=massif --xtree-memory=full ./src/EGG1 --config ../test/data/quetzal_EGG1.config --tips ../test/data/sample.csv --suitability ../test/data/suitability.tif --duration 50