pfenninglab/custom_ArchR_genomes_and_annotations

Custom genome and gene annotations for cross-species genomics

This repository provides a systematic framework for mapping gene annotations between species. It focuses on lifting human (GENCODE v47) and mouse (GENCODE vM25) annotations onto various target species, including primates and rodents.

Overview

Cross-species genomics requires high-quality genome assemblies and gene annotations. While genome assemblies are increasingly available through efforts like the Vertebrate Genome Project, gene annotations often lag behind. This repository provides:

  1. Automated downloading of source and target genome assemblies
  2. Systematic mapping of GENCODE annotations using Liftoff
  3. Creation of genome packages for downstream analysis

Source Data

Source Genomes and Annotations

Located in config/source_genomes.tsv:

Genome   Version        Source   Annotation
Human    GRCh38/hg38    UCSC     GENCODE v47 comprehensive
Human    GRCh38/hg38    UCSC     GENCODE v47 basic
Mouse    GRCm38/mm10    UCSC     GENCODE vM25

Target Genomes

Located in config/target_genomes.tsv.

Output Structure

Processed data is organized under output/genomes/ with the following structure:

output/genomes/
├── {genome}/              # e.g., rheMac10/
│   ├── {genome}.fa       # Genome FASTA
│   └── annotations/      
│       └── {target_genome}-{source_genome}-{annotation_version}.gtf.gz
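The naming pattern above can be sketched as a small shell helper. This is only an illustration: `annotation_path` is a hypothetical function name, not part of the repository's scripts, and it assumes the layout shown in the tree.

```shell
# Hypothetical helper (not part of this repository): build the expected
# path of a lifted annotation from the naming pattern shown above.
# Arguments: target genome, source genome, annotation version.
annotation_path() {
  target="$1"; source_genome="$2"; annotation="$3"
  printf 'output/genomes/%s/annotations/%s-%s-%s.gtf.gz\n' \
    "$target" "$target" "$source_genome" "$annotation"
}

# Example: path of the hg38 GENCODE v47 basic annotation lifted to rheMac10.
annotation_path rheMac10 hg38 gencode.v47.basic
```

Running the example prints `output/genomes/rheMac10/annotations/rheMac10-hg38-gencode.v47.basic.gtf.gz`.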

Available Lifted Annotations

The following table shows all currently available lifted gene annotations:

Target Genome   Human (hg38)                                             Mouse (mm10)
calJac4         gencode.v44.basic, gencode.v47.basic, gencode.v47.comp   -
macFas6         gencode.v44.basic, gencode.v47.basic, gencode.v47.comp   -
mCalJac1        gencode.v44.basic, gencode.v47.basic, gencode.v47.comp   -
mMacNem1        gencode.v44.basic, gencode.v47.basic, gencode.v47.comp   -
rheMac8         gencode.v44.basic, gencode.v47.basic, gencode.v47.comp   -
rheMac10        gencode.v44.basic, gencode.v47.basic, gencode.v47.comp   -
rn6             -                                                        gencode.vM25.basic, gencode.vM25.comp
rn7             -                                                        gencode.vM25.basic, gencode.vM25.comp
susScr11        gencode.v44.basic, gencode.v47.basic, gencode.v47.comp   -

Each annotation is available as a gzipped GTF file in the respective genome's annotations directory. For example:

rheMac10/
├── rheMac10.fa
└── annotations/
    ├── rheMac10-hg38-gencode.v47.basic.gtf.gz
    ├── rheMac10-hg38-gencode.v47.comp.gtf.gz
    └── rheMac10-hg38-gencode.v44.basic.gtf.gz
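A quick sanity check on one of these files is to count its records by feature type. The sketch below is a suggestion rather than a repository script: `count_features` is a hypothetical helper, and it assumes the lifted file keeps `gene` records in the third GTF column, as GENCODE GTFs do.

```shell
# Hypothetical helper (not part of this repository): count the records of
# one feature type in a gzipped GTF. Column 3 of a GTF record holds the
# feature type (gene, transcript, exon, ...). Comment lines start with '#'.
count_features() {
  gtf_gz="$1"; feature="${2:-gene}"
  gzip -dc "$gtf_gz" |
    awk -F '\t' -v f="$feature" '$0 !~ /^#/ && $3 == f { n++ } END { print n + 0 }'
}

# Example call (assumes the file exists locally):
# count_features output/genomes/rheMac10/annotations/rheMac10-hg38-gencode.v47.basic.gtf.gz gene
```

Comparing the gene count against the source annotation gives a rough idea of how many genes were lifted over.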

Usage

Requirements

  • Install this repository and its dependencies using the provided conda environment:
git clone git@github.com:pfenninglab/custom_ArchR_genomes_and_annotations.git
cd custom_ArchR_genomes_and_annotations

conda env create -f config/conda_environment.yml

Basic Workflow

  1. Download source/target genomes:
./scripts/download-genome.sh -g rheMac10 \
  -f https://hgdownload.soe.ucsc.edu/goldenPath/rheMac10/bigZips/rheMac10.fa.gz \
  -n gencode.v47.basic \
  -t https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.basic.annotation.gtf.gz
  2. Run Liftoff gene mapping:
./scripts/liftoff-genes.sh -s hg38 -t rheMac10 -a gencode.v47.basic
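To lift several annotations onto the same target, the mapping step can be repeated once per annotation. The following is a minimal sketch, not a script from this repository: `liftoff_commands` is a hypothetical helper that only prints the commands, using the `-s`, `-t`, and `-a` flags shown above. It assumes each annotation has already been downloaded in step 1 and that `-a` takes the same name that was passed to `-n`, as in the example.

```shell
# Hypothetical helper (not part of this repository): print one
# liftoff-genes.sh command per annotation, without running anything.
# Usage: liftoff_commands SOURCE TARGET ANNOTATION...
liftoff_commands() {
  src="$1"; tgt="$2"; shift 2
  for annotation in "$@"; do
    printf './scripts/liftoff-genes.sh -s %s -t %s -a %s\n' "$src" "$tgt" "$annotation"
  done
}

# Dry run: print the commands for review.
liftoff_commands hg38 rheMac10 gencode.v47.basic gencode.v47.comp
```

Printing first keeps the run reviewable; piping the output to `sh` from the repository root would execute the commands in order.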

Citations

If you use these resources, please cite:

Phan, BaDoi; Pfenning, Andreas (2022): Alternate gene annotations for rat, macaque, and marmoset for single cell RNA and ATAC analyses.
Carnegie Mellon University. Dataset. https://doi.org/10.1184/R1/21176401.v1

Contributing

Issues and pull requests welcome! See CONTRIBUTING.md for guidelines.

About

R objects that contain custom ArchR genome annotations
