GeneAlign

bash scripts for setting up a gene alignment pipeline

Prologue

Invoke nproc to determine the number of processors you can devote to these operations. They will ask for "threads" when running and you can use up to your nproc.

Installing

git clone this repository and place setup_salmon.sh and run_salmon.sh into desired working directory
In terminal, run bash -i setup_salmon.sh to begin the install process.

If installing conda through setup_salmon.sh you may need to invoke bash wihtout -i
Additionally, conda may fail to initialize properly. In this case, manually install conda on your system prior to continuing.

With conda installed fully, invoke bash -i setup_salmon.sh again to setup the salmon working directories.

This will include making a decoy-aware salmon index which can be very RAM intensive and may fail unless enough RAM is available.

When finished, you should find ./salmon_processing working dir populated by /sra_out and salmon_ref.

salmon_ref should be fully populated with decoys.txt, human_genome.fa.gz, human_transcriptome.fa.gz, human_gentrome.fa.gz, mappings.gtf, and \salmon_index.
\salmon_index should be populated by info.json and versionInfo.json (and many other files) if building the index worked.

Running

Source an accession list of samples you wish to analyze and take note of whether the data is paired or single ended.
Download or generate a list of the accession numbers for the samples, line-by-line, in an accession_list.txt file.
Place your accession_list.txt in the /sra_out folder.
Invoke bash -i run_salmon.sh and follow the prompts.
To be continued...

Resources

Miniconda

Anaconda Software Distribution Computer software. Vers. 24.7.1. Anaconda, Sept. 2024. Web. https://anaconda.com

SRA-Toolkit

sra-tools Vers. 3.1.0. NCBI, Sept. 2024. Web. https://github.com/ncbi/sra-tools

Salmon

Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., & Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods. https://github.com/COMBINE-lab/salmon

Ensembl database (human genome, human transcriptome, human gene mappings)

Peter W Harrison, M Ridwan Amode, Olanrewaju Austine-Orimoloye, Andrey G Azov, Matthieu Barba, If Barnes, Arne Becker, Ruth Bennett, Andrew Berry, Jyothish Bhai, Simarpreet Kaur Bhurji, Sanjay Boddu, Paulo R Branco Lins, Lucy Brooks, Shashank Budhanuru Ramaraju, Lahcen I Campbell, Manuel Carbajo Martinez, Mehrnaz Charkhchi, Kapeel Chougule, Alexander Cockburn, Claire Davidson, Nishadi H De Silva, Kamalkumar Dodiya, Sarah Donaldson, Bilal El Houdaigui, Tamara El Naboulsi, Reham Fatima, Carlos Garcia Giron, Thiago Genez, Dionysios Grigoriadis, Gurpreet S Ghattaoraya, Jose Gonzalez Martinez, Tatiana A Gurbich, Matthew Hardy, Zoe Hollis, Thibaut Hourlier, Toby Hunt, Mike Kay, Vinay Kaykala, Tuan Le, Diana Lemos, Disha Lodha, Diego Marques-Coelho, Gareth Maslen, Gabriela Alejandra Merino, Louisse Paola Mirabueno, Aleena Mushtaq, Syed Nakib Hossain, Denye N Ogeh, Manoj Pandian Sakthivel, Anne Parker, Malcolm Perry, Ivana Piližota, Daniel Poppleton, Irina Prosovetskaia, Shriya Raj, José G Pérez-Silva, Ahamed Imran Abdul Salam, Shradha Saraf, Nuno Saraiva-Agostinho, Dan Sheppard, Swati Sinha, Botond Sipos, Vasily Sitnik, William Stark, Emily Steed, Marie-Marthe Suner, Likhitha Surapaneni, Kyösti Sutinen, Francesca Floriana Tricomi, David Urbina-Gómez, Andres Veidenberg, Thomas A Walsh, Doreen Ware, Elizabeth Wass, Natalie L Willhoft, Jamie Allen, Jorge Alvarez-Jarreta, Marc Chakiachvili, Bethany Flint, Stefano Giorgetti, Leanne Haggerty, Garth R Ilsley, Jon Keatley, Jane E Loveland, Benjamin Moore, Jonathan M Mudge, Guy Naamati, John Tate, Stephen J Trevanion, Andrea Winterbottom, Adam Frankish, Sarah E Hunt, Fiona Cunningham, Sarah Dyer, Robert D Finn, Fergal J Martin, and Andrew D Yates. Ensembl 2024. Nucleic Acids Res. 2024, 52(D1):D891–D899. PMID: 37953337. 10.1093/nar/gkad1049.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
scripts_featureCounts		scripts_featureCounts
scripts_salmon		scripts_salmon
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GeneAlign

Prologue

Installing

Running

Resources

About

Releases

Packages

Languages

MrBones1102/GeneAlign

Folders and files

Latest commit

History

Repository files navigation

GeneAlign

Prologue

Installing

Running

Resources

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages