This repository includes scripts that fetch the files needed for day 3 of the vg tutorial (https://github.com/Pfern/PANGenomics). Once you have all the necessary files, you can play around with a bacterial pan-genome.
- SRA Toolkit (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/)
- curl command (if you use `wget` instead, modify `script/fetch_data.sh`)
- Jupyter Notebook
- Internet connection since the script downloads several files
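
The requirements above can be checked before running anything. The following is a minimal sketch, not part of this repository. The executable names are assumptions: `fastq-dump` is one of the commands shipped with the SRA Toolkit and is used here as a stand-in (which toolkit command the fetch script actually calls is not stated here), and `jupyter` is the usual launcher for Jupyter Notebook.

```python
import shutil

# Assumed executable names (see the note above); adjust to your setup.
REQUIRED_TOOLS = ["fastq-dump", "curl", "jupyter"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If you use `wget` instead of curl, replace `"curl"` with `"wget"` in the list.
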
- Go to `scripts/` and run `./fetch_data.sh` to obtain E. coli complete genomes, gene tables, and one fastq file, as well as minia.
- Then run `run_minia.sh` to generate a contig fasta file from the fastq reads.
- Use `extract_gene.ipynb` with Jupyter Notebook as necessary. It helps you extract genome regions corresponding to a certain gene from multiple genomes.
- Now you're free to do whatever you want. Check the original tutorial and have fun.
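
To give a feel for the gene extraction step, here is a minimal sketch of cutting a region out of a genome sequence. It is not the code in `extract_gene.ipynb`. The helper names `read_fasta` and `extract_region` are hypothetical, and the sketch assumes gene coordinates are 1-based and inclusive with a `+`/`-` strand, which may differ from the format of the downloaded gene tables.

```python
def read_fasta(path):
    """Parse a FASTA file into a dict mapping record id -> sequence string."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-separated token as the record id.
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {rec_id: "".join(chunks) for rec_id, chunks in records.items()}


# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(sequence, start, end, strand="+"):
    """Return sequence[start..end], assuming 1-based inclusive coordinates.

    The region is reverse-complemented when strand is '-'.
    """
    region = sequence[start - 1:end]
    if strand == "-":
        region = region.translate(_COMPLEMENT)[::-1]
    return region


if __name__ == "__main__":
    genome = "ACGTACGTAC"  # toy sequence, not real data
    print(extract_region(genome, 2, 5))
    print(extract_region(genome, 2, 5, strand="-"))
```

Applying `extract_region` to the same gene's coordinates in each genome gives one sequence per genome, which is the kind of input the original tutorial works with.
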