pangeblocks-experiments

Experiments to evaluate graph constructions made with pangeblocks and other tools.

HLA-zoo Data was downloaded from here https://github.com/ekg/HLA-zoo. It contains 28 set of sequences. Some graphs generated with pggb (path to the graphs) contain paths using the reverse and complement of the node labels (B-3106,C-3107,DMA-3108,DOA-3111,DOB-3112,DOB-3112,DPA1-3113,DQB1-3119,DRA-3122,DRB1-3123,TAP1-6890, and TAP2-6891), which lead to the conclusion that those sequences were not in the same direction as the other strands. We identified these sequences in each file here, and then removed them before running all the tools to create the graphs (including pggb, which was run with the same parameters reported in the original repository).

The input required by pggb is the set of sequences, no MSA is required, as opposed to PanPA, makeprg and pangeblocks

MSAs

To create MSAs for a set of sequences edit params-mafft.yaml, OP and EP are the gap open penalty and offset (works like gap extension penalty), respectively (check mafft --help for details)

snakemake rules/msa.smk -c16 --use-conda

when running vg for creating graphs, the header of the MSAs cause some problems when whitespaces are at the beginning ls /data/msas-pangeblocks/msas-didelot/*/*.fa | while read f; do sed 's, ,,g' -i $f; done

Graphs

`PanPA`

set PATH_MSAS and PANPA_OUTPUT in params.yaml, and then run

snakemake -s rules/panpa.smk -c16 -k

`makeprg`

set PATH_MSAS and MAKEPRG_OUTPUT in params.yaml, and then run

snakemake -s rules/makeprg.smk -c16 -k

`pggb`

set PGGB_INPUT (path where sequences are saved in .fa format) and PGGB_OUTPUT in params.yaml, and then run

snakemake -s rules/pggb.smk -c16 --use-conda -k

`pangeblocks`

clone repo https://github.com/AlgoLab/pangeblocks and follow instruction in the README file.

`vg`

set PATH_MSAS and VG_OUTPUT in params.yaml, and then run

snakemake -s rules/vg.smk -c16 --use-conda

Stats

Didelot graphs

echo -e "path_gfa\tn_nodes\tn_edges\tlen_graph\tis_acyclic\tweakly_connected_components\tlen_shared_nodes\tperc_len_shared_nodes" > notebooks/didelot-stats.gfa.csv
ls didelot-pggb/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/didelot-stats.gfa.csv
ls didelot-PanPA/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/didelot-stats.gfa.csv
ls didelot-makeprg/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/didelot-stats.gfa.csv
ls didelot-pangeblocks/*/gfa-unchop/*/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/didelot-stats.gfa.csv
ls didelot-vg/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/didelot-stats.gfa.csv

HLA-zoo graphs

echo -e "path_gfa\tn_nodes\tn_edges\tlen_graph\tis_acyclic\tweakly_connected_components\tlen_shared_nodes\tperc_len_shared_nodes" > notebooks/HLA-zoo-stats.gfa.csv
ls HLA-zoo-pggb/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/HLA-zoo-stats.gfa.csv
ls HLA-zoo-PanPA/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/HLA-zoo-stats.gfa.csv
ls HLA-zoo-makeprg/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/HLA-zoo-stats.gfa.csv
ls HLA-zoo-pangeblocks/*/gfa-unchop/*/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/HLA-zoo-stats.gfa.csv
ls HLA-zoo-vg/*/*.gfa | while read f; do ./scripts/stats_gfa_vg.sh $f ; done >> notebooks/HLA-zoo-stats.gfa.csv

Experiment: read mapping

Read Simulation

set PATH_SEQS and PATH_OUTPUT in params-simulate-reads.yaml. For each fasta in PATH_SEQS a reference (by default the first sequence in the file) will be extracted and reads will be simulated with DWGSIM using this sequence

snakemake -s rules/simulate_reads.smk -c16

Read alignment

params-alignment.yaml

GraphAligner

snakemake -s rules/alignment_simreads_graphaligner.smk -c16

Giraffe

snakemake -s rules/alignment_simreads_giraffe.smk -c16 --use-conda -k

VCF from GFA with `vg`

ls HLA-zoo-pangeblocks/*/gfa-unchop/*/*/*.gfa | while read f; do ./scripts/vcf_from_gfa.sh $f; done
ls HLA-zoo-pangeblocks-norevcomplement/*/gfa-unchop/*/*/*.gfa | while read f; do ./scripts/vcf_from_gfa.sh $f; done
<!-- ls HLA-zoo-PanPA-norevcomplement/*/*.gfa | while read f; do ./scripts/vcf_from_gfa.sh $f; done  -->
<!-- ls HLA-zoo-PanPA/*/*.gfa | while read f; do ./scripts/vcf_from_gfa.sh $f; done -->
ls HLA-zoo-pggb/*.gfa | while read f; do ./scripts/vcf_from_gfa.sh $f; done

TODO

create env for PanPA
fix rule pggb.smk: match gfa name

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.dvc		.dvc
envs		envs
notebooks		notebooks
rules		rules
scripts		scripts
utils		utils
.dvcignore		.dvcignore
.gitignore		.gitignore
HLA-zoo-PanPA.dvc		HLA-zoo-PanPA.dvc
HLA-zoo-makeprg.dvc		HLA-zoo-makeprg.dvc
HLA-zoo-pangeblocks.dvc		HLA-zoo-pangeblocks.dvc
HLA-zoo-pggb.dvc		HLA-zoo-pggb.dvc
HLA-zoo-vg.dvc		HLA-zoo-vg.dvc
HLA-zoo.to_remove.txt		HLA-zoo.to_remove.txt
LICENSE		LICENSE
README.md		README.md
alignment-simulated-reads-HLA-zoo.dvc		alignment-simulated-reads-HLA-zoo.dvc
alignment-simulated-reads-didelot.dvc		alignment-simulated-reads-didelot.dvc
didelot-PanPA.dvc		didelot-PanPA.dvc
didelot-makeprg.dvc		didelot-makeprg.dvc
didelot-pangeblocks.dvc		didelot-pangeblocks.dvc
didelot-pggb.dvc		didelot-pggb.dvc
didelot-vg.dvc		didelot-vg.dvc
params-alignment.yaml		params-alignment.yaml
params-mafft.yaml		params-mafft.yaml
params-simulate-reads.yaml		params-simulate-reads.yaml
params.yaml		params.yaml
simulated-reads-HLA-zoo.dvc		simulated-reads-HLA-zoo.dvc
simulated-reads-didelot.dvc		simulated-reads-didelot.dvc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pangeblocks-experiments

MSAs

Graphs

`PanPA`

`makeprg`

`pggb`

`pangeblocks`

`vg`

Stats

Didelot graphs

HLA-zoo graphs

Experiment: read mapping

Read Simulation

Read alignment

VCF from GFA with `vg`

About

Releases

Packages

Languages

License

AlgoLab/pangeblocks-experiments

Folders and files

Latest commit

History

Repository files navigation

pangeblocks-experiments

MSAs

Graphs

PanPA

makeprg

pggb

pangeblocks

vg

Stats

Didelot graphs

HLA-zoo graphs

Experiment: read mapping

Read Simulation

Read alignment

VCF from GFA with vg

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`PanPA`

`makeprg`

`pggb`

`pangeblocks`

`vg`

VCF from GFA with `vg`

Packages