
A Practical Haplotype Graph (PHG) for the honey bee (Apis mellifera)


matthewwiese/bee-phg


Bee PHG

Testing an idea related to VSH (Varroa-sensitive hygiene); work in progress.

Currently focused on using all available honey bee genomes on NCBI to build the PHG, and the BioProject PRJNA605407 short reads for imputation. That said, BioProject PRJNA311274 may be more useful for a first pass, since the PHG-imputed SNPs could be compared against the VCF published with it. Apparently, sequencing drones instead of workers allows lower sequencing depth and greater variant detection while also reducing phasing complications, since drones are haploid (Wragg et al. 2022).

In 2020, Jensen et al. wrote on the topic of the PHG:

Skim sequence data are aligned to consensus haplotypes to find the best path through the graph, and single nucleotide polymorphism (SNP) variants from the predicted haplotype path can be written to a variant call format (VCF) file. The result is a set of genome-wide SNP variant calls for each taxon, imputed from skim sequence.
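To make the pathfinding idea concrete for myself, here is a toy dynamic-programming (Viterbi-style) sketch in Python. It is not the PHG's implementation: the real tool uses a hidden Markov model with its own emission and transition probabilities. This version simply scores each candidate haplotype in a reference range by log(1 + reads) and charges a hypothetical switch_penalty whenever the path changes haplotype between adjacent ranges. The input format (one dict of read counts per reference range) and the haplotype names are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

def best_haplotype_path(
    read_counts: List[Dict[str, int]],
    switch_penalty: float = 2.0,
) -> Tuple[List[str], float]:
    """Toy best path through a haplotype graph (NOT the PHG's actual HMM).

    read_counts[i] maps each candidate haplotype in reference range i to the
    number of skim reads that aligned best to it (hypothetical input format).
    Path score = sum of log(1 + reads) for each chosen haplotype, minus
    switch_penalty every time the path changes haplotype between ranges.
    """
    if not read_counts:
        return [], 0.0
    if any(not counts for counts in read_counts):
        raise ValueError("each range needs at least one candidate haplotype")
    # score[h] = best score of any path ending in haplotype h at the current range
    score = {h: math.log1p(c) for h, c in read_counts[0].items()}
    backptrs: List[Dict[str, str]] = []
    for counts in read_counts[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for h, c in counts.items():
            # Staying on the same haplotype is free; switching costs switch_penalty.
            best_prev, best_val = "", -math.inf
            for prev, s in score.items():
                val = s - (0.0 if prev == h else switch_penalty)
                if val > best_val:
                    best_prev, best_val = prev, val
            new_score[h] = best_val + math.log1p(c)
            ptr[h] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best-scoring haplotype in the last range.
    last = max(score, key=score.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

counts = [
    {"hapA": 5, "hapB": 0},
    {"hapA": 0, "hapB": 1},  # a single read weakly favoring hapB
    {"hapA": 4, "hapB": 0},
]
print(best_haplotype_path(counts, switch_penalty=2.0)[0])  # ['hapA', 'hapA', 'hapA']
print(best_haplotype_path(counts, switch_penalty=0.0)[0])  # ['hapA', 'hapB', 'hapA']
```

With strong support for hapA on both sides, one read for hapB in the middle range is not worth paying two switch penalties, so the path stays on hapA; with no penalty it hops over. In the real pipeline, the SNPs carried by the chosen haplotypes are what end up in the VCF.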

Maybe I should start with the Wragg et al. data, since they provide a VCF with many SNPs that could be compared to the SNPs imputed by PHG pathfinding. I would also like to use the Saelao et al. data, as I think it is more relevant to me as a beekeeper in the U.S. and could provide insight into which stocks I should pursue for my apiary.
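Comparing PHG-imputed SNPs against a published VCF boils down to genotype concordance: the fraction of shared, non-missing sites where both call sets agree. Below is a minimal Python sketch, assuming both call sets are uncompressed VCF text and the sample names match (the sample name drone1 and contig name chr1 are hypothetical placeholders). It ignores multiallelic normalization and ploidy, so a haploid drone call "1" will not match a diploid "1/1" unless one is converted first.

```python
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)

def genotypes_for_sample(vcf_lines: Iterable[str], sample: str) -> Dict[Site, str]:
    """Map each site to one sample's GT string from uncompressed VCF text."""
    calls: Dict[Site, str] = {}
    col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)  # raises ValueError if the sample is absent
            continue
        if col is None:
            raise ValueError("data line seen before the #CHROM header")
        fmt = fields[8].split(":")
        gt = fields[col].split(":")[fmt.index("GT")]
        if "." in gt:
            continue  # skip missing calls such as "." or "./."
        # Treat phased "|" and unphased "/" alike; sort so "1/0" equals "0/1".
        alleles = sorted(gt.replace("|", "/").split("/"))
        calls[(fields[0], int(fields[1]), fields[3], fields[4])] = "/".join(alleles)
    return calls

def concordance(imputed: Dict[Site, str], truth: Dict[Site, str]) -> Tuple[float, int]:
    """Return (fraction of shared sites with identical genotypes, number of shared sites)."""
    shared = imputed.keys() & truth.keys()
    if not shared:
        return 0.0, 0
    agree = sum(imputed[s] == truth[s] for s in shared)
    return agree / len(shared), len(shared)
```

Reporting the number of shared sites next to the rate matters, since a high concordance over only a handful of overlapping SNPs says very little.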

  • Genome-wide patterns of differentiation within and among U.S. commercial honey bee stocks

    • This is the publication associated with the reads I'm using

    • Pages 6-7:

      CSS scores improve the power to detect and resolve selection signals and localize candidate regions involved in traits experiencing selection pressure. Regions under selection shared by Minnesota Hygienic, Pol-line, Hilo, and Russian stocks provide actionable targets for future research and breeding. Our annotation using haplotype blocks identified 46 of the 58,333 [23] that were shared among the four stocks with a strong signal of selection providing evidence of a common selection signal among stocks associated with mite- and disease-resistance traits. However, there remains an unlikely possibility that a CSS signal may arise of a specific region that is highly selected in only the Italian stocks and not of the other populations. Though we feel that it is more likely that the research stocks are arriving at a shared resistance given the intent of their respective programs.

    • Table 4, spread across pages 8-9, is of great interest

  • Sequence-based genome-wide association studies reveal the polygenic architecture of Varroa destructor resistance in Western honey bees Apis mellifera

    • Varroa resistance is polygenic, with 60 genetic markers having significant impact
    • Their code: https://github.com/seynard/gwas_beestrong
    • Page 7, subheading "Associated variants"; page 10, subheading "Example of associations"
    • Page 18:

      Varroa resistance mechanisms can be partitioned into two types of traits: first, traits related to hygiene (including VSH, recapping and MNR, but also more broadly grooming behaviour) that involve the accurate detection by workers of varroa infested cells and second, their subsequent inspection/destruction. It has been shown that VSH bees target more specifically cells with highly compromised brood, which is related to the level of infestation in the cells [47, 14]. As a result, cells with fewer mites or mites that are not effectively reproducing are more likely to stay intact, thus increasing the level of mite non reproduction in the colony (MNR). The second type of trait is a trait expressed by either the workers or the brood, that would disrupt mite reproduction within capped cells (and thus increase MNR). Both trait types can reduce mite infestation in the colony, thus increasing varroa resistance of honey bee colonies. Interestingly, in this study we found markers associated with genes that relate to these two categories.

  • Complex population structure and haplotype patterns in the Western European honey bee from sequencing a large panel of haploid drones

    • From the abstract (emphasis added):

      This large naturally phased data set is available as a single vcf file that can now serve as a reference for subsequent populations genomics studies in the honey bee, such as (i) selecting individuals of verified homogeneous genetic backgrounds as references, (ii) imputing genotypes from a lower-density data set generated by an SNP-chip or by low-pass sequencing, or (iii) selecting SNPs compatible with the requirements of genotyping chips.

    • Their code: https://github.com/avignal5/SeqApiPop/tree/v1.5

    • Filtered VCF with 7 million SNPs across 870 haploid drone samples: https://doi.org/10.5281/zenodo.5592452

    • SRA: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA311274

  • AmelHap: Leveraging drone whole-genome sequence data to create a honey bee HapMap

    We have demonstrated that by using AmelHap to impute high levels of missing data (61%), very high genotype concordance ( > 95%) can be achieved in drones. We also demonstrate the resource to be effective at imputing moderate levels of missing data (12%) in an independent diploid dataset. We have not extensively investigated the parameter space for imputation, or the full range of tools available, and so further improvements on imputation performance are likely achievable.

  • Genetic markers for the resistance of honey bee to Varroa destructor

    • The table at the top of page 5 is of interest
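The AmelHap quote above reports genotype concordance after imputing large amounts of missing data. One way to run a similar check on my own data would be to hide a known fraction of trusted genotypes, impute them, and score the imputed calls against the hidden truth. The masking step might look like this Python sketch; the function name, the uniform random masking, and the placeholder site keys are assumptions of mine, not AmelHap's actual protocol.

```python
import random
from typing import Dict, Hashable, Tuple

def mask_genotypes(
    genotypes: Dict[Hashable, str], fraction: float, seed: int = 0
) -> Tuple[Dict[Hashable, str], Dict[Hashable, str]]:
    """Hide a random `fraction` of calls to simulate missing data.

    Returns (visible, held_out): `visible` is what the imputation tool gets to
    see, `held_out` is the truth set to score the imputed calls against.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Sort first so the same seed always masks the same sites.
    sites = sorted(genotypes, key=repr)
    rng = random.Random(seed)
    masked = set(rng.sample(sites, round(fraction * len(sites))))
    visible = {s: g for s, g in genotypes.items() if s not in masked}
    held_out = {s: genotypes[s] for s in masked}
    return visible, held_out

# Example: hide 61% of 100 placeholder drone calls, mirroring the AmelHap figure.
calls = {("chr1", pos): "0" for pos in range(100)}
visible, held_out = mask_genotypes(calls, 0.61, seed=1)
print(len(visible), len(held_out))  # 39 61
```

Fixing the seed keeps the masked set reproducible, so different imputation settings can be compared on exactly the same hidden sites.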

Conda environments activated within tmux cannot find Conda

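I have a weird PATH problem with Conda and tmux that forces me to deactivate the base environment after entering a tmux session. When I enter a tmux session and activate an environment straight from base, the PATH inexplicably loses Conda, keeping only the environment's own bin directory and no /home/matt/miniconda3/condabin: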

(base) matt@machine:~$ echo $PATH
/home/matt/miniconda3/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/matt/phg/bin
(base) matt@machine:~$ conda activate phgv2-conda
(phgv2-conda) matt@machine:~$ echo $PATH
/home/matt/miniconda3/envs/phgv2-conda/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/matt/phg/bin

Remembering to deactivate base first fixes it, and /home/matt/miniconda3/condabin now appears on the PATH:

(base) matt@machine:~$ echo $PATH
/home/matt/miniconda3/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/matt/phg/bin
(base) matt@machine:~$ conda deactivate
matt@machine:~$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/matt/phg/bin
matt@machine:~$ conda activate phgv2-conda
(phgv2-conda) matt@machine:~$ echo $PATH
/home/matt/miniconda3/envs/phgv2-conda/bin:/home/matt/miniconda3/condabin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/matt/phg/bin
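Diffing the two activated PATH values above (the broken one and the one after deactivating base) shows exactly which entry goes missing. A small Python sketch, assuming POSIX colon-separated PATH strings; it only pinpoints the difference and does not explain why tmux triggers it.

```python
from typing import List

# PATH values copied from the two transcripts above (POSIX, colon-separated).
BROKEN = "/home/matt/miniconda3/envs/phgv2-conda/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/matt/phg/bin"
WORKING = "/home/matt/miniconda3/envs/phgv2-conda/bin:/home/matt/miniconda3/condabin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/matt/phg/bin"

def path_entries_missing(broken: str, working: str) -> List[str]:
    """Return entries of `working` that are absent from `broken`, keeping their order."""
    present = set(broken.split(":"))
    return [entry for entry in working.split(":") if entry and entry not in present]

print(path_entries_missing(BROKEN, WORKING))  # ['/home/matt/miniconda3/condabin']
```

To check a live shell, pass os.environ["PATH"] captured inside and outside tmux instead of the copied strings.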

Totally unrelated to the PHG, but there is a nonzero chance it confuses somebody else too, especially when you want to use tmux for long-running commands like AnchorWave. Over the years I have found multiple GitHub issues from people with the same problem... ¯\_(ツ)_/¯