Authors
Supervisors
- Introduction
- Pipeline
- Results
- Summary
Overall we had:
- 39 HIV+ samples (IonTorrent)
- 754 HIV- samples (IonTorrent)
- 54 HIV- samples (BGI)
Cell-free DNA is an unusual kind of data to analyze, especially from a microbiological point of view, which is why the thresholds used here are not very strict.
The first two steps of the study, "Unmapped reads extraction" & "Assigning taxonomic labels", were run on the server.
All further steps that included data analysis were performed locally.
The `HIV_shadow` conda environment was used for every step.
Figure 1. The whole pipeline overview.
IonTorrent samples had already been mapped to the human genome and were provided as `.bam` files. Unmapped reads were extracted using samtools v.1.20 [1].
See the `Snakefiles/Snakefile_IonTorrent` file for details.
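As a rough illustration of this step, the sketch below builds the two samtools calls from Python. It assumes a hypothetical `build_commands` helper and output names that follow the `{sample}_unmapped.fastq` pattern seen in the kraken rule; the actual commands live in the Snakefile and may differ.

```python
import subprocess
from pathlib import Path


def build_commands(bam_path, out_dir):
    """Return (view_cmd, fastq_cmd, fastq_path) for one BAM file."""
    sample = Path(bam_path).stem
    unmapped_bam = str(Path(out_dir) / f"{sample}_unmapped.bam")
    fastq_path = str(Path(out_dir) / f"{sample}_unmapped.fastq")
    # -b: write BAM, -f 4: keep only reads whose "unmapped" flag is set
    view_cmd = ["samtools", "view", "-b", "-f", "4", "-o", unmapped_bam, bam_path]
    # samtools fastq prints to stdout unless output files are given
    fastq_cmd = ["samtools", "fastq", unmapped_bam]
    return view_cmd, fastq_cmd, fastq_path


def extract_unmapped(bam_path, out_dir):
    """Run both commands; requires samtools on PATH."""
    view_cmd, fastq_cmd, fastq_path = build_commands(bam_path, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(view_cmd, check=True)
    with open(fastq_path, "w") as handle:
        subprocess.run(fastq_cmd, stdout=handle, check=True)
    return fastq_path


# Hypothetical file name following the YYYYMMDD_ID pattern
view_cmd, fastq_cmd, fastq_path = build_commands("bam/20220303_S01.bam", "fastq")
```

Only `build_commands` is evaluated in the last line, so the example can be inspected without samtools installed.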
BGI samples were provided in raw `.fastq.gz` format. They were mapped to the human genome (hg19, NCBI build 37 [2]) using bowtie2 v.2.5.3 [3]. Unmapped reads were then extracted using samtools v.1.20 [1].
See the `Snakefiles/Snakefile_BGI` file for details.
Taxonomic identification was performed with kraken2 v.2.1.3 [4] using the full PlusPF database (77 GB) [5] with a confidence threshold of 0.6.
Excerpt from the Snakefiles showing the kraken2 parameters:

    rule kraken:
        input:
            fastq="fastq_BGI/{sample}_unmapped.fastq",
            db="/path/to/kraken2_db"  # enter path to db
        output:
            report="kraken_report_BGI/{sample}_kraken_report.txt",
            out="kraken_output_BGI/{sample}_kraken_output.txt"
        shell:
            """
            kraken2 --db {input.db} --output {output.out} \
            --report {output.report} --confidence 0.60 {input.fastq}
            """

All sample names (both IonTorrent and BGI) followed the pattern `YYYYMMDD_ID`, and the samples were sorted into separate directories by group (e.g. `HIV` & `CTRL`).
`metadata.csv` was generated using the `scripts/create_metadata.py` script.
Excerpt from the laboratory journal:

    # Usage
    # {path_to_script} {path_to_HIV_samples} {path_to_ctrl_samples} {output_file_name}
    %run scripts/create_metadata.py HIV/ CTRL/ metadata.csv

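To make the naming convention concrete, here is a minimal sketch of how metadata rows could be derived from `YYYYMMDD_ID` sample names. The helper names, the `date` column and the `HIV+`/`HIV-` labels are assumptions for illustration; `scripts/create_metadata.py` may produce a different layout.

```python
from datetime import datetime


def parse_sample_name(name):
    """Split a 'YYYYMMDD_ID' sample name into (ISO date string, ID)."""
    date_part, sample_id = name.split("_", 1)
    # strptime raises ValueError if the prefix is not a valid date
    date = datetime.strptime(date_part, "%Y%m%d").date()
    return date.isoformat(), sample_id


def build_metadata(hiv_names, ctrl_names):
    """Return one metadata row (dict) per sample, HIV+ group first."""
    rows = []
    for status, names in (("HIV+", hiv_names), ("HIV-", ctrl_names)):
        for name in sorted(names):
            date, _ = parse_sample_name(name)
            rows.append({"sample_id": name, "date": date, "HIV_status": status})
    return rows


# Hypothetical sample names, one per group
rows = build_metadata(["20220303_A1"], ["20211115_B2"])
```

Keeping the date as its own column makes the later date-based contamination check a simple grouping operation.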
Six `counts.csv` files (from species to phylum level) were parsed from kraken2 reports using KrakenTools v.1.2 [6].
Possible contamination was filtered out at this step.
Custom scripts used:
Script | Purpose |
---|---|
`run_kreport2mpa.sh` | to use KrakenTools for ~800 files at once |
`find_line.py` | to find contaminants precisely |
`delete_lines.py` | to delete them |
`processing_script.py` | to return sample_ids to files |
`convert2csv.py` | to convert .txt files to .csv files |
`filter_possible_contaminants.py` | to filter contaminants based on the data criteria |
Table 1. Scripts used to parse `counts.csv` files.
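The repository relies on KrakenTools for this parsing. Purely to illustrate what is being extracted, the sketch below reads species-level counts from a standard six-column kraken2 report (percentage, clade reads, direct reads, rank code, taxid, indented name). The `parse_kraken_report` helper and the toy report lines, including their placeholder taxids, are hypothetical and not part of the repository's scripts.

```python
def parse_kraken_report(lines, rank_code="S"):
    """Return {taxon name: clade read count} for one rank code."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        _percent, clade_reads, _direct_reads, rank, _taxid, name = fields[:6]
        if rank == rank_code:
            # names are indented with spaces to show tree depth
            counts[name.strip()] = int(clade_reads)
    return counts


# Toy report: values and taxids are placeholders, not real results
report = [
    " 50.00\t10\t10\tU\t0\tunclassified\n",
    " 50.00\t10\t0\tR\t1\troot\n",
    " 30.00\t6\t0\tG\t1001\t    Ralstonia\n",
    " 30.00\t6\t6\tS\t1002\t      Ralstonia pickettii\n",
    " 20.00\t4\t4\tS\t1003\t      Stenotrophomonas maltophilia\n",
]
species_counts = parse_kraken_report(report)
genus_counts = parse_kraken_report(report, rank_code="G")
```

Using the clade count (second column) rather than the direct count attributes reads assigned to strains back to their species.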
Contamination filtering criteria
Potential contamination is identified and removed based on the collection dates of the samples.
When analyzing cell-free DNA from various samples, the organisms (taxa) detected should ideally be distributed somewhat randomly across samples, depending on their source, environment, etc. If certain organisms appear only in samples that were collected on the same date, this pattern suggests that those organisms may not have been present in the samples originally but were introduced accidentally on that particular day, possibly during sample collection, processing, or handling.
Key Points:
- Same Date, Same Taxon: If we find that a specific organism (taxon) appears exclusively in samples that were all collected on the same date, and this organism does not appear in samples from other dates, it might indicate contamination.
- Cross-Verification: Check if this organism appears in other samples that are not from that specific date. If it doesn’t, this supports the contamination theory.
- Removal of Suspected Data: To ensure the integrity of data analysis, these suspected contaminated data points should be removed before performing further analysis.
Because of this criterion's limitations, the filtering was performed only at the species level: we can filter out Klebsiella variicola, which was found only in samples from 2022/03/03, but we cannot remove the whole Klebsiella genus on that basis.
In addition, the following taxa were removed from the data:
- Cutibacterium acnes
- All bacteriophages
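The date-based criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the logic of `filter_possible_contaminants.py`: the helper names are hypothetical, the collection date is taken from the first 8 characters of the sample name, and the `min_samples` cut-off (a taxon must occur in at least 2 samples before it is flagged) is our own addition.

```python
from collections import defaultdict


def find_single_date_taxa(counts, min_samples=2):
    """Flag taxa whose non-zero counts all come from one collection date.

    counts: {sample_name: {taxon: reads}}, sample names as 'YYYYMMDD_ID'.
    """
    dates_seen = defaultdict(set)
    samples_seen = defaultdict(set)
    for sample, taxa in counts.items():
        date = sample[:8]  # assumed: date prefix of the sample name
        for taxon, reads in taxa.items():
            if reads > 0:
                dates_seen[taxon].add(date)
                samples_seen[taxon].add(sample)
    return {
        taxon
        for taxon, dates in dates_seen.items()
        if len(dates) == 1 and len(samples_seen[taxon]) >= min_samples
    }


def drop_taxa(counts, taxa_to_drop):
    """Return a copy of counts without the flagged taxa."""
    return {
        sample: {t: r for t, r in taxa.items() if t not in taxa_to_drop}
        for sample, taxa in counts.items()
    }


# Toy counts; only the Klebsiella variicola date comes from the text above
counts = {
    "20220303_A1": {"Klebsiella variicola": 5, "Ralstonia pickettii": 3},
    "20220303_A2": {"Klebsiella variicola": 2, "Ralstonia pickettii": 1},
    "20220410_B1": {"Klebsiella variicola": 0, "Ralstonia pickettii": 4},
}
suspects = find_single_date_taxa(counts)
filtered = drop_taxa(counts, suspects)
```

Run at the genus level, the same check would flag or keep every member species together, which is why it is only meaningful per species.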
To find associations between clinical metadata and microbial meta-omics features, MaAsLin2 v.1.7.3 [7] was used.
See the `scripts/MaAsLin2.R` script for details.
MaAsLin2 launch parameters:

    fit_data = Maaslin2(input_data = counts,
                        input_metadata = metadata,
                        min_prevalence = 0.01,
                        normalization = "TSS",
                        output = "MaAsLin2_results",
                        analysis_method = "LM",
                        max_significance = 0.05,
                        correction = "BH",
                        plot_heatmap = TRUE,
                        plot_scatter = TRUE,
                        fixed_effects = c("HIV_status"))

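Two of the parameters above, `normalization = "TSS"` and `correction = "BH"`, name standard procedures: total-sum scaling and the Benjamini-Hochberg adjustment. The sketch below shows what each does on toy numbers. It is a plain-Python illustration with hypothetical helper names, not MaAsLin2's own code.

```python
def tss_normalize(sample_counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(sample_counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in sample_counts}
    return {taxon: reads / total for taxon, reads in sample_counts.items()}


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # walk from the largest p-value down, keeping q-values monotone
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        qvalues[index] = running_min
    return qvalues


# Toy inputs, not study results
relative = tss_normalize({"Ralstonia pickettii": 3, "Stenotrophomonas maltophilia": 1})
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in qvalues]  # compare against the 0.05 threshold
```

After scaling, every sample sums to 1, so samples with very different sequencing depths become comparable before the linear model is fitted.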
MaAsLin2 results were visualized as a volcano plot with the `Volcano_plot/volcano.R` script.
Reasons for a volcano plot instead of a heatmap:
- A volcano plot shows 2 metrics at once: `log2fc` & `p-value`.
- We only have 2 groups: HIV+ and HIV-. A heatmap is useful when more groups are displayed; a volcano plot is well suited to 2 groups.
- A volcano plot is the classic way of displaying differential abundance data.
- Aesthetics: MaAsLin2 found ~40 statistically significant taxa, so a heatmap would be too tall or too wide (depending on configuration).
Mean relative abundance bar plots were drawn to show the relative percentage of each taxon in samples from the HIV+ and HIV- groups.
The plots were made with the `scripts/Bar_plot.R` script.
Excerpt from the laboratory journal:

    # Usage
    # {path_to_script} {path_to_metadata} {path_to_counts_species} {path_to_counts_genus} {path_to_counts_family} {path_to_counts_order} {path_to_counts_class} {path_to_counts_phylum}
    ! Rscript scripts/Bar_plot.R metadata.csv counts/counts_species_filtered.csv counts/counts_genus.csv counts/counts_family.csv counts/counts_order.csv counts/counts_class.csv counts/counts_phylum.csv

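For readers who want the quantity behind these bar plots, a minimal sketch of a per-group mean relative abundance is given below. It assumes counts are first converted to within-sample fractions and then averaged over all samples of a group, with absent taxa counted as zero; `scripts/Bar_plot.R` may compute it differently, and the helper name is hypothetical.

```python
from collections import defaultdict


def mean_relative_abundance(counts, groups):
    """counts: {sample: {taxon: reads}}, groups: {sample: group label}.

    Returns {group: {taxon: mean within-sample fraction}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        if total == 0:
            continue  # a sample without reads has no defined fractions
        group = groups[sample]
        sizes[group] += 1
        for taxon, reads in taxa.items():
            sums[group][taxon] += reads / total
    # dividing by the group size treats a missing taxon as a zero fraction
    return {
        group: {taxon: value / sizes[group] for taxon, value in taxa.items()}
        for group, taxa in sums.items()
    }


# Toy counts with placeholder taxon names
counts = {
    "s1": {"taxon_A": 1, "taxon_B": 1},
    "s2": {"taxon_A": 3, "taxon_B": 1},
    "s3": {"taxon_B": 4},
}
groups = {"s1": "HIV+", "s2": "HIV+", "s3": "HIV-"}
means = mean_relative_abundance(counts, groups)
```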
α-diversity
To measure mean species diversity in the HIV+ and HIV- groups, 3 α-diversity indices were estimated:
- Shannon index [8]
- Chao1 index [9]
- Pielou index [10]
To compare the values of each index between the HIV+ and HIV- groups, the Mann-Whitney U test [11] was used.
See the `scripts/Alpha_div_calculations.R` & `scripts/Alpha.R` scripts for details.
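The indices and the test statistic can be written out in a few lines. The sketch below is an illustration only: it uses the natural logarithm for Shannon, the bias-corrected form of Chao1, and computes only the U statistic (a p-value would come from a statistics library). Which exact variants the R scripts use is an assumption on our part.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def pielou(counts):
    """Pielou evenness J = H / ln(S_obs); defined as 0 for fewer than 2 taxa."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0
    return shannon(counts) / math.log(observed)


def mann_whitney_u(x, y):
    """U statistic for x: pairs where x > y, with ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Toy count vectors, not study data
even_sample = [10, 10, 10, 10]
skewed_sample = [1, 1, 2, 5]
```

An evenness of 1 means all observed taxa are equally abundant, which is what `even_sample` is built to show.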
β-diversity
To measure the extent of differentiation (distribution) of species according to HIV status, β-diversity was estimated with 2 metrics:
- Bray-Curtis dissimilarity [12]
- Jaccard similarity [13]
To compare the values of each metric between the HIV+ and HIV- groups, PERMANOVA [14] was used.
See the `Beta_div/beta_diversity.R` script for details.
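To show what the PERMANOVA comparison computes, here is a compact sketch of the pseudo-F statistic and its permutation p-value on a toy distance matrix. It is a simplified one-factor version with hypothetical function names, written for illustration; the analysis itself is done in the R script.

```python
import random


def pseudo_f(dist, labels):
    """One-factor PERMANOVA pseudo-F from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for group in groups:
        members = [i for i, label in enumerate(labels) if label == group]
        ss_within += sum(
            dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:]
        ) / len(members)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between labels and distances
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: two well-separated groups on a line
points = [0, 1, 2, 3, 10, 11, 12, 13]
labels = ["HIV+"] * 4 + ["HIV-"] * 4
dist = [[abs(a - b) for b in points] for a in points]
f_stat, p_value = permanova(dist, labels)
```

The p-value is the share of label shufflings that separate the groups at least as strongly as the real labels do.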
Rarefaction criteria:

Bray-Curtis dissimilarity

    bray <- avgdist(taxon_counts, dmethod = "bray", sample = 10) %>%
      as.matrix() %>%
      as_tibble(rownames = "sample_id")

Jaccard similarity

    jaccard <- avgdist(taxon_counts, dmethod = "jaccard", sample = 10) %>%
      as.matrix() %>%
      as_tibble(rownames = "sample_id")

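The idea behind these calls, as we read the vegan documentation, is to subsample every sample to the same depth (`sample = 10` reads), compute the distance, and average over many repetitions. The sketch below mimics that on two toy samples. The helper names are hypothetical, and the quantitative Jaccard form 2B/(1+B) is an assumption based on how vegan documents the non-binary `"jaccard"` method; a presence/absence version would need a different function.

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


def jaccard_quantitative(a, b):
    """Quantitative Jaccard, derived from Bray-Curtis as 2B / (1 + B)."""
    b_c = bray_curtis(a, b)
    return 2 * b_c / (1 + b_c)


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a count vector."""
    pool = [taxon for taxon, reads in enumerate(counts) for _ in range(reads)]
    rarefied = [0] * len(counts)
    for taxon in rng.sample(pool, depth):  # ValueError if depth > total reads
        rarefied[taxon] += 1
    return rarefied


def avg_distance(a, b, depth, metric=bray_curtis, iterations=100, seed=0):
    """Mean distance over repeated rarefactions of both samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        total += metric(rarefy(a, depth, rng), rarefy(b, depth, rng))
    return total / iterations


# Toy samples sharing no taxa, so every subsample gives distance 1
disjoint = avg_distance([10, 0], [0, 10], depth=5)
identical = avg_distance([10, 0], [10, 0], depth=5)
```

Averaging over many subsamples keeps the depth correction of rarefaction while reducing the noise of any single random draw.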
The `scripts/core_microbiota_HIV.py` script was used to draw the core microbiota graphs.
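A common way to define a core microbiota is by prevalence: a taxon belongs to the core if it is detected in at least a given share of a group's samples. The sketch below follows that definition with a 50% cut-off. Both the definition and the threshold are assumptions made for illustration and may not match what `scripts/core_microbiota_HIV.py` does.

```python
def core_taxa(counts, min_prevalence=0.5):
    """Return {taxon: prevalence} for taxa found in enough samples.

    counts: {sample: {taxon: reads}} for the samples of ONE group.
    min_prevalence: assumed cut-off, fraction of samples with reads > 0.
    """
    n_samples = len(counts)
    detections = {}
    for taxa in counts.values():
        for taxon, reads in taxa.items():
            if reads > 0:
                detections[taxon] = detections.get(taxon, 0) + 1
    return {
        taxon: hits / n_samples
        for taxon, hits in detections.items()
        if hits / n_samples >= min_prevalence
    }


# Toy counts for one group, with placeholder taxon names
group_counts = {
    "s1": {"taxon_A": 4, "taxon_B": 0, "taxon_C": 1},
    "s2": {"taxon_A": 2, "taxon_B": 1, "taxon_C": 0},
    "s3": {"taxon_A": 7, "taxon_B": 0, "taxon_C": 0},
    "s4": {"taxon_A": 1, "taxon_B": 3, "taxon_C": 0},
}
core = core_taxa(group_counts)
```

Calling the function once per group gives the two taxon sets that can then be compared between HIV+ and HIV- samples.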
Figure 2. Main results overview.
Count distribution graphs were made with the `scripts/describe.py` script.
Excerpt from the laboratory journal:

    # Usage
    # {path_to_script} {path_to_input_file} {taxonomic_level}
    %run scripts/describe.py "counts/counts_species_filtered.csv" Species

Species | Genus | Family | Order | Class | Phylum |
---|---|---|---|---|---|
Table 2. Counts distribution on every taxonomic level.
It can clearly be seen that the distribution graph is shifted to the right in all cases.
Figure 3. Volcano plot with differential bacterial abundance.
Figure 4. Mean Relative Abundance from species to phylum level.
Index | M-W p-value |
---|---|
Shannon | <0.001 |
Chao1 | <0.001 |
Pielou | <0.001 |
Table 3. α-diversity metrics.
Figure 5. α-diversity visualization.
Index | PERMANOVA p-value |
---|---|
Bray-Curtis dissimilarity | <0.001 |
Jaccard similarity | <0.001 |
Table 4. β-diversity comparison between HIV+ and HIV- groups.
Figure 6. β-diversity visualization. A - Bray-Curtis dissimilarity. B - Jaccard similarity.
HIV+ | HIV- |
---|---|
Table 5. Core microbiota for HIV+ and HIV- groups.
Taxon | Real world data | Reference |
---|---|---|
Bradyrhizobium sp. BTAi1 | HIV infection and subsequent antiretroviral therapy can lead to an enrichment of Bradyrhizobium in the oral microbiome | 15, 16, 17 |
Ralstonia insidiosa | HIV infection is associated with overgrowth of opportunistic pathogens including Ralstonia in the gut | 16, 17, 18 |
Stenotrophomonas maltophilia | HIV infection is associated with the occurrence of opportunistic infections including Stenotrophomonas maltophilia | 19, 20 |
Herbaspirillum huttiense | HIV-related immunosuppression can lead to opportunistic infections, including infections by Herbaspirillum | 21, 22 |
Ralstonia pickettii | HIV-related immunosuppression can lead to infections by unusual pathogens like Ralstonia pickettii | 16, 17, 18, 23 |
Microbacterium sp. Y-01 | HIV can compromise the immune system, increasing susceptibility to infections by less common bacteria, including Microbacterium | 23 |
Table 6. The Shadow of HIV itself.
Footnotes
1. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
2. Homo sapiens genome assembly GRCh37. NCBI https://www.ncbi.nlm.nih.gov/data-hub/assembly/GCF_000001405.13/.
3. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
4. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
5. PlusPF. https://genome-idx.s3.amazonaws.com/kraken/pluspf_20240112/inspect.txt.
6. Lu, J. et al. Metagenome analysis using the Kraken software suite. Nat. Protoc. 17, 2815–2839 (2022).
7. Mallick, H. et al. Multivariable association discovery in population-scale meta-omics studies. PLOS Comput. Biol. 17, e1009442 (2021).
8. Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
9. Chao, A. & Bunge, J. Estimating the Number of Species in a Stochastic Abundance Model. Biometrics 58, 531–539 (2002).
10. Pielou, E. C. The measurement of diversity in different types of biological collections. J. Theor. Biol. 13, 131–144 (1966).
11. Mann, H. B. & Whitney, D. R. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Stat. 18, 50–60 (1947).
12. Bray, J. R. & Curtis, J. T. An Ordination of the Upland Forest Communities of Southern Wisconsin. Ecol. Monogr. 27, 325–349 (1957).
13. Jaccard, P. Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Bull. Société Vaudoise Sci. Nat. 37, 547 (1901).
14. Anderson, M. J. Permutational Multivariate Analysis of Variance (PERMANOVA). in Wiley StatsRef: Statistics Reference Online 1–15 (John Wiley & Sons, Ltd, 2017). doi:10.1002/9781118445112.stat07841.
15. Li, S. et al. Alteration in Oral Microbiome Among Men Who Have Sex With Men With Acute and Chronic HIV Infection on Antiretroviral Therapy. Front. Cell. Infect. Microbiol. 11, 695515 (2021).
16. Yang, L. et al. HIV-induced immunosuppression is associated with colonization of the proximal gut by environmental bacteria. AIDS Lond. Engl. 30, 19–29 (2016).
17. Saxena, D. et al. Modulation of the orodigestive tract microbiome in HIV-infected patients. Oral Dis. 22 Suppl 1, 73–78 (2016).
18. Lu, X. et al. Gut Microbiome Alterations in Men Who Have Sex with Men-a Preliminary Report. Curr. HIV Res. (2022) doi:10.2174/1570162X20666220908105918.
19. Saeed, N. K., Farid, E. & Jamsheer, A. E. Prevalence of opportunistic infections in HIV-positive patients in Bahrain: a four-year review (2009-2013). J. Infect. Dev. Ctries. 9, 60–69 (2015).
20. Brito, L. C. N. et al. Microbiologic profile of endodontic infections from HIV- and HIV+ patients using multiple-displacement amplification and checkerboard DNA-DNA hybridization. Oral Dis. 18, 558–567 (2012).
21. Özen, S. et al. Catheter-related Infections in Pediatric Patients Due to a Rare Pathogen: Herbaspirillum huttiense. Pediatr. Infect. Dis. J. (2024) doi:10.1097/INF.0000000000004350.
22. Ruiz de Villa, A., Alok, A., Oyetoran, A. E. & Fabara, S. P. Septic Shock and Bacteremia Secondary to Herbaspirillum huttiense: A Case Report and Review of Literature. Cureus 15, e36155 (2023).
23. Wang, J., Song, Y., Liu, S., Jang, X. & Zhang, L. Persistent bacteremia caused by Ralstonia pickettii and Microbacterium: a case report. BMC Infect. Dis. 24, 327 (2024).