Merge branch 'master' into pubs_update
christianholland authored Dec 21, 2021
2 parents d997031 + 72b97c2 commit 01820c3
Showing 7 changed files with 23 additions and 10 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -20,6 +20,7 @@ sections/*.gdoc

data/my_publications/credit.gsheet
*.pdf
+*.key

gcp_client.json

18 changes: 18 additions & 0 deletions CITATION.cff
@@ -0,0 +1,18 @@
+cff-version: 1.2.0
+message: "If you use any of the analysis in this thesis please cite as below."
+authors:
+- family-names: "Holland"
+  given-names: "Christian H."
+  orcid: "https://orcid.org/0000-0002-3060-5786"
+title: "From gene expression to pathway and transcription factor activities to study chronic liver diseases"
+url: "https://github.com/christianholland/thesis"
+preferred-citation:
+  type: unpublished
+  authors:
+  - family-names: "Holland"
+    given-names: "Christian H."
+    orcid: "https://orcid.org/0000-0002-3060-5786"
+  doi: "TBD"
+  month: TBD
+  title: "From gene expression to pathway and transcription factor activities to study chronic liver diseases"
+  year: 2021
6 changes: 0 additions & 6 deletions index.Rmd
@@ -172,12 +172,6 @@ my_today = function() {
format(Sys.time(), '%d %B %Y')
}
```
-<!--
-The acknowledgments, preface, dedication, and abstract are added into the PDF
-version automatically by inputting them in the YAML at the top of this file.
-Alternatively, you can put that content in files like 00--prelim.Rmd and
-00-abstract.Rmd like done below.
--->

`r if(!knitr:::is_latex_output()) '# Front matter {-}'`

2 changes: 1 addition & 1 deletion prelims/abstract.Rmd
@@ -1 +1 @@
-High-throughput techniques such as microarrays and RNA-sequencing enable the relatively easy and inexpensive collection of bulk gene expression profiles from any biological condition. Recently, also the transcriptome of single cells can be efficiently captured via novel single-cell RNA-sequencing technologies. Functional analysis of bulk or single-cell gene expression data has been proven to be a powerful approach as they summarize the large and noisy gene expression space into a smaller number of biologically meaningful features such as pathway and transcription factor activities. In the first part of this thesis, I expanded the scope on the pathway analysis tool PROGENy and the transcription factor analysis tool DoRothEA through thorough benchmarking pipelines. First I transferred their regulatory knowledge from human to mouse to enable the functional characterization of gene expression profiles from mice. Moreover, I demonstrated the robustness and applicability of both tools on human single-cell RNA-sequencing data. In the second part of this thesis, I focussed on the analysis of gene expression profiles from mice and humans in the context of acute and chronic liver diseases. Finally, I identified and functionally characterized exclusively and commonly regulated genes of chronic and acute liver damage in mice and a set of genes that were consistently altered in a novel chronic mouse model and patients of chronic liver disease. Especially the latter demonstrates that, although major interspecies differences remain, there is a common and consistent transcriptomic response to chronic liver damage in mice and humans. This set of genes could be further investigated to study the pathophysiology of the liver in in-vitro and in-vivo studies.
+High-throughput techniques such as microarrays and RNA-sequencing enable the relatively easy and inexpensive collection of bulk gene expression profiles from any biological condition. Recently, also the transcriptome of single cells can be efficiently captured via novel single-cell RNA-sequencing technologies. Functional analysis of bulk or single-cell gene expression data has been proven to be a powerful approach as they summarize the large and noisy gene expression space into a smaller number of biologically meaningful features such as pathway and transcription factor activities. In the first part of this thesis, I expanded the scope of the pathway analysis tool PROGENy and the transcription factor analysis tool DoRothEA through thorough benchmarking pipelines. First I transferred their regulatory knowledge from human to mouse to enable the functional characterization of gene expression profiles from mice. Moreover, I demonstrated the robustness and applicability of both tools on human single-cell RNA-sequencing data. In the second part of this thesis, I focussed on the analysis of gene expression profiles from mice and humans in the context of acute and chronic liver diseases. Finally, I identified and functionally characterized exclusively and commonly regulated genes of chronic and acute liver damage in mice and a set of genes that were consistently altered in a novel chronic mouse model and patients of chronic liver disease. Especially the latter demonstrates that, although major interspecies differences remain, there is a common and consistent transcriptomic response to chronic liver damage in mice and humans. This set of genes could be further investigated to study the pathophysiology of the liver in in-vitro and in-vivo studies.
2 changes: 1 addition & 1 deletion prelims/acknowledgements.Rmd
@@ -2,7 +2,7 @@ When I started my Ph.D. back in May 2017 in Aachen I would have never expected t

First of all, I wish to express my deepest gratitude to my doctoral supervisor Prof. Dr. Julio Saez-Rodriguez who gave me the chance to perform my studies in his lab. Throughout this entire time, I felt welcomed and respected. His liberal way of leading the lab promoted a great working atmosphere I have never experienced before to this extent. I truly enjoyed the freedom and always felt safe if things went badly or ended up in a dead-end. Moreover, through the open-door sessions, Julio was reachable on a daily basis which I definitely don’t take for granted. In summary, Julio was the best supervisor I could have imagined and I deeply cherished the time I was allowed to spend in his lab.

-Also, I am indebted to my faculty supervisor Prof. Dr. Ursula Klingmüller. Though it was not possible to share my entire Ph.D. process from the beginning with you due to the movement from Aachen to Heidelberg, I am nevertheless thankful for the lively discussions and feedback I received in the remaining time. Also, I would like to thank Prof. Dr. Robert Russell who kindly agreed to act as the chair of my thesis advisory committee. Lastly, I thank Prof. Dr. Karsten Niehaus who is my former supervisor of my bachelor’s and master’s studies at Bielefeld University for his willingness to be part of my committee.
+Also, I am indebted to my faculty supervisor Prof. Dr. Ursula Klingmüller. Though it was not possible to share my entire Ph.D. process from the beginning with you due to the movement from Aachen to Heidelberg, I am nevertheless thankful for the lively discussions and feedback I received in the remaining time. Also, I would like to thank Prof. Dr. Robert Russell who kindly agreed to act as the chair of my thesis advisory committee. Lastly, I thank Prof. Dr. Karsten Rippe who completes my thesis advisory committee.

Next, I am thankful for every past and current “saezlab” member who made this time very special and makes me now also kind of sad that this time has come to an end.

2 changes: 1 addition & 1 deletion sections/03-scrna-seq-benchmark.Rmd
@@ -226,7 +226,7 @@ My analysis suggested at different points that the performance of TF and pathway
## Conclusions
-my systematic and comprehensive benchmark study suggests that functional analysis tools that rely on manually curated footprint gene sets are effective in inferring TF and pathway activity from scRNA-seq data, partially outperforming tools specifically designed for scRNA-seq analysis. In particular, the performance of DoRothEA and PROGENy was consistently better than all other tools. I showed the limits of both tools with respect to low gene coverage. I also provided recommendations on how to use DoRothEA’s and PROGENy’s gene sets in the best way dependent on the number of cells, reflecting the amount of available information, and sequencing depths. Furthermore, I showed that TF and pathway activities are rich in cell-type-specific information with a reduced amount of noise and provide an intuitive way of interpretation and hypothesis generation. I provide my benchmark data and code to the community for further assessment of methods for functional analysis.
+My systematic and comprehensive benchmark study suggests that functional analysis tools that rely on manually curated footprint gene sets are effective in inferring TF and pathway activity from scRNA-seq data, partially outperforming tools specifically designed for scRNA-seq analysis. In particular, the performance of DoRothEA and PROGENy was consistently better than all other tools. I showed the limits of both tools with respect to low gene coverage. I also provided recommendations on how to use DoRothEA’s and PROGENy’s gene sets in the best way dependent on the number of cells, reflecting the amount of available information, and sequencing depths. Furthermore, I showed that TF and pathway activities are rich in cell-type-specific information with a reduced amount of noise and provide an intuitive way of interpretation and hypothesis generation. I provide my benchmark data and code to the community for further assessment of methods for functional analysis.
## Availability of data and materials
The code to perform all presented studies is written in R [@rcoreteam_software_2020; @gentleman_2004; @wickham_2016] and is freely available [on GitHub](https://github.com/saezlab/FootprintMethods_on_scRNAseq). The datasets supporting the conclusions of this article are available at [Zenodo](https://doi.org/10.5281/zenodo.3564179).
2 changes: 1 addition & 1 deletion sections/04-liver-disease-atlas.Rmd
@@ -110,7 +110,7 @@ knitr::include_graphics(here::here("data/liver_disease_atlas/Figure 3.png"))
### Similarities between humans and mice {.unlisted}
I performed a cross-species analysis to evaluate how well the altered gene expression in the chronic CCl~4~ mouse model reflects the transcriptomic changes in humans that suffer from CLD. For this purpose, I collected genome-wide gene expression data from 5 publicly available patient cohorts with a total of 372 patients and five etiologies (Table \@ref(tab:liver-tab-1), Supplementary Figure \@ref(fig:liver-sfig-1)B). These studies allowed us to calculate a total of 15 contrasts due to different disease stages and phenotypes.

-Similar to the acute mouse models I first analyzed inter-study consistency comparing the similarity of the top 500 differentially expressed genes from each signature. Differential genes obtained from studies of the same groups of authors showed a higher degree of similarity (Supplementary Figure \@ref(fig:liver-sfig-14)). The highest similarity of two independent contrasts was observed between NAFLD 7 and HCV 6 (Jaccard Index of 0.154). In summary, the similarity of the top differentially expressed genes in humans appeared to be low. However, the mutual enrichment of the top 500 up-and downregulated genes demonstrated a very high consistency of the direction of regulation within contrasts of the same group of authors but also observed still relatively high accordance across the cohorts reported by different authors (Figure \@ref(fig:liver-fig-4)A). Partially, the direction of regulation of genes from the cohorts of patients with PSC, PBC, and NAFLD did not match well with the other contrasts. However, all other pairwise comparisons yielded convincing consistent results. Similar to the analysis of the mouse studies, this systematic comparison shows that similarities between different studies can better be identified by an enrichment analysis that considers the orientation (up, down) of expression changes than just focussing on the top differentially expressed genes.
+Similar to the acute mouse models I first analyzed inter-study consistency comparing the similarity of the top 500 differentially expressed genes from each signature. Differential genes obtained from studies of the same groups of authors showed a higher degree of similarity (Supplementary Figure \@ref(fig:liver-sfig-14)). The highest similarity of two independent contrasts was observed between NAFLD [@moylan_2014] and HCV [@ramnath_2018] (Jaccard Index of 0.154). In summary, the similarity of the top differentially expressed genes in humans appeared to be low. However, the mutual enrichment of the top 500 up-and downregulated genes demonstrated a very high consistency of the direction of regulation within contrasts of the same group of authors but also observed still relatively high accordance across the cohorts reported by different authors (Figure \@ref(fig:liver-fig-4)A). Partially, the direction of regulation of genes from the cohorts of patients with PSC, PBC, and NAFLD did not match well with the other contrasts. However, all other pairwise comparisons yielded convincing consistent results. Similar to the analysis of the mouse studies, this systematic comparison shows that similarities between different studies can better be identified by an enrichment analysis that considers the orientation (up, down) of expression changes than just focussing on the top differentially expressed genes.

Considering that previous studies reported only very low overlap between differentially expressed genes of humans and mice in CLD [@teufel_2016] and the above-described limitations of this type of comparison I performed a cross-species enrichment analysis between the chronic CCl~4~ model and the set of human data. For this purpose, I enriched the top 500 up-and downregulated genes from each human contrast in the three signatures from the individual time points of the chronic CCl~4~ experiment in mice. I found a high degree of accordance where all human gene sets were significantly and consistently enriched at any time point of the chronic CCl~4~ mouse signatures except for the up- and-downregulated genes of the PBC contrast, the downregulated genes of NAFLD contrast, and the upregulated genes of the NAFLD contrast; instead, I found that even the top 500 upregulated genes of the PBC contrast were significantly enriched among the downregulated genes of the 6-month time point of the chronic CCl~4~ signature (Figure \@ref(fig:liver-fig-4)B).
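
Reader's note on the hunk above: the Jaccard index of 0.154 reported for the NAFLD and HCV contrasts can be read as an approximate gene overlap. The sketch below uses the standard definition of the Jaccard index and assumes that both gene sets contain exactly 500 genes. That set size is an assumption taken from the phrase "top 500"; the diff does not state the effective set sizes, so the resulting count is only a rough estimate.

```latex
\begin{align*}
  % Standard Jaccard index of two gene sets A and B
  J(A, B) &= \frac{|A \cap B|}{|A \cup B|}
           = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \\
  % Assumption: |A| = |B| = 500, with k = |A \cap B| shared genes
  J &= \frac{k}{1000 - k}
  \quad\Longrightarrow\quad
  k = \frac{1000\,J}{1 + J} \\
  % Plugging in the reported value J = 0.154
  k &\approx \frac{154}{1.154} \approx 133
\end{align*}
```

Under that assumption, roughly 133 of the 500 top genes would be shared between the two contrasts, which is consistent with the text's summary that the similarity of the top differentially expressed genes appeared to be low.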
