Guest speaker: Dr Giovanni Birolo, University of Turin (Italy)
In this workshop, you will:
- Learn how to import a typical microbiome dataset into Python.
- Explore the dataset's structure, including metadata, feature tables, and taxonomy tables.
- Use statistical and machine learning libraries to classify and predict labels within your dataset.
We will focus on a dataset from Pat Schloss's lab that examines the murine gut microbiome to understand how community membership and structure change over time.
Before attending, please make sure you have a basic understanding of Python and some familiarity with its data manipulation libraries, such as pandas and numpy.
We will use the following Python libraries:
- numpy: For numerical operations.
- pandas: For data manipulation and analysis.
- sklearn: For applying machine learning techniques.
- seaborn and matplotlib: For data visualization.
Ensure these libraries are installed and updated in your Python environment before the workshop.
The dataset comprises several tables reflecting different aspects of the microbiome study:
- Metadata: Attributes for each sample, including sample identifiers and labels.
- Feature Table: Abundances of each feature/species in each sample (sometimes referred to as an OTU table).
- Taxonomy Table: Taxonomic classification for each feature.
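To make the relationship between the three tables concrete, here is a minimal sketch using small in-memory pandas DataFrames. All sample IDs, feature IDs, column names, and values are made up for illustration, and the sample-by-feature orientation is an assumption; the actual workshop files may be laid out differently.

```python
import pandas as pd

# Hypothetical toy data; the real workshop tables will be larger
# and may use different identifiers and column names.
metadata = pd.DataFrame(
    {"group": ["early", "early", "late"]},
    index=pd.Index(["S1", "S2", "S3"], name="sample_id"),
)

# Feature (OTU) table: rows are samples, columns are features, values are counts.
feature_table = pd.DataFrame(
    [[10, 0, 5], [3, 7, 0], [0, 2, 8]],
    index=metadata.index,
    columns=["Otu001", "Otu002", "Otu003"],
)

# Taxonomy table: one row per feature, indexed by the feature IDs.
taxonomy = pd.DataFrame(
    {"phylum": ["Firmicutes", "Bacteroidetes", "Firmicutes"]},
    index=pd.Index(feature_table.columns, name="feature_id"),
)

# Convert raw counts to relative abundances so samples are comparable.
rel_abundance = feature_table.div(feature_table.sum(axis=1), axis=0)

# Use the taxonomy table to sum feature abundances per phylum in each sample.
phylum_abundance = rel_abundance.T.groupby(taxonomy["phylum"]).sum().T

print(metadata.join(phylum_abundance))
```

The key point is that the tables share identifiers: the metadata and the feature table are linked by sample ID, and the feature table and the taxonomy table are linked by feature ID.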
The workshop is structured as follows:
- Introduction to Microbiome Data: Understanding the structure and content of microbiome datasets.
- Data Importing and Exploration: Loading datasets into Python and performing initial explorations.
- Visual Data Exploration: Using PCA for visual exploration of sample similarity.
- Statistical Analysis: Applying statistical tests to uncover significant differences in the data.
- Machine Learning Applications: Building and evaluating predictive models using machine learning.
- Model Evaluation: Assessing model performance with cross-validation.
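As a rough preview of the analysis steps listed above, the following sketch runs PCA, a two-group statistical test, and a cross-validated classifier on synthetic data. The data, the choice of a Mann-Whitney U test, and the random forest model are assumptions made for illustration, not necessarily what the workshop uses. The test comes from scipy, which is not in the library list above but is installed as a scikit-learn dependency.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a feature table: 40 samples x 20 features, two groups.
labels = np.repeat([0, 1], 20)
counts = rng.poisson(lam=20, size=(40, 20)).astype(float)
counts[labels == 1, :5] += 30  # first five features are enriched in group 1

# Relative abundances: each row sums to 1.
rel_abundance = counts / counts.sum(axis=1, keepdims=True)

# Visual exploration: project samples onto the first two principal components.
coords = PCA(n_components=2).fit_transform(rel_abundance)

# Statistical analysis: compare one feature between the two groups.
p_value = mannwhitneyu(
    rel_abundance[labels == 0, 0], rel_abundance[labels == 1, 0]
).pvalue

# Machine learning and evaluation: 5-fold cross-validated accuracy.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, rel_abundance, labels, cv=5)

print("PCA coordinates shape:", coords.shape)
print("Mann-Whitney p-value for feature 0:", p_value)
print("Mean cross-validated accuracy:", scores.mean())
```

Cross-validation scores the model only on samples it did not see during training, which gives a fairer estimate of performance than accuracy on the training data.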
Please ensure you have a recent version of Python installed. You can download and install the necessary libraries using pip (or conda; an environment file is shared in this folder):
pip install numpy pandas scikit-learn matplotlib seaborn
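After installing, one way to confirm that the libraries are available is to query their installed versions. This is a small optional sketch that uses only the Python standard library (Python 3.8 or later); the helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip command above.
REQUIRED = ["numpy", "pandas", "scikit-learn", "matplotlib", "seaborn"]


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as not installed can be added by rerunning the pip command above.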