\n",
"\n",
- "### [Additional Theory] Micro Exercise 6 - Applying custom functions
\n",
+ "### [Additional Material] Micro Exercise 6 - Applying custom functions
\n",
"\n",
"* Write your own implementation of the `mean()` function, then apply it to the \"Age\" and \"Fare\" columns.\n",
"* Verify your results against the result obtained by `df.mean()`.\n",
@@ -1973,7 +1970,7 @@
" For this second solution, you can use the `isnan()` function from the `math` module to test whether\n",
" a value is `NaN` or not.\n",
"\n",
- "
"
]
},
{
@@ -1992,7 +1989,7 @@
"\n",
"[Back to ToC](#toc)\n",
"\n",
- "## Grouping data by factor
\n",
"---------------------------------\n",
"\n",
"When analyzing a dataset where some variables (columns) are factors (categorical values), it is often useful to group the samples (rows) by these factors.\n",
@@ -2065,7 +2062,7 @@
"\n",
"
\n",
"\n",
- "### Micro Exercise 7 - Grouping data
\n",
+ "### Micro Exercise 7 - Grouping data
\n",
"\n",
"* Make a copy of the `df` data frame with `df.copy()`. Name it `dfc`, as shown here:\n",
"```python\n",
@@ -2101,7 +2098,7 @@
"
\n",
"
\n",
"\n",
- "## Exercise 5.1 - 5.3
\n",
+ "## Exercise 5.1 - 5.3
\n",
"-------------------------"
]
},
@@ -2120,10 +2117,13 @@
"\n",
"[Back to ToC](#toc)\n",
"\n",
+ "
\n",
"\n",
- "## Additional Theory
\n",
+ "## Additional Material
\n",
"-----------------------------\n",
"\n",
+ "
\n",
+ "\n",
"### About the example dataset used in the Additional Theory section\n",
"\n",
"To illustrate pandas' functionalities, we will here use an example dataset that contains gene expression data. This dataset originates from a [study that investigated stress response in the hearts of mice deficient in the SRC-2 gene](http://www.ncbi.nlm.nih.gov/pubmed/23300926) (transcriptional regulator steroid receptor co-activator-2).\n",
@@ -2167,7 +2167,7 @@
"[Back to ToC](#toc)\n",
"\n",
"\n",
- "### Selecting/Filtering dataframes
\n",
+ "### Selecting/Filtering dataframes
\n",
"\n",
"Lets read the dataframe again."
]
@@ -2189,6 +2189,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
+ "
\n",
+ "\n",
"And filter it based on some criteria that we may be interested in. \n",
"\n",
"For example say we want to find genes that have at least 250 reads in the 'Heart_WT_1' sample. We would do it like this: first we find the genes that satisfy the condition:"
@@ -2217,6 +2219,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
+ "
\n",
+ "\n",
"Applying the `>` operator returns a boolean Series with the result of the function on every element of the Series. Then, to select the corresponding elements of the dataframe, we use the boolean Series to slice the original dataframe:"
]
},
@@ -2243,6 +2247,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
+ "
\n",
+ "\n",
"We can design more complicated filters, as below, we select genes that have more than 250 reads in WT samples, less than 150 in all KO samples:"
]
},
@@ -2265,9 +2271,14 @@
{
"cell_type": "markdown",
"metadata": {
- "collapsed": true
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
},
"source": [
+ "
\n",
+ "\n",
"We can also slice the result of filtering. For example, let's say that we want to extract the genes with more than 250 reads in the first WT and less than 50 reads the first KO sample but then also only keep these two columns of the data."
]
},
@@ -2339,7 +2350,7 @@
"\n",
"[Back to ToC](#toc)\n",
"\n",
- "### Sorting operations on dataframes
\n",
+ "### Sorting operations on dataframes
\n",
"\n",
"DataFrames can be sorted on one or more specific column(s) using `sort_values():"
]
@@ -2365,9 +2376,7 @@
{
"cell_type": "code",
"execution_count": null,
- "metadata": {
- "scrolled": false
- },
+ "metadata": {},
"outputs": [],
"source": [
"df.sort_index(ascending=True).head()"
@@ -2399,9 +2408,7 @@
{
"cell_type": "code",
"execution_count": null,
- "metadata": {
- "scrolled": false
- },
+ "metadata": {},
"outputs": [],
"source": [
"df.min(axis=1).head()"
@@ -2469,7 +2476,7 @@
"\n",
"[Back to ToC](#toc)\n",
"\n",
- "### Extending a dataframe by adding new columns
\n",
+ "### Extending a dataframe by adding new columns
\n",
"\n",
"We can set up a new dataframe and concatenate it to the original dataframe using the `concat` method:"
]
@@ -2528,7 +2535,7 @@
"\n",
"[Back to ToC](#toc)\n",
"\n",
- "### Use of numpy functions with pandas dataframes
\n",
+ "### Use of numpy functions with pandas dataframes
\n",
"\n",
"\n",
"Let's say we want to calculate the log average expression value. We could do it like this:"
@@ -2623,7 +2630,7 @@
"\n",
"[Back to ToC](#toc)\n",
"\n",
- "### Merge and join DataFrames
\n",
+ "### Merge and join DataFrames
\n",
"\n",
"The [`merge()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) and [`join()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html) methods allow to combine DataFrames, linking their rows based on their keys. \n",
"\n",
@@ -2721,9 +2728,7 @@
{
"cell_type": "code",
"execution_count": null,
- "metadata": {
- "scrolled": false
- },
+ "metadata": {},
"outputs": [],
"source": [
"df.head()"
@@ -2773,7 +2778,7 @@
"source": [
"
\n",
"\n",
- "### Cross-tabulated tables
\n",
+ "### Cross-tabulated tables
\n",
"\n",
"Cross-tabulated tables for two (or more) columns (factors) of a DataFrame can be created with **`pd.crosstab()`**."
]
@@ -2800,7 +2805,7 @@
"[Back to ToC](#toc)\n",
"\n",
"\n",
- "### Plotting with pandas and matplotlib
\n",
+ "### Plotting with pandas and matplotlib
\n",
"\n",
"Now let's explore our data a bit. First, a matrix of scatter plots for all pairwise sample comparisons (note: the cell below can take 10-20 seconds to compute):"
]
@@ -2808,9 +2813,7 @@
{
"cell_type": "code",
"execution_count": null,
- "metadata": {
- "scrolled": false
- },
+ "metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
@@ -2916,7 +2919,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.8.10"
+ "version": "3.10.12"
},
"vscode": {
"interpreter": {
@@ -2925,5 +2928,5 @@
}
},
"nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
}