diff --git a/.nojekyll b/.nojekyll
index b7a8535..8da13f3 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-f09c6c9d
\ No newline at end of file
+64c1c1ee
\ No newline at end of file
diff --git a/data-science-with-pandas-2.html b/data-science-with-pandas-2.html
index 20a246c..dbe07d8 100644
--- a/data-science-with-pandas-2.html
+++ b/data-science-with-pandas-2.html
@@ -1912,7 +1912,7 @@

grouped_data.mean()
-/tmp/ipykernel_2494/1133710423.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
+/tmp/ipykernel_2437/1133710423.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
   grouped_data.mean()
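
Note on the FutureWarning captured in this hunk: the warning text itself names two remedies, passing numeric_only explicitly or selecting only the columns that are valid for the function. The sketch below illustrates both on a small made-up stand-in for surveys_df (the rows and values are hypothetical, not taken from the course data); it is one possible way to silence the warning in the rendered output, not necessarily the change the authors intend.

```python
import pandas as pd

# Hypothetical miniature of surveys_df; values are invented for illustration.
surveys_df = pd.DataFrame({
    "sex": ["F", "M", "F", "M", None],
    "species_id": ["NL", "DM", "DM", "PF", "NL"],  # non-numeric column
    "weight": [40.0, 44.0, None, 8.0, 30.0],
    "hindfoot_length": [32.0, 36.0, 35.0, 16.0, 31.0],
})

grouped_data = surveys_df.groupby("sex")

# Remedy 1: state explicitly that only numeric columns should be averaged.
mean_numeric = grouped_data.mean(numeric_only=True)

# Remedy 2: select the columns that are valid for mean() before aggregating.
mean_selected = grouped_data[["weight", "hindfoot_length"]].mean()

print(mean_numeric)
print(mean_selected)
```

Both calls give the same table here; the row whose sex is missing is dropped by groupby, and missing weights are skipped when averaging.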
diff --git a/references.html b/references.html
index e09fbf7..cdb06ce 100644
--- a/references.html
+++ b/references.html
@@ -303,8 +303,9 @@

References

+ diff --git a/search.json b/search.json index 7ba896a..69561ce 100644 --- a/search.json +++ b/search.json @@ -256,7 +256,7 @@ "href": "data-science-with-pandas-2.html#grouping", "title": "7  Grouping, Indexing, Slicing, and Subsetting DataFrames", "section": "7.5 Grouping", - "text": "7.5 Grouping\nWe often want to calculate summary statistics grouped by subsets or attributes within fields of our data. For example, we might want to calculate the average weight of all individuals per site.\nAs we have seen above we can calculate basic statistics for all records in a single column using the syntax below:\n\nsurveys_df['weight'].describe()\n\ncount 32283.000000\nmean 42.672428\nstd 36.631259\nmin 4.000000\n25% 20.000000\n50% 37.000000\n75% 48.000000\nmax 280.000000\nName: weight, dtype: float64\n\n\nIf we want to summarize by one or more variables, for example sex, we can use Pandas’ .groupby() method. Once we’ve created a groupby DataFrame, we can quickly calculate summary statistics by a group of our choice.\n\ngrouped_data = surveys_df.groupby('sex')\ngrouped_data.describe()\n\n\n\n\n\n\n\n\nrecord_id\nmonth\n...\nhindfoot_length\nweight\n\n\n\ncount\nmean\nstd\nmin\n25%\n50%\n75%\nmax\ncount\nmean\n...\n75%\nmax\ncount\nmean\nstd\nmin\n25%\n50%\n75%\nmax\n\n\nsex\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nF\n15690.0\n18036.412046\n10423.089000\n3.0\n8917.50\n18075.5\n27250.00\n35547.0\n15690.0\n6.587253\n...\n36.0\n64.0\n15303.0\n42.170555\n36.847958\n4.0\n20.0\n34.0\n46.0\n274.0\n\n\nM\n17348.0\n17754.835601\n10132.203323\n1.0\n8969.75\n17727.5\n26454.25\n35548.0\n17348.0\n6.396184\n...\n36.0\n58.0\n16879.0\n42.995379\n36.184981\n4.0\n20.0\n39.0\n49.0\n280.0\n\n\n\n\n2 rows × 56 columns\n\n\n\nThe output is a bit overwhelming. 
Let’s just have a look at one statistical value, the mean, to understand what is happening here:\n\ngrouped_data.mean()\n\n/tmp/ipykernel_2494/1133710423.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.\n grouped_data.mean()\n\n\n\n\n\n\n\n\n\nrecord_id\nmonth\nday\nyear\nplot_id\nhindfoot_length\nweight\n\n\nsex\n\n\n\n\n\n\n\n\n\n\n\nF\n18036.412046\n6.587253\n15.880943\n1990.644997\n11.440854\n28.836780\n42.170555\n\n\nM\n17754.835601\n6.396184\n16.078799\n1990.480401\n11.098282\n29.709578\n42.995379\n\n\n\n\n\n\n\nWe see that the data is divided into two groups, one group where the value in the column sex equals “F” and another group where the value in the column sex equals “M”. The statistics is then calculated for all samples in that specific group for each of the columns in the dataframe. Note that samples annotated with sex equals NaN and column values with NaN are left out." + "text": "7.5 Grouping\nWe often want to calculate summary statistics grouped by subsets or attributes within fields of our data. For example, we might want to calculate the average weight of all individuals per site.\nAs we have seen above we can calculate basic statistics for all records in a single column using the syntax below:\n\nsurveys_df['weight'].describe()\n\ncount 32283.000000\nmean 42.672428\nstd 36.631259\nmin 4.000000\n25% 20.000000\n50% 37.000000\n75% 48.000000\nmax 280.000000\nName: weight, dtype: float64\n\n\nIf we want to summarize by one or more variables, for example sex, we can use Pandas’ .groupby() method. 
Once we’ve created a groupby DataFrame, we can quickly calculate summary statistics by a group of our choice.\n\ngrouped_data = surveys_df.groupby('sex')\ngrouped_data.describe()\n\n\n\n\n\n\n\n\nrecord_id\nmonth\n...\nhindfoot_length\nweight\n\n\n\ncount\nmean\nstd\nmin\n25%\n50%\n75%\nmax\ncount\nmean\n...\n75%\nmax\ncount\nmean\nstd\nmin\n25%\n50%\n75%\nmax\n\n\nsex\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nF\n15690.0\n18036.412046\n10423.089000\n3.0\n8917.50\n18075.5\n27250.00\n35547.0\n15690.0\n6.587253\n...\n36.0\n64.0\n15303.0\n42.170555\n36.847958\n4.0\n20.0\n34.0\n46.0\n274.0\n\n\nM\n17348.0\n17754.835601\n10132.203323\n1.0\n8969.75\n17727.5\n26454.25\n35548.0\n17348.0\n6.396184\n...\n36.0\n58.0\n16879.0\n42.995379\n36.184981\n4.0\n20.0\n39.0\n49.0\n280.0\n\n\n\n\n2 rows × 56 columns\n\n\n\nThe output is a bit overwhelming. Let’s just have a look at one statistical value, the mean, to understand what is happening here:\n\ngrouped_data.mean()\n\n/tmp/ipykernel_2437/1133710423.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.\n grouped_data.mean()\n\n\n\n\n\n\n\n\n\nrecord_id\nmonth\nday\nyear\nplot_id\nhindfoot_length\nweight\n\n\nsex\n\n\n\n\n\n\n\n\n\n\n\nF\n18036.412046\n6.587253\n15.880943\n1990.644997\n11.440854\n28.836780\n42.170555\n\n\nM\n17754.835601\n6.396184\n16.078799\n1990.480401\n11.098282\n29.709578\n42.995379\n\n\n\n\n\n\n\nWe see that the data is divided into two groups, one group where the value in the column sex equals “F” and another group where the value in the column sex equals “M”. The statistics is then calculated for all samples in that specific group for each of the columns in the dataframe. Note that samples annotated with sex equals NaN and column values with NaN are left out." 
   },
   {
     "objectID": "data-science-with-pandas-2.html#structure-of-a-groupby-object",
@@ -361,13 +361,13 @@
     "href": "what-next.html#courses",
     "title": "What Next?",
     "section": "Courses:",
-    "text": "Courses:\n\nBest practices for writing reproducible code (UU?)\nVarious intermediate and advanced courses (eScience?) Center\nSoftware Carpentries\nPython for Data Science and Data Wrangling\nPython Data Science Handbook"
+    "text": "Courses:\n\nBest practices for writing reproducible code by UU RDM support\nVarious intermediate and advanced programming courses by the eScience Center\nSoftware Carpentries\nPython for Data Science and Data Wrangling (online book)\nPython Data Science Handbook (online book)"
   },
   {
     "objectID": "what-next.html#find-us",
     "href": "what-next.html#find-us",
     "title": "What Next?",
     "section": "Find us:",
-    "text": "Find us:\nWe are happy to help you in your journey to master Python and use it in your own projects. You can find us at the following places: - Walk-In Hours, come with your questions! - Programming Cafe, informal meetup about programming. Bring your laptop, work on your project and get help when you need it! - UU Research Engineers - UU RDM consultants"
+    "text": "Find us:\nWe are happy to help you in your journey to master Python and use it in your own projects. You can find us at the following places:\n\nWalk-In Hours, come with your questions!\nProgramming Cafe, informal meetup about programming. Bring your laptop, work on your project and get help when you need it!\nUU Research Engineers\nUU RDM consultants"
   }
 ]
\ No newline at end of file
diff --git a/what-next.html b/what-next.html
index 61d6cbb..72aec4a 100644
--- a/what-next.html
+++ b/what-next.html
@@ -311,16 +311,22 @@

Libraries:

Courses:

-  Best practices for writing reproducible code (UU?)
-  Various intermediate and advanced courses (eScience?) Center
+  Best practices for writing reproducible code by UU RDM support
+  Various intermediate and advanced programming courses by the eScience Center
   Software Carpentries
-  Python for Data Science and Data Wrangling
-  Python Data Science Handbook
+  Python for Data Science and Data Wrangling (online book)
+  Python Data Science Handbook (online book)

Find us:

-We are happy to help you in your journey to master Python and use it in your own projects. You can find us at the following places: - Walk-In Hours, come with your questions! - Programming Cafe, informal meetup about programming. Bring your laptop, work on your project and get help when you need it! - UU Research Engineers - UU RDM consultants
+We are happy to help you in your journey to master Python and use it in your own projects. You can find us at the following places:

+