diff --git a/ch-pandas/data-preprocessing.ipynb b/ch-pandas/data-preprocessing.ipynb
index a535021..4823db9 100644
--- a/ch-pandas/data-preprocessing.ipynb
+++ b/ch-pandas/data-preprocessing.ipynb
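@@ -68,7 +68,7 @@
"source": [
"## Handling duplicate values\n",
"\n",
- "To check whether a dataset contains duplicate records, use the `.duplicated` function. It returns one result per row, so n rows yield n Boolean values. To get a single direct answer, use the `any` function, which returns True as soon as at least one of the conditions it tests is True."
+ "To check whether a dataset contains duplicate records, use the `.duplicated()` function. It returns one result per row, so n rows yield n Boolean values. To get a single direct answer, use the `any` function, which returns True as soon as at least one of the conditions it tests is True."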
]
},
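A minimal sketch of the duplicate check described in this cell, on a hypothetical three-row frame named `toy` (not the chapter's dataset). It assumes only standard pandas calls: `duplicated()` flags every row that repeats an earlier one, `.any()` collapses the flags into a single answer, and `drop_duplicates()` returns a copy without the repeated rows.

```python
import pandas as pd

# Hypothetical toy data: the third row repeats the second one.
toy = pd.DataFrame({"country": ["A", "B", "B"], "POP": [10, 20, 20]})

row_flags = toy.duplicated()       # one Boolean per row; a first occurrence is False
has_duplicates = row_flags.any()   # True if at least one row repeats an earlier row
n_duplicates = row_flags.sum()     # number of rows flagged as repeats

deduped = toy.drop_duplicates()    # new DataFrame; toy itself is left unchanged

print(row_flags.tolist(), has_duplicates, n_duplicates, len(deduped))
```

Summing the flags is a convenient follow-up to `.any()` when the number of repeated rows matters, not just their existence.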
{
@@ -104,7 +104,7 @@
"id": "610f3c20",
"metadata": {},
"source": [
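- "If there are duplicates, remove them with `.drop_duplicated()`. This function has an inplace parameter; setting it to True operates directly on the original dataset: `df.drop_duplicated(inplace = True)`.\n",
+ "If there are duplicates, remove them with `.drop_duplicates()`. This function has an `inplace` parameter; setting it to True operates directly on the original dataset: `df.drop_duplicates(inplace=True)`.\n",
"\n",
"## Handling missing values\n",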
"\n",
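@@ -191,7 +191,7 @@
" \n",
"Use the `.dropna()` function to delete rows or columns that contain missing values. Its general form is `df.dropna(axis=0, how='any', inplace=False)`.\n",
"\n",
- "The `axis` parameter specifies the axis to drop along. `axis=0` drops rows (the default) and axis=1 drops columns. `how` specifies the condition for dropping. `how='any'` drops rows that contain any missing value (the default), while `how='all'` drops only rows in which every value is missing. `inplace` specifies whether to modify the original `DataFrame`; the default is False, which leaves the original `DataFrame` unchanged and returns a new `DataFrame`.\n",
+ "The `axis` parameter specifies the axis to drop along. `axis=0` drops rows (the default) and `axis=1` drops columns. `how` specifies the condition for dropping. `how='any'` drops rows that contain any missing value (the default), while `how='all'` drops only rows in which every value is missing. `inplace` specifies whether to modify the original `DataFrame`; the default is False, which leaves the original `DataFrame` unchanged and returns a new `DataFrame`.\n",
"\n",
"For example, delete the rows that contain any missing value."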
]
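The effect of `axis` and `how` can be sketched on a small hypothetical frame (`toy` and its values are made up for illustration, not taken from the chapter's data). Row 0 is complete, row 1 has one missing value, and row 2 is entirely missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every column contains at least one NaN.
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, 8.0, np.nan],
})

rows_any = toy.dropna(axis=0, how="any")  # drop rows with at least one NaN
rows_all = toy.dropna(axis=0, how="all")  # drop only rows that are entirely NaN
cols_any = toy.dropna(axis=1, how="any")  # drop columns with at least one NaN

print(rows_any.index.tolist())   # surviving row labels under how='any'
print(rows_all.index.tolist())   # surviving row labels under how='all'
print(cols_any.shape)            # every column had a NaN, so none survive
```

Because `inplace` defaults to False, `toy` keeps all three rows and columns after these calls.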
@@ -697,7 +697,7 @@
"id": "480ca978",
"metadata": {},
"source": [
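- "In this example, `apply` takes an `axis` parameter: `axis = 1` applies the function to each row; `axis = 0`` applies the function to each column; the default is axis = 0.\n",
+ "In this example, `apply` takes an `axis` parameter: `axis = 1` applies the function to each row; `axis = 0` applies the function to each column; the default is `axis = 0`.\n",
"\n",
"Example: combine it with `.loc[]` for more advanced data slicing. `.apply()` returns a series of Boolean values from a per-row condition check, and the `[]` operation selects a subset of columns. The selection condition below is: the `country` column belongs to a given set of countries and `POP > 40000`, or the `country` column does not belong to that set and `POP < 20000`."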
]
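A rough sketch of the row-wise condition described above, assuming a made-up frame `toy` and a hypothetical country set `selected` (the notebook's actual country list is not shown in this diff). The helper name `keep_row` is also an assumption for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the PWT table.
toy = pd.DataFrame({
    "country": ["Argentina", "India", "South Africa", "Chile"],
    "POP": [37000.0, 1000000.0, 45000.0, 15000.0],
    "year": [2000, 2000, 2000, 2000],
})
selected = {"Argentina", "India", "South Africa"}  # assumed set of countries

def keep_row(row):
    # Countries in the set need POP > 40000; all others need POP < 20000.
    if row["country"] in selected:
        return row["POP"] > 40000
    return row["POP"] < 20000

mask = toy.apply(keep_row, axis=1)            # axis=1: the function sees one row at a time
subset = toy.loc[mask, ["country", "POP"]]    # rows where mask is True, two columns only

# axis=0 (the default): the function is called once per column instead.
col_max = toy[["POP", "year"]].apply(lambda col: col.max(), axis=0)

print(subset)
print(col_max)
```

For large tables a vectorized mask built with `isin` and `&`/`|` is usually faster than a row-wise `apply`; the row-wise form is shown here because it mirrors the text.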
diff --git a/ch-pandas/dataframe-slicing.ipynb b/ch-pandas/dataframe-slicing.ipynb
index b8552c0..950dc16 100644
--- a/ch-pandas/dataframe-slicing.ipynb
+++ b/ch-pandas/dataframe-slicing.ipynb
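@@ -81,9 +81,16 @@
"* Use the `.iloc` or `.loc` function\n",
"* Use the `.query` function\n",
"\n",
- "### Selecting with []\n",
+ "### Selecting with `[]`\n",
+ "- Selecting rows\n",
"\n",
- "Select rows 2 through 5 (excluding row 5):"
+ "Use integer positions directly: `df[a:b]` selects row `a` through row `b-1` of the `DataFrame`.\n",
+ "\n",
+ "```{note}\n",
+ "Slice ranges in Python are half-open: the left endpoint is included and the right endpoint is excluded.\n",
+ "```\n",
+ "\n",
+ "Example: from the PWT example data `df` of the previous section, select rows 2 through 5 (excluding row 5).\n"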
]
},
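The half-open slice can be checked on a hypothetical six-row frame (`toy` is an illustration, not the PWT data). The sketch also shows `iloc`, which expresses the same positional selection explicitly.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..5.
toy = pd.DataFrame({"POP": [10, 20, 30, 40, 50, 60]})

rows = toy[2:5]            # positions 2, 3 and 4; position 5 is excluded
same_rows = toy.iloc[2:5]  # explicit positional form of the same selection

print(rows.index.tolist())
```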
{
@@ -254,7 +261,11 @@
"id": "eb81787d",
"metadata": {},
"source": [
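- "To select columns, we can pass a list containing the names of the desired columns as strings."
+ "- Selecting columns\n",
+ "\n",
+ "We can pass a list containing the names of the desired columns as strings.\n",
+ "\n",
+ "Example: select the country and tcgdp columns."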
]
},
{
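@@ -389,7 +400,9 @@
"source": [
"When selecting a single column, `df['country']` is equivalent to `df.country`.\n",
"\n",
- "`[]` can also select data that meets a given condition. For example, select the rows where POP is greater than 20000. The expression `df.POP> 20000` returns a series of Boolean values, with `True` for each row whose POP is greater than 20000. To select the rows that meet the condition:"
+ "- Selecting data that meets a given condition with `[]`. \n",
+ "\n",
+ "For example, select the rows where POP is greater than 20000. The expression `df.POP> 20000` returns a series of Boolean values, with `True` for each row whose POP is greater than 20000. To select the rows that meet the condition:"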
]
},
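A small sketch of the two steps described above, first building the Boolean series and then passing it to `[]`. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the PWT table.
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "POP": [15000.0, 25000.0, 30000.0],
})

mask = toy.POP > 20000   # Boolean Series, one entry per row
large = toy[mask]        # keeps only the rows where mask is True

print(mask.tolist())
print(large)
```

The filtered frame keeps the original row labels, so `large` is indexed by 1 and 2 rather than renumbered from 0.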
{
@@ -789,7 +802,7 @@
"id": "9b41ebb1",
"metadata": {},
"source": [
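- "To select the rows where the sum of the cc and cg columns is greater than 80 and POP is less than 20000:"
+ "Example: select the rows where the sum of the cc and cg columns is greater than 80 and POP is less than 20000."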
]
},
{
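@@ -1287,7 +1300,7 @@
"source": [
"Selecting with the `loc` function differs from `iloc` in that, besides integers, `loc` also accepts labels (column names such as `a` and `b`), an index that denotes integer positions, and `boolean` values.\n",
"\n",
- "Select rows 2 through 5 (excluding row 5) and the `country` and `tcgdp` columns:"
+ "Example: select rows 2 through 5 (excluding row 5) and the country and tcgdp columns."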
]
},
{
@@ -1369,7 +1382,7 @@
"id": "44f9c427",
"metadata": {},
"source": [
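- "Use the `loc` function to select the row with the maximum value in the POP column:"
+ "Example: use the `loc` function to select the row with the maximum value in the POP column."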
]
},
{
@@ -1517,7 +1530,9 @@
"id": "97dd2bd9",
"metadata": {},
"source": [
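- "Another form is `.loc[,]`, with two arguments separated by a comma: the first takes a condition and the second takes the names of the columns to return, so the result is the specified columns of the rows that meet the condition."
+ "Another form is `.loc[,]`, with two arguments separated by a comma: the first takes a condition and the second takes the names of the columns to return, so the result is the specified columns of the rows that meet the condition.\n",
+ "\n",
+ "Example: select the country, year, and POP columns for rows where cc plus cg is at least 80 and POP is at most 20000."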
]
},
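The two-argument form can be sketched on a hypothetical frame (`toy` and its numbers are invented for illustration). Each comparison is wrapped in parentheses because `&` binds more tightly than `>=` and `<=` in Python.

```python
import pandas as pd

# Hypothetical data with the column names used in the text.
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2000, 2000, 2000],
    "POP": [15000.0, 25000.0, 20000.0],
    "cc": [70.0, 60.0, 50.0],
    "cg": [15.0, 30.0, 30.0],
})

# First argument: a row condition; second argument: the columns to return.
cond = (toy.cc + toy.cg >= 80) & (toy.POP <= 20000)
result = toy.loc[cond, ["country", "year", "POP"]]

print(result)
```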
{
@@ -2455,8 +2470,22 @@
}
],
"metadata": {
+ "kernelspec": {
+ "display_name": "pyds",
+ "language": "python",
+ "name": "python3"
+ },
"language_info": {
- "name": "python"
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.5"
}
},
"nbformat": 4,
diff --git a/ch-pandas/series-dataframe.ipynb b/ch-pandas/series-dataframe.ipynb
index d579fa3..b17e0bf 100644
--- a/ch-pandas/series-dataframe.ipynb
+++ b/ch-pandas/series-dataframe.ipynb
@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
- "execution_count": 25,
+ "execution_count": 31,
"id": "c3935c76",
"metadata": {
"execution": {
@@ -56,7 +56,7 @@
},
{
"cell_type": "code",
- "execution_count": 26,
+ "execution_count": 32,
"id": "9f760577",
"metadata": {
"execution": {
@@ -66,9 +66,25 @@
"shell.execute_reply": "2023-09-11T14:28:40.645276Z"
}
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 1\n",
+ "1 2\n",
+ "2 3\n",
+ "3 4\n",
+ "Name: my_series, dtype: int64"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
- "s = pd.Series([1, 2, 3, 4], name = 'my_series')"
+ "s = pd.Series([1, 2, 3, 4], name = 'my_series')\n",
+ "s"
]
},
{
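@@ -78,12 +94,16 @@
"source": [
"A `Series` is an array-like data structure, essentially the `ndarray` from {numref}`numpy-ndarray`. Its most important structure is the index (Index), which labels which data is stored at which position. When `pd.Series()` is called without an index argument, the index starts at 0 and increases by one: 0, 1, ...\n",
"\n",
- "- Series supports arithmetic operations."
+ "- `Series` supports arithmetic operations.\n",
+ "\n",
+ "  Basic arithmetic such as addition, subtraction, multiplication, and division can be applied to a Series object.\n",
+ "\n",
+ "Example: multiply the `s` built above."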
]
},
{
"cell_type": "code",
- "execution_count": 27,
+ "execution_count": 33,
"id": "94626599",
"metadata": {
"execution": {
@@ -104,7 +124,7 @@
"Name: my_series, dtype: int64"
]
},
- "execution_count": 27,
+ "execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
@@ -113,17 +133,75 @@
"s * 100"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
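+ "Arithmetic also works between several `Series` objects.\n",
+ "```{note}\n",
+ "Operations between Series objects align on the index. If the lengths differ, for example s1 holds 4 values and s2 holds 5 (both with the default index), `s1 + s2` returns 5 values, and the fifth is NaN.\n",
+ "```\n",
+ "\n",
+ "Example: build s2 and add, subtract, multiply, and divide it with s."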
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "s+s2 result:\n",
+ "0 2\n",
+ "1 4\n",
+ "2 6\n",
+ "3 8\n",
+ "dtype: int64\n",
+ "s-s2 result:\n",
+ "0 0\n",
+ "1 0\n",
+ "2 0\n",
+ "3 0\n",
+ "dtype: int64\n",
+ "s*s2 result:\n",
+ "0 1\n",
+ "1 4\n",
+ "2 9\n",
+ "3 16\n",
+ "dtype: int64\n",
+ "s/s2 result:\n",
+ "0 1.0\n",
+ "1 1.0\n",
+ "2 1.0\n",
+ "3 1.0\n",
+ "dtype: float64\n"
+ ]
+ }
+ ],
+ "source": [
+ "s2 = pd.Series([1, 2, 3, 4])\n",
+ "print('s+s2 result:\\n{}'.format(s+s2))\n",
+ "print('s-s2 result:\\n{}'.format(s-s2))\n",
+ "print('s*s2 result:\\n{}'.format(s*s2))\n",
+ "print('s/s2 result:\\n{}'.format(s/s2))\n"
+ ]
+ },
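The note on Series of different lengths can be illustrated with a short sketch. The names `s1`, `s2`, `a`, and `b` are hypothetical; the point is that pandas matches entries by index label, so a label present on only one side produces NaN. `Series.add` with `fill_value` is one way to avoid that when a missing entry should count as zero.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])           # labels 0..3
s2 = pd.Series([10, 20, 30, 40, 50])   # labels 0..4

total = s1 + s2                     # label 4 exists only in s2, so the result there is NaN
filled = s1.add(s2, fill_value=0)   # treat the missing entry as 0 instead

# Alignment is by label, not by position.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
ab = a + b                          # only label "y" exists on both sides

print(total)
print(filled)
print(ab)
```

Introducing NaN also turns the integer inputs into a float result, which is why `total` is no longer `int64`.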
{
"cell_type": "markdown",
"id": "ee28bbf0",
"metadata": {},
"source": [
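- "- Series supports descriptive statistics. For example, get all the summary statistics."
+ "- `Series` supports descriptive statistics. The `.describe()` method returns the count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum in one call, while individual methods such as `.max()` return a single statistic.\n",
+ "\n",
+ "Example: get all the summary statistics for the `s` above."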
]
},
{
"cell_type": "code",
- "execution_count": 28,
+ "execution_count": 36,
"id": "3ccd82a9",
"metadata": {
"execution": {
@@ -148,7 +226,7 @@
"Name: my_series, dtype: float64"
]
},
- "execution_count": 28,
+ "execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
@@ -162,12 +240,12 @@
"id": "fb61cc9f",
"metadata": {},
"source": [
- "Compute the mean, median, and standard deviation."
+ "Example: compute the mean, median, and standard deviation individually."
]
},
{
"cell_type": "code",
- "execution_count": 29,
+ "execution_count": 37,
"id": "4e7d31be",
"metadata": {
"execution": {
@@ -184,7 +262,7 @@
"2.5"
]
},
- "execution_count": 29,
+ "execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
@@ -195,7 +273,7 @@
},
{
"cell_type": "code",
- "execution_count": 30,
+ "execution_count": 38,
"id": "2c7599d6",
"metadata": {
"execution": {
@@ -212,7 +290,7 @@
"2.5"
]
},
- "execution_count": 30,
+ "execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
@@ -223,7 +301,7 @@
},
{
"cell_type": "code",
- "execution_count": 31,
+ "execution_count": 39,
"id": "0c3aab52",
"metadata": {
"execution": {
@@ -240,7 +318,7 @@
"1.2909944487358056"
]
},
- "execution_count": 31,
+ "execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
@@ -254,12 +332,16 @@
"id": "b9aafec8",
"metadata": {},
"source": [
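- "- The index of a Series is flexible."
+ "- The index of a `Series` is flexible.\n",
+ "\n",
+ "Besides the default integer index used above, the index can be customized.\n",
+ "\n",
+ "Example: change the index labels 0, 1, 2, 3 of s to number1, number2, number3, number4."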
]
},
{
"cell_type": "code",
- "execution_count": 32,
+ "execution_count": 63,
"id": "55e7037b",
"metadata": {
"execution": {
@@ -279,12 +361,14 @@
"id": "c3c4636d",
"metadata": {},
"source": [
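- "Now the `Series` behaves like a Python dictionary (`dict`): elements can be accessed with `dict`-like syntax, where `index` plays the role of the `dict` key. For example, use the `[]` operator to access the value for `number1`."
+ "Now the `Series` behaves like a Python dictionary (`dict`): elements can be accessed with `dict`-like syntax, where `index` plays the role of the `dict` key.\n",
+ "\n",
+ "For example, use the `[]` operator to access the value for `number1`."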
]
},
{
"cell_type": "code",
- "execution_count": 33,
+ "execution_count": 64,
"id": "a9287533",
"metadata": {
"execution": {
@@ -301,7 +385,7 @@
"1"
]
},
- "execution_count": 33,
+ "execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
@@ -320,7 +404,7 @@
},
{
"cell_type": "code",
- "execution_count": 34,
+ "execution_count": 65,
"id": "7cd8388f",
"metadata": {
"execution": {
@@ -337,7 +421,7 @@
"True"
]
},
- "execution_count": 34,
+ "execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
@@ -346,6 +430,41 @@
"'number1' in s"
]
},
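A compact sketch of the dictionary-like behaviour, assuming the index is replaced by direct assignment to `s.index` (the notebook's own cell for this step is not visible in the diff, so that line is an assumption). It also shows that `in` tests the index labels rather than the stored values, and that `Series.get` returns a default instead of raising when a label is absent.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], name="my_series")
# Assumed way of relabelling; the notebook may do this differently.
s.index = ["number1", "number2", "number3", "number4"]

first = s["number1"]            # label-based access, like dict[key]
has_label = "number1" in s      # membership checks the index labels
has_value = 1 in s              # False: 1 is a value, not a label
missing = s.get("number9")      # None instead of a KeyError

print(first, has_label, has_value, missing)
```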
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
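+ "- `value_counts()` counts how often each unique value occurs in a `Series`. It returns a new `Series` whose index holds the unique values and whose values hold the corresponding counts.\n",
+ "\n",
+ "This gives a quick view of how the values in a dataset are distributed, including whether there are outliers, missing values (counted only when `dropna=False` is passed), or rarely occurring values, and it is also useful for later visualization.\n",
+ "\n",
+ "Example: count the values in the series s3; A occurs most often and C least often."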
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 83,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "A 4\n",
+ "B 3\n",
+ "C 1\n",
+ "Name: count, dtype: int64"
+ ]
+ },
+ "execution_count": 83,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s3 = pd.Series(['A', 'B', 'A', 'C', 'B', 'A', 'A', 'B']) \n",
+ "s3.value_counts()"
+ ]
+ },
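Two optional parameters of `value_counts()` are sketched here as a supplement: `normalize=True` returns relative frequencies instead of counts, and `dropna=False` includes missing values in the tally. The series `with_missing` is a hypothetical example added for illustration.

```python
import pandas as pd

s3 = pd.Series(['A', 'B', 'A', 'C', 'B', 'A', 'A', 'B'])

counts = s3.value_counts()                 # absolute counts, most frequent first
shares = s3.value_counts(normalize=True)   # fractions that sum to 1

# Hypothetical series with one missing entry.
with_missing = pd.Series(['A', None, 'A'])
default_counts = with_missing.value_counts()             # the missing entry is skipped
all_counts = with_missing.value_counts(dropna=False)     # the missing entry gets its own row

print(counts)
print(shares)
print(all_counts)
```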
{
"cell_type": "markdown",
"id": "803dff93",
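@@ -368,12 +487,14 @@
"\n",
"There are many ways to create a `DataFrame`, for example from lists, from dictionaries, or by reading data from a file.\n",
"\n",
- "- Creating from lists"
+ "- Creating from lists\n",
+ "\n",
+ "Example: create a `DataFrame` whose first column is Name, second column is Age, and third column is City."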
]
},
{
"cell_type": "code",
- "execution_count": 35,
+ "execution_count": 66,
"id": "29fda5e0",
"metadata": {
"execution": {
@@ -383,13 +504,75 @@
"shell.execute_reply": "2023-09-11T14:28:40.759646Z"
}
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Name Age City\n",
+ "0 Alice 25 New York\n",
+ "1 Bob 30 San Francisco\n",
+ "2 Charlie 22 Los Angeles"
+ ]
+ },
+ "execution_count": 66,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"names = ['Alice', 'Bob', 'Charlie']\n",
"ages = [25, 30, 22]\n",
"cities = ['New York', 'San Francisco', 'Los Angeles']\n",
"data = {'Name': names, 'Age': ages, 'City': cities}\n",
- "df = pd.DataFrame(data)"
+ "df = pd.DataFrame(data)\n",
+ "df"
]
},
{
@@ -397,12 +580,14 @@
"id": "44c4ceb3",
"metadata": {},
"source": [
- "- Creating from a dictionary"
+ "- Creating from a dictionary\n",
+ "\n",
+ "Example: create a `DataFrame` whose first column is Column1 and second column is Column2."
]
},
{
"cell_type": "code",
- "execution_count": 36,
+ "execution_count": 67,
"id": "12cbb41a",
"metadata": {
"execution": {
@@ -412,10 +597,62 @@
"shell.execute_reply": "2023-09-11T14:28:40.765960Z"
}
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Column1 Column2\n",
+ "0 1 3\n",
+ "1 2 4"
+ ]
+ },
+ "execution_count": 67,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"data = {'Column1': [1, 2], 'Column2': [3, 4]}\n",
- "df = pd.DataFrame(data)"
+ "df = pd.DataFrame(data)\n",
+ "df"
]
},
{
@@ -478,7 +715,7 @@
},
{
"cell_type": "code",
- "execution_count": 37,
+ "execution_count": 68,
"metadata": {},
"outputs": [
{
@@ -523,7 +760,7 @@
},
{
"cell_type": "code",
- "execution_count": 38,
+ "execution_count": 69,
"metadata": {},
"outputs": [
{
@@ -608,7 +845,7 @@
"max 2.000000 4.000000"
]
},
- "execution_count": 38,
+ "execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
@@ -630,7 +867,7 @@
},
{
"cell_type": "code",
- "execution_count": 39,
+ "execution_count": 70,
"metadata": {},
"outputs": [
{
@@ -697,7 +934,7 @@
"mean NaN 3.5"
]
},
- "execution_count": 39,
+ "execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
@@ -720,7 +957,7 @@
},
{
"cell_type": "code",
- "execution_count": 40,
+ "execution_count": 71,
"id": "ec624d37",
"metadata": {
"execution": {
@@ -777,7 +1014,7 @@
},
{
"cell_type": "code",
- "execution_count": 41,
+ "execution_count": 72,
"id": "c052691d",
"metadata": {
"execution": {
@@ -790,7 +1027,7 @@
"outputs": [],
"source": [
"import pandas as pd\n",
- "\n",
+ "# Note: passing the file's absolute path directly also works; os.path.join here joins the folder path and the file name\n",
"df = pd.read_csv(os.path.join(folder_path, \"pwt70_w_country_names.csv\"))"
]
},
@@ -804,7 +1041,7 @@
},
{
"cell_type": "code",
- "execution_count": 42,
+ "execution_count": 73,
"id": "23a5ed20",
"metadata": {
"execution": {
@@ -1010,7 +1247,7 @@
"[5 rows x 37 columns]"
]
},
- "execution_count": 42,
+ "execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
@@ -1030,7 +1267,7 @@
},
{
"cell_type": "code",
- "execution_count": 43,
+ "execution_count": 74,
"id": "f84177de",
"metadata": {
"execution": {
@@ -1243,7 +1480,7 @@
"[5 rows x 37 columns]"
]
},
- "execution_count": 43,
+ "execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
@@ -1262,7 +1499,7 @@
},
{
"cell_type": "code",
- "execution_count": 44,
+ "execution_count": 75,
"id": "5f0544f4",
"metadata": {
"execution": {
@@ -1338,7 +1575,7 @@
},
{
"cell_type": "code",
- "execution_count": 45,
+ "execution_count": 76,
"id": "821c9174",
"metadata": {
"execution": {
@@ -1392,7 +1629,7 @@
"dtype: object"
]
},
- "execution_count": 45,
+ "execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
@@ -1411,7 +1648,7 @@
},
{
"cell_type": "code",
- "execution_count": 46,
+ "execution_count": 77,
"id": "aae06a0c",
"metadata": {
"execution": {
@@ -1433,7 +1670,7 @@
" dtype='object')"
]
},
- "execution_count": 46,
+ "execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
@@ -1448,18 +1685,434 @@
"source": [
"- The `rename()` function can change both row labels and column labels. Pass a dictionary whose keys are the current names and whose values are the new names to update the corresponding names.\n",
"\n",
- "Examples:\n",
- "1. Rename year to Year and country to Country:\n",
- "\n",
- "```\n",
- "df_renamed = df.rename(columns={'year':Year, 'country':'Country'})\n",
- "```\n",
- "\n",
- "2. Convert all column names to lowercase:\n",
- "\n",
- "```\n",
- "df_renamed = df.rename(columns=str.lower)\n",
- "```"
+ "Example: rename year to Year and country to Country."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 78,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Country isocode Year POP XRAT Currency_Unit ppp tcgdp cgdp \\\n",
+ "0 Afghanistan AFG 1950 8150.368 NaN NaN NaN NaN NaN \n",
+ "1 Afghanistan AFG 1951 8284.473 NaN NaN NaN NaN NaN \n",
+ "2 Afghanistan AFG 1952 8425.333 NaN NaN NaN NaN NaN \n",
+ "3 Afghanistan AFG 1953 8573.217 NaN NaN NaN NaN NaN \n",
+ "4 Afghanistan AFG 1954 8728.408 NaN NaN NaN NaN NaN \n",
+ "\n",
+ " cgdp2 ... kg ki openk rgdpeqa rgdpwok rgdpl2wok rgdpl2pe rgdpl2te \\\n",
+ "0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "2 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "3 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "4 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "\n",
+ " rgdpl2th rgdptt \n",
+ "0 NaN NaN \n",
+ "1 NaN NaN \n",
+ "2 NaN NaN \n",
+ "3 NaN NaN \n",
+ "4 NaN NaN \n",
+ "\n",
+ "[5 rows x 37 columns]"
+ ]
+ },
+ "execution_count": 78,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_renamed = df.rename(columns={'year':'Year','country':'Country'})\n",
+ "df_renamed.head(5)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Example: convert all column names to lowercase."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 79,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " country isocode year pop xrat currency_unit ppp tcgdp cgdp \\\n",
+ "0 Afghanistan AFG 1950 8150.368 NaN NaN NaN NaN NaN \n",
+ "1 Afghanistan AFG 1951 8284.473 NaN NaN NaN NaN NaN \n",
+ "2 Afghanistan AFG 1952 8425.333 NaN NaN NaN NaN NaN \n",
+ "3 Afghanistan AFG 1953 8573.217 NaN NaN NaN NaN NaN \n",
+ "4 Afghanistan AFG 1954 8728.408 NaN NaN NaN NaN NaN \n",
+ "\n",
+ " cgdp2 ... kg ki openk rgdpeqa rgdpwok rgdpl2wok rgdpl2pe rgdpl2te \\\n",
+ "0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "2 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "3 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "4 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN \n",
+ "\n",
+ " rgdpl2th rgdptt \n",
+ "0 NaN NaN \n",
+ "1 NaN NaN \n",
+ "2 NaN NaN \n",
+ "3 NaN NaN \n",
+ "4 NaN NaN \n",
+ "\n",
+ "[5 rows x 37 columns]"
+ ]
+ },
+ "execution_count": 79,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_renamed = df.rename(columns=str.lower)\n",
+ "df_renamed.head(5)"
]
},
{
@@ -1472,7 +2125,7 @@
},
{
"cell_type": "code",
- "execution_count": 47,
+ "execution_count": 80,
"id": "9e1b3c2f",
"metadata": {
"execution": {
@@ -1489,7 +2142,7 @@
"RangeIndex(start=0, stop=11400, step=1)"
]
},
- "execution_count": 47,
+ "execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
@@ -1508,7 +2161,7 @@
},
{
"cell_type": "code",
- "execution_count": 48,
+ "execution_count": 81,
"id": "ccc0d0c6",
"metadata": {
"execution": {