docs: add index docs
suyuan32 committed Apr 25, 2024
1 parent 8e27f47 commit 0b2e6af
Showing 4 changed files with 124 additions and 27 deletions.
79 changes: 64 additions & 15 deletions src/en/guide/concepts/database/3-database-index.md
@@ -111,43 +111,48 @@ CREATE INDEX index_name ON table_name (field_name);
CREATE INDEX index_name ON table_name (field_name1, field_name2, ...);
```

## Index Creation Principles
### Leftmost Prefix Principle
## Principles for Creating Indexes

MySQL indexes follow the leftmost prefix principle: a query can use a composite index only through a leftmost prefix of its columns. For example, with a composite index `(a, b, c)`, a query can match on `(a)`, `(a, b)`, or `(a, b, c)`, but not on `(b, c)` or `(c)` alone.
- Leftmost Prefix Principle

MySQL indexes follow the leftmost prefix principle: a query can use a composite index only through a leftmost prefix of its columns. For example, with a composite index `(a, b, c)`, a query can match on `(a)`, `(a, b)`, or `(a, b, c)`, but not on `(b, c)` or `(c)` alone.
::: warning
Range conditions affect how far leftmost matching continues. After a column constrained by `>=`, `<=`, `between`, or a `like` prefix match, the following index columns can still take part in index matching; after a column constrained by `>` or `<`, they cannot.
:::
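
The leftmost prefix rule follows from how a composite index is ordered. The sketch below is only an illustration, not MySQL's implementation: it models a hypothetical index on `(a, b, c)` as a sorted list of tuples (the data and the helper name `seek_prefix` are assumptions). Entries that share a leftmost prefix are contiguous and can be found by binary search, while entries that share only `b` are scattered.

```python
from bisect import bisect_left, bisect_right

# Hypothetical composite index on (a, b, c): entries are kept in sorted order.
index_abc = sorted([
    (1, 1, 1), (1, 2, 5), (1, 2, 7), (2, 1, 3), (2, 2, 2), (3, 1, 9),
])


def seek_prefix(index, prefix):
    """Return the entries whose leading columns equal `prefix`, via binary search."""
    width = len(index[0])
    # Pad the prefix with +infinity to get an upper bound for the matching range.
    upper = prefix + (float("inf"),) * (width - len(prefix))
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, upper)
    return index[lo:hi]


# Matching on (a) or (a, b) yields one contiguous slice of the index.
by_a = seek_prefix(index_abc, (1,))
by_ab = seek_prefix(index_abc, (1, 2))

# Entries with b == 2 are not contiguous, so a filter on b alone cannot
# use binary search and has to look at every entry.
positions_b2 = [i for i, entry in enumerate(index_abc) if entry[1] == 2]
```

Here `positions_b2` contains a gap, which is why a condition on `b` alone cannot be answered by seeking into the `(a, b, c)` index.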

### Choose Unique Index
- Choose Unique Index

Prefer a unique index where one applies: it guarantees the uniqueness of the data and prevents duplicates.

### Choose High-Selectivity Index
- Choose High-Selectivity Index

Prefer indexes on columns with high selectivity (a high degree of discrimination between rows). A highly selective index narrows the search to few rows, which reduces the amount of data scanned and improves query efficiency.
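
Discrimination, often called selectivity, can be estimated as the number of distinct values in a column divided by the number of rows. The following is a minimal sketch with made-up rows; the column names and the `selectivity` helper are assumptions for illustration only.

```python
def selectivity(values):
    """Fraction of distinct values; 1.0 means every row has a unique value."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


# Hypothetical table with a low-selectivity column and a high-selectivity column.
rows = [
    {"id": i, "gender": "MF"[i % 2], "email": f"user{i}@example.com"}
    for i in range(1000)
]

gender_selectivity = selectivity(r["gender"] for r in rows)
email_selectivity = selectivity(r["email"] for r in rows)
```

In this sample, an index on `gender` would still leave about half of the rows to scan for any given value, while an index on `email` identifies a single row.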

### Choose Index Column
- Choose Index Column

Index columns that are queried frequently and avoid indexing rarely used columns, so that each index you maintain is actually used.

### Try to Use Covering Index

When creating an index, try to make it a covering index, so that queries can be answered from the index alone with fewer back-to-table lookups, which improves query efficiency.

### Try to Use Short Index

When creating an index, you should try to use a short index. The short index can reduce the storage space of the index and improve the query efficiency of the index.
::: info Recommended Fields to Choose
- Frequently queried fields
- Frequently sorted fields
- Fields that are not NULL
- Fields often used for JOIN
:::

### Try to Use Prefix Index
- Try to Use Prefix Index

If the indexed field is long, you can use a prefix index, which indexes only the leading part of each value. This reduces the storage space of the index and improves its query efficiency.
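
One way to pick a prefix length is to compare how well prefixes of different lengths distinguish the values. The sketch below uses made-up email values and a hypothetical `prefix_selectivity` helper; real data should be measured the same way before choosing a length.

```python
def prefix_selectivity(values, length):
    """Fraction of distinct prefixes of the given length among the values."""
    prefixes = {v[:length] for v in values}
    return len(prefixes) / len(values)


# Hypothetical column values: 100 distinct email addresses.
emails = [f"user{i}@example.com" for i in range(100)]

# Longer prefixes distinguish more values but cost more index space.
by_length = {n: prefix_selectivity(emails, n) for n in (4, 5, 6)}
```

For this sample a 4-character prefix is useless because every value starts with `user`, while a 6-character prefix already distinguishes every row, so indexing the full string would add storage without adding precision.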

### Try to Extend Index Instead of Creating New Index
- Try to Extend Index Instead of Creating New Index

If we already have an index on `a` and need an index on `(a, b)`, we can extend the existing `a` index with the `b` field instead of creating a separate `(a, b)` index.

- Don't Have Too Many Indexes

Keep the number of indexes on a single table small. Every index has to be updated on each write, so too many indexes increase maintenance cost and reduce write efficiency.


## Index Pushdown

@@ -158,4 +163,48 @@ Index pushdown (Index Condition Pushdown) is an optimization feature introduced
Before index pushdown, given a composite index `(a, b)` and the query condition `a = 1 and b = 2`, MySQL would first use the index `(a, b)` to find all entries with `a = 1`, go back to the table to fetch the complete row for each of them, and only then check `b = 2` against each row, returning the rows that match.

With index pushdown, MySQL first uses the index `(a, b)` to find the entries matching `a = 1 and b = 2`, and only then goes back to the table for the complete rows. This reduces the number of back-to-table lookups and improves query efficiency.
:::
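
The difference described above can be made concrete by counting back-to-table lookups. This is a toy model of that description, not MySQL's actual executor: the table, the `(a, b)` index layout, and both function names are assumptions.

```python
# Hypothetical table: primary key -> full row.
table = {
    pk: {"id": pk, "a": pk % 3, "b": pk % 5, "payload": f"row-{pk}"}
    for pk in range(30)
}

# Secondary index on (a, b): sorted (a, b, primary key) entries.
index_ab = sorted((row["a"], row["b"], pk) for pk, row in table.items())


def query_without_pushdown(a, b):
    """Use the index for `a` only; check `b` after fetching each full row."""
    lookups, result = 0, []
    for idx_a, _idx_b, pk in index_ab:
        if idx_a != a:
            continue
        row = table[pk]          # back-to-table lookup
        lookups += 1
        if row["b"] == b:        # condition checked on the full row
            result.append(row)
    return result, lookups


def query_with_pushdown(a, b):
    """Check both `a` and `b` on the index entry before fetching the row."""
    lookups, result = 0, []
    for idx_a, idx_b, pk in index_ab:
        if idx_a != a or idx_b != b:   # condition pushed down to the index
            continue
        result.append(table[pk])       # back-to-table lookup
        lookups += 1
    return result, lookups


rows_plain, lookups_plain = query_without_pushdown(1, 2)
rows_icp, lookups_icp = query_with_pushdown(1, 2)
```

Both functions return the same rows; only the number of full-row fetches differs, which is the saving that index pushdown targets.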

## Data Structure of Indexes

### B-Tree

A B-tree is a type of multi-way balanced search tree and is a commonly used index data structure.

::: info Characteristics of B-Tree:
- Each non-root internal node has between `⌈m/2⌉` and `m` child nodes, where `m` is the order of the B-tree (the maximum number of children per node).
- All leaf nodes are on the same level.
- The root node has at least two child nodes unless the root node is a leaf node.
- A non-leaf node with `k` child nodes contains `k-1` key values.
- Every node, internal or leaf, stores both keys and the data associated with those keys.
:::

![btree](/assets/image/article/concept/btree.png)
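
A lookup in such a tree can be sketched as follows. This is a minimal, read-only illustration under assumed names (`BTreeNode`, `btree_search`) with a hand-built tree; insertion and rebalancing are left out. Because data lives in every node, a search can stop at an internal node as soon as the key is found.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list                                     # sorted keys in this node
    values: list                                   # data stored next to each key
    children: list = field(default_factory=list)   # empty for leaf nodes


def btree_search(node, key):
    """Return the value stored for `key`, or None if the key is absent."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]          # found, possibly in an internal node
    if not node.children:
        return None                    # reached a leaf without a match
    return btree_search(node.children[i], key)


def leaf(*keys):
    return BTreeNode(list(keys), [f"r{k}" for k in keys])


# A node with k = 3 children holds k - 1 = 2 keys.
root = BTreeNode([10, 20], ["r10", "r20"], [leaf(3, 7), leaf(12, 15), leaf(25, 30)])

found_in_root = btree_search(root, 10)
found_in_leaf = btree_search(root, 15)
missing = btree_search(root, 8)
```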


### B+ Tree

A B+ tree is a variant of the B-tree and is a commonly used index data structure. Compared with the B-tree, the non-leaf nodes of the B+ tree contain only indexes (keys), not data; all data is stored in the leaf nodes, which are on the same level and connected by pointers. Because non-leaf nodes contain only keys, a block of the same size holds more of them, which reduces the number of layers of the tree and improves query efficiency. Because the leaf nodes are linked, a range query can walk the leaves in order instead of traversing the tree, which makes range queries much faster than in a B-tree.

::: info Characteristics of B+ Tree:
- All leaf nodes are on the same level.
- Non-leaf nodes only contain indexes, not data.
- Leaf nodes are connected by pointers.
- Leaf nodes contain all data.
- For the same amount of data, the height of the B+ tree is lower than the B-tree.
:::

![b+tree](/assets/image/article/concept/bplustree.png)
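
The linked leaves are what make range queries cheap. The following is a minimal, read-only sketch with a hand-built two-level tree; the class and function names are assumptions, and insertion and node splitting are omitted. The search descends once to the leaf that may contain the lower bound and then follows the leaf pointers.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    values: list                     # all row data lives in the leaves
    next: Optional["Leaf"] = None    # pointer to the next leaf


@dataclass
class Internal:
    keys: list                       # separator keys only, no row data
    children: list


def find_leaf(node, key):
    """Descend from the root to the leaf that may contain `key`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, low, high):
    """Collect (key, value) pairs with low <= key <= high by walking the leaves."""
    leaf = find_leaf(root, low)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append((k, v))
        leaf = leaf.next
    return out


# Hand-built tree: three linked leaves under one internal node.
l3 = Leaf([13, 16], ["r13", "r16"])
l2 = Leaf([7, 10], ["r7", "r10"], next=l3)
l1 = Leaf([1, 4], ["r1", "r4"], next=l2)
root = Internal(keys=[7, 13], children=[l1, l2, l3])

scan = range_scan(root, 4, 13)
```

After the single descent, the scan never revisits the internal node: it moves from leaf to leaf like a linked list.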

::: warning Differences between B-Tree and B+ Tree

| | B-Tree | B+ Tree |
| :---: | :---: | :---: |
| **Data Pointers and Keys** | All internal nodes and leaf nodes contain data pointers and keys | Only leaf nodes contain data pointers and keys; internal nodes contain only keys |
| **Duplicate Keys** | Each key is stored only once | Keys are duplicated: every key in an internal node also appears in a leaf |
| **Leaf Node Linking** | Leaf nodes are not linked to each other | Leaf nodes are linked to each other |
| **Sequential Access** | Nodes cannot be accessed sequentially; range queries require an in-order traversal | All keys exist in the leaves, so they can be accessed sequentially like a linked list |
| **Search Speed** | Key search is generally slower because the tree is taller | Key search is generally faster because the tree is shorter |
| **Height for Specific Number of Entries** | For a given number of entries, the B-Tree is taller | For the same number of entries, the B+ Tree is shorter than the B-Tree |

:::
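
The height difference in the table can be estimated with rough arithmetic. All sizes below (16 KB pages, 8-byte keys and pointers, 200-byte rows, 10 million entries) are assumptions chosen for illustration, and the model ignores page headers and fill factor, so treat the result as an order-of-magnitude sketch rather than a measurement.

```python
PAGE_SIZE = 16 * 1024   # assumed page (node) size in bytes
KEY_SIZE = 8            # assumed key size
POINTER_SIZE = 8        # assumed child pointer size
ROW_SIZE = 200          # assumed size of the row data stored with a key
NUM_ENTRIES = 10_000_000


def levels_needed(num_items, fanout):
    """Smallest number of levels such that fanout ** levels >= num_items."""
    levels, capacity = 1, fanout
    while capacity < num_items:
        capacity *= fanout
        levels += 1
    return levels


# B-tree: every node entry carries key + row data + child pointer.
btree_fanout = PAGE_SIZE // (KEY_SIZE + ROW_SIZE + POINTER_SIZE)
btree_levels = levels_needed(NUM_ENTRIES, btree_fanout)

# B+ tree: internal entries carry only key + pointer; rows live in the leaves.
bplus_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
rows_per_leaf = PAGE_SIZE // (KEY_SIZE + ROW_SIZE)
leaf_count = -(-NUM_ENTRIES // rows_per_leaf)      # ceiling division
bplus_levels = 1 + levels_needed(leaf_count, bplus_fanout)
```

Under these assumptions the internal nodes of the B+ tree hold more than ten times as many entries per page as B-tree nodes, which is where the lower height comes from.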
72 changes: 60 additions & 12 deletions src/guide/concepts/database/3-database-index.md
@@ -113,40 +113,44 @@ CREATE INDEX index_name ON table_name (field_name1, field_name2, ...);

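## Principles for Creating Indexes

### Leftmost Prefix Principle
- Leftmost Prefix Principle

MySQL indexes follow the leftmost prefix principle: a query can use a composite index only through a leftmost prefix of its columns. For example, with a composite index `(a, b, c)`, a query can match on `(a)`, `(a, b)`, or `(a, b, c)`, but not on `(b, c)` or `(c)` alone.
::: warning
Range conditions affect how far leftmost matching continues. After a column constrained by `>=`, `<=`, `between`, or a `like` prefix match, the following index columns can still take part in index matching; after a column constrained by `>` or `<`, they cannot.
:::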

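### Choose Unique Index
- Choose Unique Index

Prefer a unique index where one applies: it guarantees the uniqueness of the data and prevents duplicates.

### Choose High-Selectivity Index
- Choose High-Selectivity Index

Prefer indexes on columns with high selectivity (a high degree of discrimination between rows). A highly selective index narrows the search to few rows, which reduces the amount of data scanned and improves query efficiency.

### Choose Index Column
- Choose Index Column

Index columns that are queried frequently and avoid indexing rarely used columns, so that each index you maintain is actually used.
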
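### Try to Use Covering Index
::: info Recommended Fields to Choose
- Frequently queried fields
- Frequently sorted fields
- Fields that are not NULL
- Fields often used for JOIN
:::

When creating an index, try to make it a covering index, so that queries can be answered from the index alone with fewer back-to-table lookups, which improves query efficiency.
- Try to Use Prefix Index

### Try to Use Short Index
If the indexed field is long, you can use a prefix index, which indexes only the leading part of each value. This reduces the storage space of the index and improves its query efficiency.

When creating an index, try to use a short index. A short index reduces the storage space of the index and improves its query efficiency.
- Try to Extend Index Instead of Creating New Index

### Try to Use Prefix Index
If we already have an index on `a` and need an index on `(a, b)`, we can extend the existing `a` index with the `b` field instead of creating a separate `(a, b)` index.

If the indexed field is long, you can use a prefix index, which indexes only the leading part of each value. This reduces the storage space of the index and improves its query efficiency.
- Don't Have Too Many Indexes

### Try to Extend Index Instead of Creating New Index
Keep the number of indexes on a single table small. Every index has to be updated on each write, so too many indexes increase maintenance cost and reduce write efficiency.

If we already have an index on `a` and need an index on `(a, b)`, we can extend the existing `a` index with the `b` field instead of creating a separate `(a, b)` index.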


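## Index Pushdown
@@ -160,3 +164,47 @@ MySQL indexes follow the leftmost prefix principle, that is, a query can only use the leftmost
With index pushdown, MySQL first uses the index `(a, b)` to find the entries matching `a = 1 and b = 2`, and only then goes back to the table for the complete rows. This reduces the number of back-to-table lookups and improves query efficiency.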
:::

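## Data Structure of Indexes

### B-Tree

A B-tree is a type of multi-way balanced search tree and is a commonly used index data structure.

::: info Characteristics of B-Tree:
- Each non-root internal node has between `⌈m/2⌉` and `m` child nodes, where `m` is the order of the B-tree (the maximum number of children per node).
- All leaf nodes are on the same level.
- The root node has at least two child nodes unless the root node is a leaf node.
- A non-leaf node with `k` child nodes contains `k-1` key values.
- Every node, internal or leaf, stores both keys and the data associated with those keys.
:::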

![btree](/assets/image/article/concept/btree.png)


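### B+ Tree

A B+ tree is a variant of the B-tree and is a commonly used index data structure. Compared with the B-tree, the non-leaf nodes of the B+ tree contain only indexes (keys), not data; all data is stored in the leaf nodes, which are on the same level and connected by pointers. Because non-leaf nodes contain only keys, a block of the same size holds more of them, which reduces the number of layers of the tree and improves query efficiency. Because the leaf nodes are linked, a range query can walk the leaves in order instead of traversing the tree, which makes range queries much faster than in a B-tree.

::: info Characteristics of B+ Tree:
- All leaf nodes are on the same level.
- Non-leaf nodes only contain indexes, not data.
- Leaf nodes are connected by pointers.
- Leaf nodes contain all data.
- For the same amount of data, the height of the B+ tree is lower than the B-tree.
:::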

![b+tree](/assets/image/article/concept/bplustree.png)

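::: warning Differences between B-Tree and B+ Tree

| | B-Tree | B+ Tree |
| :---: | :---: | :---: |
| **Data Pointers and Keys** | All internal nodes and leaf nodes contain data pointers and keys | Only leaf nodes contain data pointers and keys; internal nodes contain only keys |
| **Duplicate Keys** | Each key is stored only once | Keys are duplicated: every key in an internal node also appears in a leaf |
| **Leaf Node Linking** | Leaf nodes are not linked to each other | Leaf nodes are linked to each other |
| **Sequential Access** | Nodes cannot be accessed sequentially; range queries require an in-order traversal | All keys exist in the leaves, so they can be accessed sequentially like a linked list |
| **Search Speed** | Key search is generally slower because the tree is taller | Key search is generally faster because the tree is shorter |
| **Height for Specific Number of Entries** | For a given number of entries, the B-Tree is taller | For the same number of entries, the B+ Tree is shorter than the B-Tree |

:::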
