Issue 37: Using MySQL Native Table Partitioning Appropriately

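InnoDB is now the main storage engine used with MySQL. InnoDB has no native table-splitting scheme comparable to the MERGE engine, but it does have native partitioned tables, which split a record set horizontally and are transparent to the application.

Partitioned tables offer an additional option for serving queries against very large tables and for managing them day to day. Used properly, a partitioned table can improve database performance substantially.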

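Partitioned tables have the following main advantages:

Much better performance for certain queries.
Less routine data maintenance work and more efficient operations.
Parallel queries and balanced write I/O.
Transparency to the application: no routing or middleware layer needs to be deployed on the application side.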

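Next, we use practical examples to make the first two advantages clearer.

For queries:

Optimizing query performance (range queries)

With a suitably partitioned table, the same query scans far fewer rows than it does on a non-partitioned table, so it runs much more efficiently.

In the following example, t1 is a non-partitioned table and p1 is the corresponding partitioned table. Both tables hold the same number of rows, 10 million each.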

localhost:ytt> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int NOT NULL,
  `r1` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)


localhost:ytt> show create table p1\G
*************************** 1. row ***************************
       Table: p1
Create Table: CREATE TABLE `p1` (
  `id` int NOT NULL,
  `r1` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (`id`)
(PARTITION p0 VALUES LESS THAN (1000000) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2000000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (3000000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (4000000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (5000000) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (6000000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (7000000) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (8000000) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (9000000) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0.00 sec)

localhost:ytt> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.94 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.92 sec)

Run a range query against each table. The execution plans are as follows:

localhost:ytt> explain format=tree select count(*) from t1 where id < 1000000\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (t1.id < 1000000)  (cost=407495.19 rows=2030006)
        -> Index range scan on t1 using PRIMARY  (cost=407495.19 rows=2030006)

1 row in set (0.00 sec)

localhost:ytt> explain format=tree select count(*) from p1 where id < 1000000\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (p1.id < 1000000)  (cost=99980.09 rows=499369)
        -> Index range scan on p1 using PRIMARY  (cost=99980.09 rows=499369)

1 row in set (0.00 sec)

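Comparing the two plans, t1's estimated cost and estimated rows scanned are several times those of p1, so the partitioned table is clearly the more efficient of the two.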
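
As a quick check of why the partitioned plan is cheaper, the sketch below asks the server which partitions a query would read and roughly how many rows each partition holds. It assumes the schema is named ytt (inferred from the prompt in the transcripts, not stated in the article). It relies on the partitions column of the traditional EXPLAIN format and on the information_schema.PARTITIONS table; TABLE_ROWS is an estimate for InnoDB, not an exact count.

```sql
-- Traditional (non-tree) EXPLAIN output has a `partitions` column that
-- lists the partitions the optimizer intends to read.
EXPLAIN SELECT COUNT(*) FROM p1 WHERE id < 1000000;

-- Estimated row count per partition.
-- Assumption: the schema is called `ytt`.
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'ytt'
  AND TABLE_NAME = 'p1'
ORDER BY PARTITION_ORDINAL_POSITION;
```

If pruning works as expected for this predicate, the partitions column should list only p0, since p0 is defined as VALUES LESS THAN (1000000).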

Now look at the execution plans for a not-equal query against the two tables:

localhost:ytt> explain format=tree select count(*) from t1 where id != 2000001\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (t1.id <> 2000001)  (cost=1829866.58 rows=9117649)
        -> Index range scan on t1 using PRIMARY  (cost=1829866.58 rows=9117649)

1 row in set (0.00 sec)

localhost:ytt> explain format=tree select count(*) from p1 where id != 2000001\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Filter: (p1.id <> 2000001)  (cost=1002750.23 rows=4993691)
        -> Index range scan on p1 using PRIMARY  (cost=1002750.23 rows=4993691)

1 row in set (0.00 sec)

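Even for an inefficient statement like this one, the execution plans show the partitioned table ahead of the non-partitioned table on both estimated cost and estimated rows scanned.

Optimizing write performance (UPDATE with a filter condition).

For this kind of update request, the partitioned table is likewise more efficient than the non-partitioned table.

Below are the execution plans for a range-filtered update on the non-partitioned table and on the partitioned table. Looking at the rows column alone is enough: the partitioned table scans fewer rows than the non-partitioned table.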

localhost:ytt> explain update t1 set r1 = date_sub(current_date,interval ceil(rand()*5000) day) where id between 1000001 and 2990000\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: t1
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 3938068
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

localhost:ytt> explain update p1 set r1 = date_sub(current_date,interval ceil(rand()*5000) day) where id between 1000001 and 2990000\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: p1
   partitions: p1,p2
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 998738
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

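For operations:

Exchanging data between a partitioned table and a non-partitioned table.

The data of a specific partition can be exported and imported easily, and it can be exchanged quickly with the data of a non-partitioned table.

Create a table t_p1 to exchange data with partition p1 of table p1.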

localhost:ytt> create table t_p1 like t1;
Query OK, 0 rows affected (0.06 sec)

Partition p1 holds 1 million rows. Using the native partition exchange feature, swapping the data took only 0.07 seconds.

localhost:ytt> alter table p1 exchange partition p1 with table t_p1;
Query OK, 0 rows affected (0.07 sec)

Check the data after the exchange: table p1 has 1 million fewer rows because partition p1 has been emptied, and table t_p1 has gained 1 million rows.

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  9000000 |
+----------+
1 row in set (0.79 sec)

localhost:ytt> select count(*) from t_p1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.13 sec)

The data can be swapped back at any time, which leaves the exchanged table empty.

localhost:ytt> alter table p1 exchange partition p1 with table t_p1;
Query OK, 0 rows affected (0.77 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.91 sec)

localhost:ytt> select count(*) from t_p1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
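
One point the swap-back depends on: when rows move from the standalone table into a partition, MySQL by default validates that every row fits the partition's range. The sketch below shows a manual pre-check and the WITHOUT VALIDATION clause (available in MySQL 5.7 and later), which skips the row-by-row check. The id bounds come from the definition of partition p1 shown earlier. Skipping validation is only reasonable when the pre-check returns 0.

```sql
-- Rows in t_p1 that would NOT fit partition p1 (1000000 <= id < 2000000).
-- This should return 0 before the exchange.
SELECT COUNT(*) FROM t_p1 WHERE id < 1000000 OR id >= 2000000;

-- Exchange without the server-side row validation.
ALTER TABLE p1 EXCHANGE PARTITION p1 WITH TABLE t_p1 WITHOUT VALIDATION;
```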

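Compare this with exchanging data on a non-partitioned table, where the steps are:

Choose the table to exchange with.
Select the data from the original table and insert it into the exchange table.
Delete the affected data from the original table.

If the swapped-out data later has to go back into the original table, the steps above must be repeated in reverse, which makes operations harder and is inefficient.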
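
The three steps above can be sketched as follows for the same id range that partition p1 covers. The table name t1_swap is hypothetical and used only for illustration.

```sql
-- Step 1: create the exchange table (t1_swap is a hypothetical name).
-- DDL commits implicitly, so it stays outside the transaction.
CREATE TABLE t1_swap LIKE t1;

-- Steps 2 and 3 in one transaction, so other sessions see the copy and
-- the delete take effect together at COMMIT.
START TRANSACTION;
INSERT INTO t1_swap SELECT * FROM t1 WHERE id >= 1000000 AND id < 2000000;
DELETE FROM t1 WHERE id >= 1000000 AND id < 2000000;
COMMIT;
```

Unlike EXCHANGE PARTITION, both statements touch every affected row, so the run time and the log volume grow with the size of the range.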

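Partition exchange has another major advantage: it writes far less to the log than the non-partitioned approach does. Let's repeat the exchange above. First, delete all binary logs.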

localhost:ytt>reset master;
Query OK, 0 rows affected (0.02 sec)

Exchange the partition once.

localhost:ytt>alter table p1 exchange partition p2 with table t_p1;
Query OK, 0 rows affected (2.42 sec)

Exchange again, which empties table t_p1.

localhost:ytt>alter table p1 exchange partition p2 with table t_p1;
Query OK, 0 rows affected (0.45 sec)

Both exchange operations are now recorded in binary log ytt1.000001.

localhost:ytt>show master status;
...
 ytt1.000001 ： 47d6eda0-6468-11ea-a026-9cb6d0e27d15:1-2

Flush the logs, then record the equivalent data move done the non-partitioned way.

localhost:ytt>flush logs;
Query OK, 0 rows affected (0.01 sec)


localhost:ytt>insert into t_p1 select * from p1 partition (p2) ;
Query OK, 934473 rows affected (5.25 sec)
Records: 934473  Duplicates: 0  Warnings: 0


localhost:ytt>show master status;
...
 ytt1.000002 ： 47d6eda0-6468-11ea-a026-9cb6d0e27d15:1-3

Now look at the log files themselves: ytt1.000001 takes only 588 bytes, while ytt1.000002 takes 7.2 MB.

root@ytt-pc:/var/lib/mysql/3306# ls -sihl ytt1.00000*
2109882 4.0K -rw-r----- 1 mysql mysql  588 7月  23 11:13 ytt1.000001
2109868 7.2M -rw-r----- 1 mysql mysql 7.2M 7月  23 11:14 ytt1.000002
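
To see why the two files differ so much in size, the events in each can be listed. This is a sketch that assumes row-based logging (binlog_format=ROW, the default in MySQL 8.0); the file names are the ones shown above.

```sql
-- The two ALTER TABLE ... EXCHANGE PARTITION statements are logged as
-- statement text, however many rows the partition holds.
SHOW BINLOG EVENTS IN 'ytt1.000001';

-- Under row-based logging, INSERT ... SELECT is logged as row images,
-- so the log grows with the number of rows copied. LIMIT keeps the
-- listing short.
SHOW BINLOG EVENTS IN 'ytt1.000002' LIMIT 20;
```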

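Quickly clearing the data of a single partition.

Deleting the data of a single partition is faster than deleting a range of data from a non-partitioned table.

For example, to empty partition p0 of partitioned table p1, simply truncate that partition.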

localhost:ytt> alter table p1 truncate partition p0;
Query OK, 0 rows affected (0.07 sec)

localhost:ytt> select count(*) from p1;
+----------+
| count(*) |
+----------+
|  9000001 |
+----------+
1 row in set (0.92 sec)

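A non-partitioned table can only be truncated as a whole, so part of its data cannot be cleared quickly. The only option is to DELETE with a filter condition, which performs much worse. In this test the same cleanup took 26.80 seconds on the non-partitioned table versus 0.07 seconds for the partition truncate, several hundred times slower.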

localhost:ytt> delete from t1 where id < 1000000;
Query OK, 999999 rows affected (26.80 sec)
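
When a table is not partitioned and a large range still has to be removed, a common mitigation (not something this article measures) is to delete in bounded batches so that each statement stays small. A minimal sketch, with an arbitrary batch size:

```sql
-- Delete at most 10000 rows per statement; 10000 is an arbitrary choice.
-- Repeat until the statement reports 0 rows affected.
DELETE FROM t1 WHERE id < 1000000 ORDER BY id LIMIT 10000;
```

Batching does not reduce the total work; it only bounds how much each statement locks and logs at once.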

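Summary:

MySQL partitioned tables are very efficient in many scenarios. This article covered their basic advantages for simple queries and for operations; later articles will discuss partitioned tables in more scenarios one by one.

Is there anything else about MySQL you would like to know? Leave a comment and let the editor know!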


Published on December 13, 2021 by 杨涛涛
