第21期：索引设计（函数索引）

本篇主要介绍 MySQL 的函数索引（也叫表达式索引）。

通常来讲，索引都是基于字段本身或者字段前缀（第 20 篇），而函数索引是基于字段本身加上函数、操作符、表达式等计算而来。如果将表达式或者操作符也看做函数的话，简单来说，这样的索引就可以统称函数索引。

MySQL 的函数索引内部是基于虚拟列（generated columns）实现，不同于直接定义虚拟列，函数索引自动创建的虚拟列本身实时计算结果，并不存储数据，只把函数索引本身存在磁盘上。

MySQL 8.0.13 之前不支持函数索引，所以老版本包括现在主流的 MySQL 5.7 也不支持函数索引，需要手工模拟创建或者改 SQL。

本章基于以下几点来讲函数索引：

1. 函数索引的使用场景

函数索引最最经典的使用场景莫过于就是对日期的处理，特别是表中只定义了一个字段，后期对这个字段的查询都是基于部分结果。比如 “2100-02-02 08:09:09.123972” 包含了日期 “2100-02-02”，时间 “08:09:09”，小数位时间 “123972”，有可能会对这个值拆解后部分查询。

举个简单例子，表 t1 有两个字段，一个主键，另外一个时间字段，总记录数不到 40W。

<localhost|mysql>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `log_time` datetime(6) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_log_time` (`log_time`)
) ENGINE=InnoDB AUTO_INCREMENT=524268 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)


<localhost|mysql>select count(*) from t1;
+----------+
| count(*) |
+----------+
|   393216 |
+----------+
1 row in set (0.07 sec)

执行下面这条 SQL 1，把日期单独拿出来，执行了 0.09 秒。

# SQL 1
<localhost|mysql>select * from t1 where date(log_time) = '2100-02-02';
+--------+----------------------------+
| id     | log_time                   |
+--------+----------------------------+
| 524267 | 2100-02-02 08:09:09.123972 |
+--------+----------------------------+
1 row in set (0.09 sec)

看下它的执行计划，虽然走了索引，但是扫描行数为总记录数，相当于全表扫，这时候比全表扫还不理想，全表扫直接走聚簇索引还快点。

<localhost|mysql>explain select * from t1 where date(log_time) = '2100-02-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index
possible_keys: NULL
          key: idx_log_time
      key_len: 9
          ref: NULL
         rows: 392413
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

这时最好的方法就是为列 log_time 加一新索引，基于函数 date 的函数索引。

<localhost|mysql>alter table t1 add key idx_func_index_1((date(log_time)));
Query OK, 0 rows affected (2.76 sec)
Records: 0  Duplicates: 0  Warnings: 0

再次执行上面的 SQL 1，瞬间执行完毕。

<localhost|mysql>select * from t1 where date(log_time) = '2100-02-02';
+--------+----------------------------+
| id     | log_time                   |
+--------+----------------------------+
| 524267 | 2100-02-02 08:09:09.123972 |
+--------+----------------------------+
1 row in set (0.00 sec)

接下来查看执行计划，结果显示走函数索引 idx_func_index_1 扫描记录数只有一行，执行计划达到最优。

<localhost|mysql>explain select * from t1 where date(log_time) = '2100-02-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_func_index_1
          key: idx_func_index_1
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

如果想查看 MySQL 函数索引内部创建的列，直接 show create table 看是没有结果的，比如下面只看到一个新的索引。

<localhost|mysql>show create table t1\G
...
  KEY `idx_func_index_1` ((cast(`log_time` as date)))
) ENGINE=InnoDB AUTO_INCREMENT=524268 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

通过 MySQL 8.0 的新语句 show extended columns 查看隐藏的列，下面结果发现确实是新加了一个虚拟列。

<localhost|mysql>show extended columns from t1;
...
| bbd3daff935e7a4d0991a8393ec03728 | date            | YES  | MUL | NULL    | VIRTUAL GENERATED |
...
5 rows in set (0.03 sec)

2. 函数索引在处理 JSON 类型的注意事项

比如需要遍历 JSON 类型的子串作为索引，直接用遍历操作符 ->> 报错。

<localhost|mysql>create table t2 (id int primary key, r1 json);
Query OK, 0 rows affected (0.09 sec)

<localhost|mysql>alter table t2 add key idx_func_index_2((r1->>'$.x'));
ERROR 3757 (HY000): Cannot create a functional index on an expression that returns a BLOB or TEXT. Please consider using CAST.

操作符 ->> 表示从 JSON 串中遍历指定路径的 value，在 MySQL 内部转换为 json_unquote(jso_extract(…))，而函数 json_unquote 返回结果具有以下特性：

数据类型为 longtext，在 MySQL 里 longtext 类型只支持前缀索引，必须用函数 cast 来转换类型。
json_unquote 调用结果的排序规则为 utf8mb4_bin，cast 调用结果的排序规则和当前 session 的排序规则一样，有可能不是 utf8mb4_bin，所以函数索引中要显式定义排序规则。

所以针对 JSON 字段来建立新的函数索引：

<localhost|mysql>alter table t2 add key idx_func_index_2((cast(r1->>'$.x' as char(1)) collate utf8mb4_bin));
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

看下表结构，操作符 ->> 被转换为 json_unquote(json_extract(…))，并且排序规则为 utf8mb4_bin。


<localhost|mysql>show create table t2\G
*************************** 1. row ***************************
      Table: t2
...
 KEY `idx_func_index_2` (((cast(json_unquote(json_extract(`r1`,_utf8mb4'$.x')) as char(1) charset utf8mb4) collate utf8mb4_bin)))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

接下来插入几条记录，看看这个函数索引的使用。

<localhost|mysql>select * from t2;
+----+---------------------+
| id | r1                  |
+----+---------------------+
|  1 | {"x": "1", "y": 10} |
|  2 | {"x": "2", "y": 20} |
|  3 | {"x": "a", "y": 20} |
|  4 | {"x": "A", "y": 20} |
+----+---------------------+
4 rows in set (0.00 sec)

执行下 SQL 2，并且看下执行计划，直接走了刚才创建的函数索引。

# SQL 2
<localhost|mysql>select * from t2 where r1->>'$.x'='a';
+----+---------------------+
| id | r1                  |
+----+---------------------+
|  3 | {"x": "a", "y": 20} |
+----+---------------------+
1 row in set (0.00 sec)

<localhost|mysql>explain select * from t2 where r1->>'$.x'='a'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t2
  partitions: NULL
        type: ref
possible_keys: idx_func_index_2
         key: idx_func_index_2
     key_len: 7
         ref: const
        rows: 1
    filtered: 100.00
       Extra: NULL
1 row in set, 1 warning (0.00 sec)

这里其实应该有个疑问，对函数索引的调用，必须要按照之前定义好的函数来执行，否则不会用到索引，那 SQL 2 怎么可以直接到用索引？

MySQL 在这块儿其实内部已经转换为正确的语句。查看下刚才 EXPLAIN 的 WARNINGS 信息。可以看到 SQL 2 被 MySQL 转换为遵守函数索引规则的正确语句。

<localhost|mysql>show warnings\G
*************************** 1. row ***************************
 Level: Note
  Code: 1003
Message: /* select#1 */ select `ytt`.`t2`.`id` AS `id`,`ytt`.`t2`.`r1` AS `r1` from `ytt`.`t2` where ((cast(json_unquote(json_extract(`ytt`.`t2`.`r1`,_utf8mb4'$.x')) as char(1) charset utf8mb4) collate utf8mb4_bin) = 'a')
1 row in set (0.00 sec)

3. 函数索引替代前缀索引？

之前讲过前缀索引，可能会有这样的疑问。前缀索引能不能被函数索引替代？当然是不行的！函数索引要求查询条件严格按照函数索引的定义匹配，虽然有的场景下 MySQL 可以内部转换，但是 MySQL 无法为每个函数都替换为最优化的写法。比如函数 substring,left,right 等。

下面例子用来模拟下是否可以用函数索引替代前缀索引。示例表 t3，一个前缀索引和两个函数索引实现的目的一样，但是实际查询的时候 SQL 语句并不一样。


<localhost|mysql>show create table t3\G
*************************** 1. row ***************************
      Table: t3
Create Table: CREATE TABLE `t3` (
 `id` bigint unsigned NOT NULL AUTO_INCREMENT,
 `r1` char(36) DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `id` (`id`),
 KEY `idx_r1_prefix` (`r1`(8)),
 KEY `idx_func_index_3` ((left(`r1`,8))),
 KEY `idx_func_index_4` ((substr(`r1`,1,8)))
) ENGINE=InnoDB AUTO_INCREMENT=249 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

以下 SQL 3 、SQL 4、SQL 5 写法不一样，查询结果一样，走的索引不一样。

# SQL 3
select * from t3 where r1 like 'de45c7d9%';

# SQL 4
select * from t3 where left(r1,8) ='de45c7d9';

# SQL 5
select * from t3 where substring(r1,1,8) ='de45c7d9';

<localhost|mysql>select * from t3 where r1 like 'de45c7d9%';
+-----+--------------------------------------+
| id  | r1                                   |
+-----+--------------------------------------+
| 178 | de45c7d9-935c-11ea-8421-08002753f58d |
+-----+--------------------------------------+
1 row in set (0.00 sec)

<localhost|mysql>select * from t3 where left(r1,8) ='de45c7d9';
+-----+--------------------------------------+
| id  | r1                                   |
+-----+--------------------------------------+
| 178 | de45c7d9-935c-11ea-8421-08002753f58d |
+-----+--------------------------------------+
1 row in set (0.00 sec)

<localhost|mysql>select * from t3 where substring(r1,1,8) ='de45c7d9';
+-----+--------------------------------------+
| id  | r1                                   |
+-----+--------------------------------------+
| 178 | de45c7d9-935c-11ea-8421-08002753f58d |
+-----+--------------------------------------+
1 row in set (0.00 sec)

各自的查询计划，每条 SQL 走的不同的索引。

<localhost|mysql>explain select * from t3 where r1 like 'de45c7d9%'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t3
  partitions: NULL
        type: range
possible_keys: idx_r1_prefix
         key: idx_r1_prefix
     key_len: 33
         ref: NULL
        rows: 1
    filtered: 100.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)

<localhost|mysql>explain select * from t3 where left(r1,8) ='de45c7d9'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t3
  partitions: NULL
        type: ref
possible_keys: idx_func_index_3
         key: idx_func_index_3
     key_len: 35
         ref: const
        rows: 1
    filtered: 100.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)

<localhost|mysql>explain select * from t3 where substring(r1,1,8) ='de45c7d9'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t3
  partitions: NULL
        type: ref
possible_keys: idx_func_index_4
         key: idx_func_index_4
     key_len: 35
         ref: const
        rows: 1
    filtered: 100.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)

此时删除掉函数索引 idx_func_index_3, SQL 4 就无法走正确的索引。

<localhost|mysql>alter table t3 drop key idx_func_index_3;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

<localhost|mysql>explain select * from t3 where left(r1,8) ='de45c7d9'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t3
  partitions: NULL
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 128
    filtered: 100.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)

查看 warnings，发现 MySQL 优化器转换后的 SQL，LEFT 函数还是保持原样，但是表里没有基于 LEFT 函数的索引，只能全表扫。

<localhost|mysql>show warnings\G
*************************** 1. row ***************************
 Level: Note
  Code: 1003
Message: /* select#1 */ select `ytt`.`t3`.`id` AS `id`,`ytt`.`t3`.`r1` AS `r1` from `ytt`.`t3` where (left(`ytt`.`t3`.`r1`,8) = 'de45c7d9')
1 row in set (0.00 sec)

4. 老版本如何实现函数索引

函数索引是 MySQL 8.0.13 才有的。那在老的版本如何实现呢？

MySQL 5.7 自持虚拟列，只需要在虚拟列上创建一个普通索引就行。

MySQL 5.6 以及 MySQL 5.5 等，则需要自己定义一个冗余列，然后定期更新这列内容。当然最核心的是如何规划好定期更新内容这块。这块如果讨论起来，内容非常多，可以参考我之前写的关于表样例数据更新收集这块内容，MySQL 内部的做法。

文章推荐：

第20期：索引设计（前缀索引）

第19期：索引设计（哈希索引数据分布与使用场景）

第18期：索引设计（认识哈希表）

第21期：索引设计（函数索引）

于2月 3, 2021由杨涛涛发布

1. 函数索引的使用场景

2. 函数索引在处理 JSON 类型的注意事项

3. 函数索引替代前缀索引？

4. 老版本如何实现函数索引

MySQL 调优

第47期：EXPLAIN TYPE 列的 JOIN 常见场景详解（上）

MySQL 调优

第46期：EXPLAIN：解说一条简单 SQL 语句的执行计划

MySQL 调优

第45期：一条 SQL 语句优化的基本思路

第21期：索引设计（函数索引）

于2月 3, 2021由杨涛涛发布

1. 函数索引的使用场景

2. 函数索引在处理 JSON 类型的注意事项

3. 函数索引替代前缀索引？

4. 老版本如何实现函数索引

相关文章

MySQL 调优

第47期：EXPLAIN TYPE 列的 JOIN 常见场景详解（上）

MySQL 调优

第46期：EXPLAIN：解说一条简单 SQL 语句的执行计划

MySQL 调优

第45期：一条 SQL 语句优化的基本思路