问题描述
使用 show full processlist 查看当前数据库正在运行的 SQL 语句,我们可能会发现 State 列为 Waiting for table metadata lock,我们该如何排查此类问题?
问题复现
- 在会话 1 中显式开启一个事务,执行一个 update 操作,不进行提交
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> update t1 set name='liwangz' where id=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
```
`
2. 在会话 2 中对表执行 DDL 操作:我们会发现操作被阻塞
````undefined
mysql> alter table t1 add index i_c(id);
```
`
3. 使用 `show full processlist` 命令进行排查,会发现 State 字段出现 **Waiting for table metadata lock**
![图片](https://lf3-volc-editor.volccdn.com/obj/volcfe/sop-public/upload_9b15372045d69fcfe68a398c5dfb0577.png)
# 解决方案
### 1. 前提条件
* 排查 MDL 锁依赖于 performance_schema,请提前确认 performance_schema 已经开启
* MDL 锁记录对应的 instruments 为**wait/lock/metadata/sql/mdl**,需要提前开启
### 2. 查看 metadata_lock 表获取锁信息
````undefined
mysql> select * from performance_schema.metadata_locks where OWNER_THREAD_ID!=sys.ps_thread_id(connection_id())\G;
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: rudonx
OBJECT_NAME: t1
OBJECT_INSTANCE_BEGIN: 139681262016240
LOCK_TYPE: SHARED_WRITE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:5993
OWNER_THREAD_ID: 48
OWNER_EVENT_ID: 282
*************************** 2. row ***************************
OBJECT_TYPE: GLOBAL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139681127989264
LOCK_TYPE: INTENTION_EXCLUSIVE
LOCK_DURATION: STATEMENT
LOCK_STATUS: GRANTED
SOURCE: sql_base.cc:5497
OWNER_THREAD_ID: 54
OWNER_EVENT_ID: 1064
*************************** 3. row ***************************
OBJECT_TYPE: SCHEMA
OBJECT_SCHEMA: rudonx
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139681137621328
LOCK_TYPE: INTENTION_EXCLUSIVE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_base.cc:5482
OWNER_THREAD_ID: 54
OWNER_EVENT_ID: 1064
*************************** 4. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: rudonx
OBJECT_NAME: t1
OBJECT_INSTANCE_BEGIN: 139681127961472
LOCK_TYPE: SHARED_UPGRADABLE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:5993
OWNER_THREAD_ID: 54
OWNER_EVENT_ID: 1066
*************************** 5. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: rudonx
OBJECT_NAME: t1
OBJECT_INSTANCE_BEGIN: 139681137618432
LOCK_TYPE: EXCLUSIVE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE: mdl.cc:3891
OWNER_THREAD_ID: 54
OWNER_EVENT_ID: 1137
5 rows in set (0.00 sec)
```
`
输出中共有 5 行数据,其中我们可以发现第一行为 t1 表的SHARED_WRITE 锁,编号为 48 的线程所持有(GRANTED)
后续 4 行中体现了 t1 表的 SHARED_UPGRADABLE、EXCLUSIVE 锁,其中 SHARED_UPGRADABLE 处于 GRANTED 状态,EXCLUSIVE 处于 PENDING 状态,编号为 54 的线程在等待 MDL 锁。
### 3. 查询 events_statements_current 获取查询语句
从 performance_schema.events_statements_current 表来查询某个线程最后一次执行的语句:
````undefined
mysql> select * from performance_schema.events_statements_current where thread_id=48\G;
*************************** 1. row ***************************
THREAD_ID: 48
EVENT_ID: 403
END_EVENT_ID: 457
EVENT_NAME: statement/sql/update
SOURCE: socket_connection.cc:101
TIMER_START: 8819577691629633000
TIMER_END: 8819577691947950000
TIMER_WAIT: 318317000
LOCK_TIME: 160000000
SQL_TEXT: update t1 set name='liwangz' where id=1
DIGEST: 3c26d58f8e27a87167a1237254ff60c5
DIGEST_TEXT: UPDATE `t1` SET NAME = ? WHERE `id` = ?
CURRENT_SCHEMA: rudonx
OBJECT_TYPE: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: NULL
MYSQL_ERRNO: 0
RETURNED_SQLSTATE: 00000
MESSAGE_TEXT: Rows matched: 1 Changed: 0 Warnings: 0
ERRORS: 0
WARNINGS: 0
ROWS_AFFECTED: 0
ROWS_SENT: 0
ROWS_EXAMINED: 1
CREATED_TMP_DISK_TABLES: 0
CREATED_TMP_TABLES: 0
SELECT_FULL_JOIN: 0
SELECT_FULL_RANGE_JOIN: 0
SELECT_RANGE: 0
SELECT_RANGE_CHECK: 0
SELECT_SCAN: 0
SORT_MERGE_PASSES: 0
SORT_RANGE: 0
SORT_ROWS: 0
SORT_SCAN: 0
NO_INDEX_USED: 0
NO_GOOD_INDEX_USED: 0
NESTING_EVENT_ID: NULL
NESTING_EVENT_TYPE: NULL
NESTING_EVENT_LEVEL: 0
1 row in set (0.00 sec)
ERROR:
No query specified
```
`
通过上述 SQL 回显的信息,我们可以看到相关 SQL 文本的信息。如果我们单纯去查询 `show full processlist`,不会显示 SQL 文本,会给定位问题带来一定的困扰。同样的去查询 `information_schema.innodb_trx\G`,我们只能看到有一个未提交的事务,也不会有文本信息。
````undefined
mysql> select * from information_schema.innodb_trx\G;
*************************** 1. row ***************************
trx_id: 36257
trx_state: RUNNING
trx_started: 2021-07-06 14:35:27
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 3
trx_mysql_thread_id: 22
trx_query: NULL
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 1
trx_lock_structs: 2
trx_lock_memory_bytes: 1136
trx_rows_locked: 1
trx_rows_modified: 1
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 0
trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
ERROR:
No query specified
```
`
### 4. kill 掉相关线程
通过 `performance_schema.events_statements_current` 表中的 THREAD_ID 字段值来获取 processlist ID,相关语句如下:
````undefined
mysql> select THREAD_ID,PROCESSLIST_ID from performance_schema.threads where THREAD_ID=48;
+-----------+----------------+
| THREAD_ID | PROCESSLIST_ID |
+-----------+----------------+
| 48 | 22 |
+-----------+----------------+
1 row in set (0.03 sec)
```
`
当和业务确认之后,可以执行如下操作来 kill 掉相关连接,将锁释放。
````undefined
mysql> kill 22;
Query OK, 0 rows affected (0.06 sec)
```
`
上述所有 SQL 语句的排查目的在于发现到底是什么语句持有 MDL 锁而引起阻塞,如果想快速恢复业务,可以使用如下方式:
````undefined
SELECT
concat( 'kill ', p1.id, ';' )
FROM
information_schema.PROCESSLIST p1,
(
SELECT
id,
time
FROM
information_schema.PROCESSLIST
WHERE
time = ( SELECT max( time ) FROM information_schema.PROCESSLIST WHERE state = 'Waiting for table metadata lock'
)) p2
WHERE
p1.time >= p2.time
AND p1.command IN ( 'Sleep', 'Query' )
AND p1.id NOT IN ( connection_id(), p2.id );
```
`
# 参考文档
[1] [https://dev.mysql.com/doc/refman/5.7/en/metadata-locking.html](https://dev.mysql.com/doc/refman/5.7/en/metadata-locking.html)
**如果您有其他问题,欢迎您联系火山引擎**[技术支持服务](https://console.volcengine.com/ticket/createTicketV2/)