问题描述
经常遇到 MySQL 因为表级锁导致的 SQL 阻塞问题,那么如何找到 MySQL 中被表级锁占用的线程和具体SQL内容?
问题分析
如果需要分析查看持有表级锁的线程及执行 SQL,需要开启 performance_schema 库并开启表级锁对应的instruments(wait/lock/metadata/sql/mdl),全局配置项 global_instrumentation 控制着 setup_consumers 的记录,所以也需要开启。
问题复现
Session 1:
mysql> begin;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into test values(2,2,2,'22',2);
Query OK, 1 row affected (0.00 sec)
mysql> select * from test;
+---------+------+-------+------+----------+
| auto_id | code | times | name | test_int |
+---------+------+-------+------+----------+
| 1 | 1 | 1 | 1 | NULL |
| 2 | 2 | 2 | 22 | 2 |
| 6 | 6 | 6 | 66 | NULL |
+---------+------+-------+------+----------+
3 rows in set (0.00 sec)
session 2:
mysql> alter table test drop column test_int;
session 3:
1. 查看当前 User 为 lxb 线程信息
mysql> SELECT * FROM information_schema.processlist where User='lxb';
+---------+------+---------------------+--------------------+---------+------+---------------------------------+----------------------------------------------------------------------+
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO |
+---------+------+---------------------+--------------------+---------+------+---------------------------------+----------------------------------------------------------------------+
| 4005295 | lxb | 192.168.0.128:52830 | lxb | Query | 437 | Waiting for table metadata lock | /*rdsproxy:lxb@192.168.0.128*/ alter table test drop column test_int |
| 4003183 | lxb | 192.168.0.128:52870 | information_schema | Query | 0 | executing | SELECT * FROM information_schema.processlist where User='lxb' |
| 4002776 | lxb | 192.168.0.128:52132 | lxb | Sleep | 498 | | NULL |
+---------+------+---------------------+--------------------+---------+------+---------------------------------+----------------------------------------------------------------------+
3 rows in set (0.00 sec)
可以看到alter table test drop column test_int
的状态是 Waiting for table metadata lock
2. 查看 test 表的 MDL 锁信息
mysql> select * from performance_schema.metadata_locks where OBJECT_TYPE='TABLE' and OBJECT_NAME='test'\G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: lxb
OBJECT_NAME: test
OBJECT_INSTANCE_BEGIN: 140015122787616
LOCK_TYPE: SHARED_WRITE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE:
OWNER_THREAD_ID: 4002796
OWNER_EVENT_ID: 28
*************************** 2. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: lxb
OBJECT_NAME: test
OBJECT_INSTANCE_BEGIN: 140015061993184
LOCK_TYPE: SHARED_UPGRADABLE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE:
OWNER_THREAD_ID: 4005315
OWNER_EVENT_ID: 7
*************************** 3. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: lxb
OBJECT_NAME: test
OBJECT_INSTANCE_BEGIN: 140014953141584
LOCK_TYPE: EXCLUSIVE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE:
OWNER_THREAD_ID: 4005315
OWNER_EVENT_ID: 7
3 rows in set (0.00 sec)
可以看到 OWNER_THREAD_ID为 4005315 的线程被 OWNER_THREAD_ID 为 4002796 的线程阻塞 处于 Pending 状态。
3.如何知道内部ID 4002796 对应的 process id是多少呢?
我们可以通过 performance_schema.threads 表查询
select processlist_id from performance_schema.threads where thread_id=4002796;
+----------------+
| processlist_id |
+----------------+
| 4002776 |
+----------------+
1 row in set (0.00 sec)
4. 排查具体的 SQL 语句
通过上述查询到的相关数据,4002776 的 processlist_id 对 sbtest1 表显式加了表级读锁,而且长时间处于 sleep 状态,但我们并不知道该线程正在执行什么 SQL 语句,我们可以通过 performance_schema.events_statements_current 表查询,如下:
# 查看当前事务正在执行的 SQL
mysql> select * from performance_schema.events_statements_current where thread_id=4002796\G;
*************************** 1. row ***************************
THREAD_ID: 4002796
EVENT_ID: 27
END_EVENT_ID: 27
EVENT_NAME: statement/sql/insert
SOURCE:
TIMER_START: 1696627893207823000
TIMER_END: 1696627893378879000
TIMER_WAIT: 171056000
LOCK_TIME: 57000000
SQL_TEXT: insert into test values(2,2,2,'22',2)
DIGEST: 8791a6928992bb852a2a054cb32a5437
DIGEST_TEXT: INSERT INTO `test` VALUES (...)
CURRENT_SCHEMA: lxb
OBJECT_TYPE: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: NULL
MYSQL_ERRNO: 0
RETURNED_SQLSTATE: 00000
MESSAGE_TEXT: NULL
ERRORS: 0
WARNINGS: 0
ROWS_AFFECTED: 1
ROWS_SENT: 0
ROWS_EXAMINED: 0
CREATED_TMP_DISK_TABLES: 0
CREATED_TMP_TABLES: 0
SELECT_FULL_JOIN: 0
SELECT_FULL_RANGE_JOIN: 0
SELECT_RANGE: 0
SELECT_RANGE_CHECK: 0
SELECT_SCAN: 0
SORT_MERGE_PASSES: 0
SORT_RANGE: 0
SORT_ROWS: 0
SORT_SCAN: 0
NO_INDEX_USED: 0
NO_GOOD_INDEX_USED: 0
NESTING_EVENT_ID: NULL
NESTING_EVENT_TYPE: NULL
NESTING_EVENT_LEVEL: 0
解决方案
在确认了阻塞线程和具体的 SQL 后, 我们需要和业务确认是否可以手动杀掉任务,或者将其提交
mysql> kill 4002776
杀掉进程之后观察 session 1 中的数据
mysql> select * from test;
+---------+------+-------+------+
| auto_id | code | times | name |
+---------+------+-------+------+
| 1 | 1 | 1 | 1 |
| 6 | 6 | 6 | 66 |
+---------+------+-------+------+
2 rows in set (0.00 sec)
发现 auto_id 为2 的数据被回滚, drop 语句正常执行,test_int 字段被删除。
业务一定要确认清楚,才可以 kill 掉线程,线上请谨慎操作。
如果您有其他问题,欢迎您联系火山引擎技术支持服务