如何排查与分析MySQL死锁问题 - 文章 - 开发者社区

问题描述
在真实的生产环境中，死锁问题是非常常见的，我们应该如何排查此类问题并进行规避？
问题分析
表结构与测试数据：
CREATE TABLE `dead_lock` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `kcol1` (`col1`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8

mysql> select * from dead_lock;
+----+------+
| id | col1 |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.03 sec)
```
`
事务的执行顺序如下：

| **步骤** | **transaction_1** | **transaction_2** |
| --- | --- | --- |
| 1 |  | begin; |
| 2 |  | delete from dead_lock where col1=2; |
| 3 | begin； |  |
| 4 | delete from dead_lock where col1=2; 出现等待 |  |
| 5 |  | insert into dead_lock(col1) values(2); |
| 6 | ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction |  |
| 7 | thread_id=3301099 | thread_id= 3300738 |

使用 **show engine innodb status** 打出最近一次死锁信息：
```sql
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-02-10 16:31:21 0x7fe759f2f700
*** (1) TRANSACTION:
TRANSACTION 22150690, ACTIVE 34 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 3301099, OS thread handle 140632233637632, query id 169406906 52.82.38.248 52.82.38.248 rudonx updating
delete from dead_lock where col1=2
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 303 page no 4 n bits 72 index kcol1 of table `rudonx`.`dead_lock` trx id 22150690 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 4; hex 80000002; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 22150533, ACTIVE 113 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 2
MySQL thread id 3300738, OS thread handle 140631623268096, query id 169409104 52.82.38.248 52.82.38.248 rudonx update
insert into dead_lock(col1) values(2)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 303 page no 4 n bits 72 index kcol1 of table `rudonx`.`dead_lock` trx id 22150533 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 4; hex 80000002; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 303 page no 4 n bits 72 index kcol1 of table `rudonx`.`dead_lock` trx id 22150533 lock mode S waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 4; hex 80000002; asc     ;;

*** WE ROLL BACK TRANSACTION (1)**

```

从上面的输出中，我们可以得出如下信息：

1. transaction 2 ( thread_id =3300738 ) 持有的锁为 lock_mode X locks rec but not gap。
2. transaction 1 (thread_id = 3301099 ) 请求的锁为 lock_mode X locks rec。
3. 同时transaction 2 ( thread_id =3300738 )请求的锁为 lock mode S waiting Record lock。
4. 出现死锁之后，MySQL 选择回滚掉transaction 1，因为回滚这个事务的代价是最小的。

# 解决方案
死锁产生会有如下条件：

1. 两个或者两个以上的事务。
2. 每个事务已经持有锁，并希望申请新的锁。
3. 事务与事务由于持有锁和申请锁导致形成循环等待。

从这些条件入手，我们会得出一些解决死锁问题的办法。这里有一些建议，供您参考：

1. 事务要及时提交，避免长事务，复杂事务。
2. 调整优化业务逻辑，优化事务加锁的顺序。
3. 事务尽量要小，事务时间变长则导致事务中锁的持有时间变长，影响整体的数据库吞吐量。

同时，还有如下最佳实践供您参考：

1. 可以使用比较低的隔离级别，RC 隔离级别在锁冲突方面优于RR，避免RR模式带来的gap锁竞争。
2. 为表添加合适的索引，避免全表扫描。
3. 开启 **innodb_print_all_deadlocks**，将死锁日志写入到MySQL error log中，以便后续的分析。
4. 开启 **innodb_status_output_locks** 获详细的事务和死锁信息。

**如果您有其他问题，欢迎您联系火山引擎**[技术支持服务](https://console.volcengine.com/ticket/createTicketV2/)