Symptom
After a user uploads an IK dictionary file, enabling it fails with the following error:
elasticsearch cluster status not ready, no update or restart will be executed. If you want to update or restart this resource anyway, please FORCE to do it.
Troubleshooting
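The error says the cluster is not in a ready state. Check whether the Elasticsearch cluster is restarting, or whether its health status is Red or Yellow.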
- Check the cluster health status:
GET _cluster/health?pretty
{
  "cluster_name" : "nkxzzdr1xxxxx",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 6,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 3,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 84.21052631578947
}
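A Yellow status with unassigned shards fits the error. As a rough pre-flight check before retrying, the sketch below lists the fields in the health output that suggest the cluster is not ready. It assumes that "ready" means status green with no relocating, initializing, or unassigned shards. The console's actual criteria are not stated in this article, and `readiness_problems` is a hypothetical helper.

```python
import json

# A subset of the fields from the GET _cluster/health output above.
HEALTH_JSON = """
{
  "cluster_name": "nkxzzdr1xxxxx",
  "status": "yellow",
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 3
}
"""


def readiness_problems(health: dict) -> list[str]:
    """Return reasons the cluster looks unready; an empty list means ready.

    ASSUMPTION: "ready" is approximated as status == green with no
    relocating, initializing, or unassigned shards.
    """
    problems = []
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')}")
    for field in ("relocating_shards", "initializing_shards", "unassigned_shards"):
        count = health.get(field, 0)
        if count:
            problems.append(f"{field} = {count}")
    return problems


if __name__ == "__main__":
    for problem in readiness_problems(json.loads(HEALTH_JSON)):
        print(problem)
```

For the example output, the check reports the yellow status and the three unassigned shards.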
The cluster status is Yellow. To find out why, list the shards and their states with the following command. The output shows three replica shards in the UNASSIGNED state.
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason&v
index  shard prirep state      unassigned.reason
mytest 2     p      STARTED
mytest 2     r      STARTED
mytest 2     r      STARTED
mytest 2     r      UNASSIGNED INDEX_CREATED
mytest 1     r      STARTED
mytest 1     r      STARTED
mytest 1     p      STARTED
mytest 1     r      UNASSIGNED INDEX_CREATED
mytest 0     r      STARTED
mytest 0     p      STARTED
mytest 0     r      STARTED
mytest 0     r      UNASSIGNED INDEX_CREATED
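On a large cluster it helps to summarize this table rather than read it row by row. The sketch below parses the text output and counts unassigned copies by type and reason. It splits on whitespace, which only works here because the one column that can be empty (`unassigned.reason`) is the last one. Adding `format=json` to the _cat request returns structured output and avoids text parsing altogether. The helper names are hypothetical.

```python
from collections import Counter

# Output of: GET _cat/shards?h=index,shard,prirep,state,unassigned.reason&v
CAT_SHARDS = """\
index  shard prirep state      unassigned.reason
mytest 2     p      STARTED
mytest 2     r      STARTED
mytest 2     r      STARTED
mytest 2     r      UNASSIGNED INDEX_CREATED
mytest 1     r      STARTED
mytest 1     r      STARTED
mytest 1     p      STARTED
mytest 1     r      UNASSIGNED INDEX_CREATED
mytest 0     r      STARTED
mytest 0     p      STARTED
mytest 0     r      STARTED
mytest 0     r      UNASSIGNED INDEX_CREATED
"""


def parse_cat_shards(text: str) -> list[dict]:
    """Turn the _cat table (header row first) into one dict per shard copy.

    A trailing empty column (unassigned.reason on STARTED rows) is simply
    missing from that row's dict.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def unassigned_summary(rows: list[dict]) -> Counter:
    """Count unassigned copies keyed by (prirep, unassigned.reason)."""
    return Counter(
        (row["prirep"], row.get("unassigned.reason", "?"))
        for row in rows
        if row["state"] == "UNASSIGNED"
    )


if __name__ == "__main__":
    print(unassigned_summary(parse_cat_shards(CAT_SHARDS)))
```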
Get a detailed explanation of why the shards are unassigned:
GET _cluster/allocation/explain?pretty
// Partial output:
{
  "decider" : "same_shard",
  "decision" : "NO",
  "explanation" : "a copy of this shard is already allocated to this node [[mytest][1], node[aXDggTa2REqsvQxOEmhBUg], [R], s[STARTED], a[id=-NcCEWV0Swaj8fu_K0eXFA]]"
}
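The full explain response is much longer than this excerpt, and the decider entries are nested inside per-node results. The sketch below walks any nesting of objects and arrays and collects every decider whose decision is NO, so it works on this excerpt and on a complete response. It does not assume a specific response layout, and `blocking_deciders` is a hypothetical helper.

```python
import json

# The partial allocation-explain output shown above.
EXPLAIN_JSON = """
{
  "decider": "same_shard",
  "decision": "NO",
  "explanation": "a copy of this shard is already allocated to this node [[mytest][1], node[aXDggTa2REqsvQxOEmhBUg], [R], s[STARTED], a[id=-NcCEWV0Swaj8fu_K0eXFA]]"
}
"""


def blocking_deciders(node) -> list[tuple[str, str]]:
    """Recursively collect (decider, explanation) pairs whose decision is NO."""
    found = []
    if isinstance(node, dict):
        if node.get("decision") == "NO" and "decider" in node:
            found.append((node["decider"], node.get("explanation", "")))
        for value in node.values():
            found.extend(blocking_deciders(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(blocking_deciders(item))
    return found


if __name__ == "__main__":
    for decider, explanation in blocking_deciders(json.loads(EXPLAIN_JSON)):
        print(f"{decider}: {explanation}")
```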
Solution
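The findings above point to the cause. Each shard of the mytest index has three replicas, so it needs four copies (one primary plus three replicas), but the cluster has only three data nodes. Elasticsearch never allocates two copies of the same shard to the same node (the same_shard decider), so one replica of each shard cannot be assigned. You can fix this either by reducing the number of replicas or by adding a data node. Here we reduce the index's replica count to bring the cluster back to Green.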
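The replica arithmetic behind the two options can be sketched as follows. This sketch considers only the one-copy-per-node rule reported by the same_shard decider. It assumes no other allocation constraints (such as disk watermarks or allocation awareness) are in play, and the function names are hypothetical.

```python
def unassigned_copies_per_shard(replicas: int, data_nodes: int) -> int:
    """A shard has 1 primary + `replicas` copies, and each node can hold
    at most one copy of it, so the excess copies stay unassigned."""
    return max(0, 1 + replicas - data_nodes)


def remedies(replicas: int, data_nodes: int) -> dict:
    """The two ways to make every copy assignable."""
    return {
        "max_replicas_for_current_nodes": max(0, data_nodes - 1),
        "data_nodes_needed_for_current_replicas": replicas + 1,
    }


if __name__ == "__main__":
    # mytest: 3 primary shards, 3 replicas each, 3 data nodes.
    primaries, replicas, data_nodes = 3, 3, 3
    print(primaries * unassigned_copies_per_shard(replicas, data_nodes))
    print(remedies(replicas, data_nodes))
```

With 3 replicas on 3 data nodes, this gives 3 unassigned copies in total, which matches the `unassigned_shards` value. The fix is to lower the count to 2 replicas or to grow the cluster to 4 data nodes.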
PUT mytest/_settings
{
  "number_of_replicas": 2
}
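If you script this change instead of running it in the console, the same request can be built with the Python standard library. The sketch builds the request but does not send it. The endpoint `http://localhost:9200` is a hypothetical placeholder, and a managed cluster will usually also require credentials, which this sketch does not add.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical endpoint; replace with your cluster's address.
ES_URL = "http://localhost:9200"


def build_replica_update(index: str, replicas: int, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) the PUT <index>/_settings request shown above."""
    body = json.dumps({"number_of_replicas": replicas}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_replica_update("mytest", 2)
    print(req.get_method(), req.full_url)
    # Sending it requires network access (and usually credentials):
    # urllib.request.urlopen(req)
```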
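Once the cluster is back to Green, enabling the dictionary file again succeeds. Enabling a dictionary file restarts the cluster, so we recommend doing it during a business maintenance window.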
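After changing the replica count, it can take some time for the cluster to turn Green. The sketch below polls until the status is green before you retry enabling the dictionary. `fetch_status` is a hypothetical callback that returns the `status` field of `GET _cluster/health`. The sleep function is injectable so the loop can be tested without waiting. Elasticsearch can also wait on the server side with `GET _cluster/health?wait_for_status=green&timeout=...`.

```python
import time
from typing import Callable


def wait_for_green(
    fetch_status: Callable[[], str],
    attempts: int = 30,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the cluster reports green; return False if it never does."""
    for attempt in range(attempts):
        if fetch_status() == "green":
            return True
        if attempt < attempts - 1:  # no need to sleep after the last check
            sleep(interval_s)
    return False
```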
References