ElasticSearch Cluster Status Anomalies (Red, Yellow): Cause Analysis

Note: some of the concept descriptions are taken from online sources.

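I. The three states of an ElasticSearch cluster:
Green - all data is available; all primary and replica shards have been allocated.
Yellow - all data is available, but some replicas have not been allocated yet. Queries are unaffected, but recovery may be: if a node in the cluster fails, some data may be unavailable until that node is repaired.
Red - at least one primary shard is unassigned for some reason, so some data is unavailable and queries are affected.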
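The three definitions can be read as one rule over shard copies. The sketch below is a simplified model: it assumes a hypothetical list of `(is_primary, is_assigned)` pairs, one per shard copy, and derives the colour from them. It is an illustration of the definitions above, not Elasticsearch's own code.

```python
from typing import Iterable, Tuple


def cluster_status(copies: Iterable[Tuple[bool, bool]]) -> str:
    """Derive a health colour from hypothetical (is_primary, is_assigned) pairs."""
    status = "green"
    for is_primary, is_assigned in copies:
        if is_assigned:
            continue
        if is_primary:
            # An unassigned primary means some data cannot be served at all.
            return "red"
        # Only a replica is missing: data is served, redundancy is reduced.
        status = "yellow"
    return status


print(cluster_status([(True, True), (False, False)]))  # yellow
```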

II. Finding the cause of a Yellow index status
1. Check cluster health with per-index status

GET /_cluster/health?level=indices
{
  "cluster_name" : "elasticsearch-1",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  # number of active primary shards
  "active_primary_shards" : 28,
  # total number of active primary and replica shards
  "active_shards" : 55,
  # number of shards being relocated
  "relocating_shards" : 0,
  # number of shards being initialized
  "initializing_shards" : 0,
  # number of unassigned shards
  "unassigned_shards" : 3,
  # number of shards whose allocation has been delayed by the timeout setting
  "delayed_unassigned_shards" : 0,
  # number of cluster-level changes not yet executed
  "number_of_pending_tasks" : 0,
  # number of unfinished fetches
  "number_of_in_flight_fetch" : 0,
  # time (in milliseconds) since the earliest queued task started waiting
  "task_max_waiting_in_queue_millis" : 0,
  # ratio of active shards in the cluster, as a percentage
  "active_shards_percent_as_number" : 100.0,
  "indices" : {
    "elasticsearch-1" : {
      "status" : "green",
      "number_of_shards" : 3,
      "number_of_replicas" : 3,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 3
    }
  }
}
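When many indices exist, scanning this response by eye is tedious. Here is a minimal sketch, assuming an unauthenticated plain-HTTP node. It fetches `/_cluster/health?level=indices` and keeps the indices that are not green or still report unassigned shards. The helper names are hypothetical. It checks both fields because, as in the sample above, the two can disagree in pasted output.

```python
import json
from typing import Dict, Tuple
from urllib.request import urlopen


def fetch_health(base_url: str) -> dict:
    """GET /_cluster/health?level=indices (assumes no auth and no TLS)."""
    with urlopen(f"{base_url.rstrip('/')}/_cluster/health?level=indices") as resp:
        return json.load(resp)


def problem_indices(health: dict) -> Dict[str, Tuple[str, int]]:
    """Map index name -> (status, unassigned_shards) for indices needing attention."""
    result = {}
    for name, info in health.get("indices", {}).items():
        if info["status"] != "green" or info["unassigned_shards"] > 0:
            result[name] = (info["status"], info["unassigned_shards"])
    return result
```

Usage against a real node would be `problem_indices(fetch_health("http://127.0.0.1:9200"))`, with the address adjusted to your cluster.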

2. Check shard allocation on each node in the cluster

GET /_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    19       86.7kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 master
    18       73.1kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 node-003
    18       67.8kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 node-002
     3                                                                               UNASSIGNED
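The UNASSIGNED row shows 3 shards, matching unassigned_shards = 3. This confirms that replica shards are unassigned, which is why the cluster status is Yellow.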
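The same check can be scripted against the text table. Here is a sketch that sums the `shards` column, assuming the column layout shown above: shard count first, node name last, and a bare `UNASSIGNED` row. For scripts, `GET /_cat/allocation?format=json` is the sturdier choice, since the text layout is meant for humans. The function name is hypothetical.

```python
from typing import Tuple


def shard_counts(cat_allocation: str) -> Tuple[int, int]:
    """Return (assigned, unassigned) shard totals from `_cat/allocation?v` text."""
    assigned = unassigned = 0
    for line in cat_allocation.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        if fields[-1] == "UNASSIGNED":
            unassigned += int(fields[0])
        else:
            assigned += int(fields[0])
    return assigned, unassigned
```

Applied to the table above, it returns 55 assigned shards, matching `active_shards`, plus 3 unassigned.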

3. Find out why the shards are unassigned

GET /_cluster/allocation/explain?pretty
{
  "index" : "elasticsearch-1",
  "shard" : 3,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-04-20T11:01:43.051Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  # cause of the problem
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "NfmBH4nSSpGmtf7aPNuvXQ",
      "node_name" : "master",
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists"
        }
      ]
    }
  ]
}
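With more nodes, the `node_allocation_decisions` array gets long. Below is a small sketch that pulls out only the deciders that said NO. It assumes the response shape shown above, and the helper name is hypothetical.

```python
from typing import List, Tuple


def blocking_deciders(explain: dict) -> List[Tuple[str, str, str]]:
    """Collect (node_name, decider, explanation) for every NO decision."""
    found = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                found.append((node["node_name"], decider["decider"], decider["explanation"]))
    return found
```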

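The per-node decisions show that each node already holds a copy of this shard, so the replica cannot be allocated. The index is configured with number_of_replicas = 3, i.e. 4 copies of each shard (1 primary + 3 replicas), but the cluster has only 3 data nodes. Elasticsearch never places two copies of the same shard on one node, so one copy per shard is always left over. The replica count must therefore be at most (number of data nodes - 1).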
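The arithmetic behind that limit can be checked quickly. Here is a sketch with hypothetical helper names. It considers only the one-copy-per-node rule and ignores other deciders such as disk watermarks or allocation filters.

```python
def unplaceable_copies(primaries: int, replicas: int, data_nodes: int) -> int:
    """Copies that can never be placed because a node holds at most one copy of a shard."""
    copies_per_shard = 1 + replicas
    placeable = min(copies_per_shard, data_nodes)
    return primaries * (copies_per_shard - placeable)


def max_replicas(data_nodes: int) -> int:
    """Largest replica count that every shard can satisfy on this many data nodes."""
    return max(data_nodes - 1, 0)


print(unplaceable_copies(3, 3, 3))  # 3, matching unassigned_shards above
print(max_replicas(3))              # 2
```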
4. List all shards

GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
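Adding `&format=json` to the request above returns one JSON object per row, keyed by the requested columns. Below is a sketch that separates unassigned primaries, which make the cluster Red, from unassigned replicas, which make it Yellow. The helper name is hypothetical.

```python
import json
from typing import Dict, List, Tuple


def unassigned_by_role(cat_shards_json: str) -> Tuple[List[Dict], List[Dict]]:
    """Split UNASSIGNED rows of `_cat/shards?...&format=json` into (primaries, replicas)."""
    rows = [r for r in json.loads(cat_shards_json) if r.get("state") == "UNASSIGNED"]
    primaries = [r for r in rows if r.get("prirep") == "p"]
    replicas = [r for r in rows if r.get("prirep") == "r"]
    return primaries, replicas
```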

5. Change the index's replica count

PUT /elasticsearch-1/_settings
{
  "number_of_replicas": 2
}
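The same settings update can be sent from a script. This sketch only builds the request with the standard library. Passing the result to `urllib.request.urlopen` would send it. The helper name and base URL are assumptions, and an explicit JSON content type is set because recent Elasticsearch versions require one on request bodies.

```python
import json
from urllib.request import Request


def replica_settings_request(base_url: str, index: str, replicas: int) -> Request:
    """Build PUT /<index>/_settings with {"number_of_replicas": <replicas>}."""
    body = json.dumps({"number_of_replicas": replicas}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```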

6. Check the health again after the change

GET /_cluster/health?level=indices
"unassigned_shards" : 0
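Allocation after a settings change is not instantaneous, so a script may need to poll. Here is a minimal sketch that polls until `unassigned_shards` reaches 0, with the fetch function injected so it can be tested without a cluster. `wait_until_allocated` is a hypothetical name. The health API also offers server-side waiting, e.g. `GET /_cluster/health?wait_for_status=green&timeout=30s`, which is often simpler.

```python
import time
from typing import Callable


def wait_until_allocated(fetch_health: Callable[[], dict],
                         attempts: int = 10, delay_s: float = 1.0) -> bool:
    """Poll a health-fetching callable until unassigned_shards is 0 or attempts run out."""
    for attempt in range(attempts):
        if fetch_health().get("unassigned_shards", 0) == 0:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # back off before the next poll
    return False
```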

III. Summary (Red, Yellow)
When a cluster turns Red or Yellow, you can troubleshoot at the following levels:

Cluster level: curl -s 172.31.30.28:9200/_cat/nodes or GET /_cluster/health
Index level: GET /_cluster/health?pretty&level=indices
Shard level: GET /_cluster/health?pretty&level=shards
Recovery progress: GET /_recovery?pretty
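At shard level, the health response nests a `shards` object under each index, keyed by shard number. Below is a sketch that lists the shards that are not green. It assumes that shape, where each shard entry carries a `status` field, and the helper name is hypothetical.

```python
from typing import List, Tuple


def problem_shards(health: dict) -> List[Tuple[str, str, str]]:
    """Return sorted (index, shard_id, status) for non-green shards in a level=shards response."""
    found = []
    for index, info in health.get("indices", {}).items():
        for shard_id, shard in info.get("shards", {}).items():
            if shard.get("status") != "green":
                found.append((index, shard_id, shard.get("status")))
    return sorted(found)
```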

1. Troubleshooting approach for unassigned shards:

Diagnose first: GET /_cluster/allocation/explain
Then try to reallocate: POST /_cluster/reroute
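A common form of the reroute call is `POST /_cluster/reroute?retry_failed=true`. It retries shards whose allocation failed more times than `index.allocation.max_retries` (default 5). It does not help when a decider says NO, as in the same_shard case above, where the replica count has to change instead. This sketch only builds the request, and the helper name is hypothetical.

```python
from urllib.request import Request


def reroute_retry_request(base_url: str) -> Request:
    """Build POST /_cluster/reroute?retry_failed=true (no body needed)."""
    return Request(f"{base_url.rstrip('/')}/_cluster/reroute?retry_failed=true", method="POST")
```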
If the shards still cannot be allocated, rebuild the index:
1.1 Create a backup index:
curl -XPUT 'http://xxxx:9200/a_index_copy' -H 'Content-Type: application/json' -d '{ "settings": { "index": { "number_of_shards": 3, "number_of_replicas": 1 } } }'
1.2 Copy the data from a_index to a_index_copy with the reindex API:
POST _reindex
{ "source": { "index": "a_index" }, "dest": { "index": "a_index_copy", "op_type": "create" } }
1.3 Delete the a_index index. This must be done first, because an alias cannot have the same name as an existing index:
curl -XDELETE 'http://xxxx:9200/a_index'
1.4 Add the alias a_index to a_index_copy:
curl -XPOST 'http://xxxx:9200/_aliases' -H 'Content-Type: application/json' -d '{ "actions": [ {"add": {"index": "a_index_copy", "alias": "a_index"}} ] }'
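Because the order of these four steps matters (the delete must precede the alias), it can help to write them down as data before running anything. Here is a dry-run sketch that produces the same requests as steps 1.1 to 1.4 as `(method, path, body)` tuples. The `_copy` suffix follows the example above; `rebuild_plan` is a hypothetical helper, and nothing is sent to a cluster.

```python
import json
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[dict]]


def rebuild_plan(index: str, shards: int = 3, replicas: int = 1) -> List[Step]:
    """Return the rebuild steps in execution order: create copy, reindex, delete, alias."""
    copy = f"{index}_copy"
    return [
        ("PUT", f"/{copy}",
         {"settings": {"index": {"number_of_shards": shards, "number_of_replicas": replicas}}}),
        ("POST", "/_reindex",
         {"source": {"index": index}, "dest": {"index": copy, "op_type": "create"}}),
        # The original index must be gone before its name can be reused as an alias.
        ("DELETE", f"/{index}", None),
        ("POST", "/_aliases", {"actions": [{"add": {"index": copy, "alias": index}}]}),
    ]


for method, path, body in rebuild_plan("a_index"):
    print(method, path, json.dumps(body) if body else "")
```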
