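1 Fault Description

On September 22, one broker in the nationwide Kafka cluster went down because its disk ran out of space. The business was affected: messages could be neither produced nor consumed. The application reported: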

WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)

2 Fault Simulation

2.1 Topic partition with a single replica

# Produce messages

[root@Centos7-Mode-V8 kafka]# bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd

>aa

>bb

# Consume messages:

[root@Centos7-Mode-V8 kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd

aa

bb

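2.1.1 Shutting down the node that hosts the topic's leader

# Use Kafka Tool to see which node hosts the leader of the topic's partition

/*

The Kafka command line can show this too:

bin/kafka-topics.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe

The Leader field on each partition line of the output is the broker ID of that partition's leader.

*/

After the leader node is shut down, the producer and all consumer processes keep printing the following message: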

[2021-09-23 17:09:53,495] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)

Messages can be neither sent nor consumed.

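2.1.2 Shutting down a non-leader node

Sometimes the consumer process reports: [2021-09-23 17:21:22,480] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] Connection to node 2147483645 (/192.168.144.253:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

While this warning is being reported, messages can still be produced normally, but the messages produced during that window cannot be consumed.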
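The unusually large node ID in this warning is worth decoding. To my understanding, the Java consumer opens a separate connection to its group coordinator and labels it Integer.MAX_VALUE minus the coordinator's broker ID, so the number maps back to a real broker. A minimal shell sketch under that assumption (the function name is illustrative, not part of Kafka):

```shell
# Map a coordinator connection ID seen in a client log back to a broker ID.
# Assumption: the client derives the ID as Integer.MAX_VALUE - broker.id.
coordinator_broker_id() {
  echo $((2147483647 - $1))
}

# ID taken from the warning above; prints 2
coordinator_broker_id 2147483645
```

Read this way, the consumer lost its group coordinator (broker 2) rather than the leader of baidd-0, which would explain why producing kept working. The node 2147483647 that appears in section 2.2 decodes to broker 0 in the same manner.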

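2.1.3 Summary

When a partition has only 1 replica, stopping any node affects the business.

When the node hosting a partition's leader goes down, both producing and consuming are affected.

When a node that does not host the leader goes down, consuming is affected.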

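2.2 Partition with multiple replicas

It is understandable that the business is affected when a partition has no other replica, so I tried giving the topic multiple replicas. Surprisingly, the business was still affected:

# Create a topic with three replicas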

bin/kafka-topics.sh --create --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --replication-factor 3 --partitions 1 --topic song

# View the replica information

[root@Centos7-Mode-V8 kafka]# bin/kafka-topics.sh  --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe --topic song

Topic:song PartitionCount:1 ReplicationFactor:3 Configs:

Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
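When a topic has many partitions, picking the leader and ISR out of the describe output by eye gets tedious. The following is a small sketch that condenses each partition line with awk; the function name partition_leaders is illustrative, and the parsing assumes the field labels shown in the output above.

```shell
# Condense `kafka-topics.sh --describe` output to one short line per partition.
# Only lines carrying both "Partition:" and "Leader:" are considered,
# so the topic summary line is skipped.
partition_leaders() {
  awk '/Partition:/ && /Leader:/ {
    for (i = 1; i < NF; i++) {
      if ($i == "Topic:")     topic  = $(i + 1)
      if ($i == "Partition:") part   = $(i + 1)
      if ($i == "Leader:")    leader = $(i + 1)
      if ($i == "Isr:")       isr    = $(i + 1)
    }
    print topic "-" part " leader=" leader " isr=" isr
  }'
}

# Demo on the output shown above; prints: song-0 leader=0 isr=0,2,1
printf '%s\n' \
  'Topic:song PartitionCount:1 ReplicationFactor:3 Configs:' \
  'Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1' |
  partition_leaders
```

Against a live cluster, the describe command above would simply be piped into partition_leaders.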

# Produce messages

bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song

# Consumer process 1

bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g1

# Consumer process 2

bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g2

# Shut down the node that hosts the topic's leader

Messages can still be produced, and the "1 partitions have leader brokers without a matching listener" warning no longer appears. However, after losing the connection to the topic leader, the consumers sometimes report:

[2021-09-24 19:01:06,316] WARN [Consumer clientId=consumer-1, groupId=console-consumer-27609] Connection to node 2147483647 (/192.168.144.247:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

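Messages produced during this period sometimes did not arrive: the consumers could not consume what was produced while the node was down.

So why are messages still missed when a node goes down, even though the topic now has multiple replicas?

Answer: __consumer_offsets has only 1 replica. A consumer group whose __consumer_offsets partition lived on the failed node can no longer fetch or commit its offsets, so even a topic with multiple replicas is not highly available.

# Later, after adding replicas to this built-in Kafka topic (__consumer_offsets), the regular topics became highly available. Stopping a node still produces the "Broker may not be available" warning, but the business is no longer affected.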
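Which groups are hit by a given broker failure can be made concrete. To my understanding, the broker maps each consumer group to one __consumer_offsets partition by taking the absolute value of the Java hashCode of the group ID modulo the partition count (50 by default), and the leader of that partition acts as the group's coordinator. The sketch below reimplements that mapping in shell for ASCII group IDs only; the function name is illustrative, and the mapping is an assumption about broker internals rather than a documented interface.

```shell
# Estimate the __consumer_offsets partition that stores a group's offsets.
# Assumption: partition = abs(javaHashCode(groupId)) % partitionCount.
# Only ASCII group IDs are handled.
offsets_partition_for() (
  s=$1
  partitions=${2:-50}
  h=0
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    s=$rest
    code=$(printf '%d' "'$c")
    # Java's String.hashCode: h = 31 * h + char, wrapped to 32 bits
    h=$(( (h * 31 + code) & 4294967295 ))
  done
  # Reinterpret as a signed 32-bit int, then take the absolute value
  if [ "$h" -ge 2147483648 ]; then h=$((h - 4294967296)); fi
  if [ "$h" -eq -2147483648 ]; then h=0; fi
  if [ "$h" -lt 0 ]; then h=$((-h)); fi
  echo $((h % partitions))
)

offsets_partition_for g1
offsets_partition_for g2
```

For the two groups used above, g1 and g2 come out as partitions 42 and 43. If the only replica of such a partition sits on the stopped node, that group has no coordinator until the node returns.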

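3 Root Cause

The Kafka configuration file did not set default.replication.factor=3. This parameter defaults to 1, which means automatically created topics get no additional replicas, so each partition is effectively a single point of failure.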

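4 Solution

4.1 Change the default.replication.factor parameter

Edit the configuration file on every Kafka node and raise the default replication factor for topics (it defaults to 1):

default.replication.factor=3

offsets.topic.replication.factor is a separate setting that controls the replication factor of __consumer_offsets. Its built-in default is 3, so it stays at 3 as long as the configuration file does not override it.

Note: do not set default.replication.factor=3 and at the same time keep offsets.topic.replication.factor=1 in the file (the sample server.properties shipped with Kafka contains that line). For __consumer_offsets, the explicit offsets.topic.replication.factor value is the one that applies, regardless of default.replication.factor. Both settings only affect topics created afterwards; existing topics, including an already-created __consumer_offsets, keep their current replicas and must be expanded as described in 4.2 and 4.3.

# Restart Kafka for the configuration to take effect

systemctl restart kafka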
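To confirm what a broker's file actually sets, a small check can help. The sketch below reads the effective value of a key from a properties file (the last assignment wins) and falls back to a supplied default when the key is absent. The function name and the throwaway sample file are illustrative; the fallbacks 1 and 3 are the built-in defaults of the two settings as I understand them.

```shell
# Print the effective value of KEY in a broker properties file,
# or FALLBACK when the key is not set (commented-out lines are ignored).
replication_setting() (
  file=$1; key=$2; fallback=$3
  value=$(awk -F= -v k="$key" '
    { name = $1; gsub(/[ \t]/, "", name) }
    name == k { v = $2 }
    END { gsub(/[ \t\r]/, "", v); print v }
  ' "$file")
  echo "${value:-$fallback}"
)

# Demo on a throwaway file that mimics the misconfiguration described above.
sample=$(mktemp)
printf '%s\n' 'default.replication.factor=3' \
              'offsets.topic.replication.factor=1' > "$sample"
replication_setting "$sample" default.replication.factor 1        # prints 3
replication_setting "$sample" offsets.topic.replication.factor 3  # prints 1
rm -f "$sample"
```

On a real broker the first argument would be the path to its server.properties; a result of 1 for offsets.topic.replication.factor is the case the note above warns about.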

4.2 Add replicas to existing regular topics

See https://blog.csdn.net/yabingshi_tech/article/details/120443647

4.3 Add replicas to __consumer_offsets

The method is the same as above. The JSON file is as follows:

{"version": 1, "partitions": [
{"topic": "__consumer_offsets", "partition": 0, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 1, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 2, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 3, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 4, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 5, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 6, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 7, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 8, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 9, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 10, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 11, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 12, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 13, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 14, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 15, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 16, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 17, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 18, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 19, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 20, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 21, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 22, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 23, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 24, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 25, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 26, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 27, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 28, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 29, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 30, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 31, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 32, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 33, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 34, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 35, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 36, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 37, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 38, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 39, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 40, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 41, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 42, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 43, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 44, "replicas": [2, 0, 1]},
{"topic": "__consumer_offsets", "partition": 45, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 46, "replicas": [0, 1, 2]},
{"topic": "__consumer_offsets", "partition": 47, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 48, "replicas": [1, 2, 0]},
{"topic": "__consumer_offsets", "partition": 49, "replicas": [2, 0, 1]}
]}
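Writing 50 entries by hand is error-prone, so a file like this can also be generated. The sketch below emits a reassignment plan for __consumer_offsets using a simple rotation over the broker IDs; that layout is an assumption and differs slightly from the hand-written file above, which rotates in blocks of three partitions. The function name and the output file name are illustrative.

```shell
# Emit a reassignment plan for __consumer_offsets.
# Usage: gen_offsets_reassignment PARTITION_COUNT BROKER_ID...
# The first replica in each list is the preferred leader, so the broker
# list is rotated by one for every partition to spread leadership.
gen_offsets_reassignment() (
  partitions=$1; shift
  sep=""
  p=0
  printf '{"version": 1, "partitions": ['
  while [ "$p" -lt "$partitions" ]; do
    replicas=$(printf '%s, ' "$@")   # e.g. "0, 1, 2, "
    replicas=${replicas%, }          # drop the trailing separator
    printf '%s{"topic": "__consumer_offsets", "partition": %d, "replicas": [%s]}' \
      "$sep" "$p" "$replicas"
    sep=", "
    first=$1; shift; set -- "$@" "$first"   # rotate the broker list
    p=$((p + 1))
  done
  printf ']}\n'
)

# 50 partitions across brokers 0, 1 and 2, as in this cluster
gen_offsets_reassignment 50 0 1 2 > offsets-reassign.json
```

The resulting file would then be applied with bin/kafka-reassign-partitions.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --reassignment-json-file offsets-reassign.json --execute, and progress checked by rerunning the same command with --verify in place of --execute.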

-- This article draws on: "Kafka突然宕机了?稳住,莫慌!" (Kafka suddenly went down? Stay calm, don't panic!)
