References:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0

https://github.com/apache/kafka/tree/2.7/connect/mirror

MirrorMaker 2.0
MM2 leverages the Connect framework to replicate topics between Kafka clusters. MM2 adds several new features:

both topics and consumer groups are replicated
topic configuration and ACLs are replicated
cross-cluster offsets are synchronized
partitioning is preserved
Replication flows
MM2 replicates topics and consumer groups from upstream source clusters to downstream target clusters. These directional flows are notated A->B.

It’s possible to create complex replication topologies based on these source->target flows, including:

fan-out, e.g. K->A, K->B, K->C
aggregation, e.g. A->K, B->K, C->K
active/active, e.g. A->B, B->A
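As an illustration, the fan-out topology above could be expressed as a set of independently enabled flows. The cluster aliases and bootstrap addresses below are hypothetical:

```properties
# hypothetical cluster aliases: K is the source, A/B/C are the targets
clusters = K, A, B, C
K.bootstrap.servers = k-host:9092
A.bootstrap.servers = a-host:9092
B.bootstrap.servers = b-host:9092
C.bootstrap.servers = c-host:9092

# fan-out: replicate from K to A, B, and C
K->A.enabled = true
K->B.enabled = true
K->C.enabled = true
```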

Each replication flow can be configured independently, e.g. to replicate specific topics or groups:

A->B.topics = topic-1, topic-2
A->B.groups = group-1, group-2

By default, all topics and consumer groups are replicated (except excluded ones), across all enabled replication flows. Each replication flow must be explicitly enabled to begin replication:

A->B.enabled = true
B->A.enabled = true

Starting an MM2 process
You can run any number of MM2 processes as needed. Any MM2 processes which are configured to replicate the same Kafka clusters will find each other, share configuration, load balance, etc.

To start an MM2 process, first specify Kafka cluster information in a configuration file as follows:

mm2.properties

clusters = us-west, us-east
us-west.bootstrap.servers = host1:9092
us-east.bootstrap.servers = host2:9092

You can list any number of clusters this way.

Optionally, you can override default MirrorMaker properties:

topics = .*
groups = group1, group2
emit.checkpoints.interval.seconds = 10

These will apply to all replication flows. You can also override default properties for specific clusters or replication flows:

configure a specific cluster

us-west.offset.storage.topic = mm2-offsets

configure a specific source->target replication flow

us-west->us-east.emit.heartbeats = false
Next, enable individual replication flows as follows:

us-west->us-east.enabled = true # disabled by default
Finally, launch one or more MirrorMaker processes with the connect-mirror-maker.sh script:

$ ./bin/connect-mirror-maker.sh mm2.properties
Multicluster environments
MM2 supports replication between multiple Kafka clusters, whether in the same data center or across multiple data centers. A single MM2 cluster can span multiple data centers, but it is recommended to keep MM2’s producers as close as possible to their target clusters. To do so, specify a subset of clusters for each MM2 node as follows:

in west DC:

$ ./bin/connect-mirror-maker.sh mm2.properties --clusters west-1 west-2
This signals to the node that the given clusters are nearby, and prevents the node from sending records or configuration to clusters in other data centers.

Example
Say there are three data centers (west, east, north) with two Kafka clusters in each data center (west-1, west-2 etc). We can configure MM2 for active/active replication within each data center, as well as cross data center replication (XDCR) as follows:

mm2.properties

clusters = west-1, west-2, east-1, east-2, north-1, north-2

west-1.bootstrap.servers = …
---%<---

active/active in west

west-1->west-2.enabled = true
west-2->west-1.enabled = true

active/active in east

east-1->east-2.enabled = true
east-2->east-1.enabled = true

active/active in north

north-1->north-2.enabled = true
north-2->north-1.enabled = true

XDCR via west-1, east-1, north-1

west-1->east-1.enabled = true
west-1->north-1.enabled = true
east-1->west-1.enabled = true
east-1->north-1.enabled = true
north-1->west-1.enabled = true
north-1->east-1.enabled = true

Then, launch MM2 in each data center as follows:

in west:

$ ./bin/connect-mirror-maker.sh mm2.properties --clusters west-1 west-2

in east:

$ ./bin/connect-mirror-maker.sh mm2.properties --clusters east-1 east-2

in north:

$ ./bin/connect-mirror-maker.sh mm2.properties --clusters north-1 north-2
With this configuration, records produced to any cluster will be replicated within the data center, as well as across to other data centers. By providing the --clusters parameter, we ensure that each node only produces records to nearby clusters.

N.B. the --clusters parameter is not strictly required here. MM2 will work fine without it; however, throughput may suffer from "producer lag" between data centers, and you may incur unnecessary data transfer costs.

Configuration
The following sections apply to a dedicated MM2 cluster. If you are running MM2 inside an existing Connect cluster, refer to KIP-382: MirrorMaker 2.0 for guidance.

General Kafka Connect Config
All Kafka Connect, source connector, and sink connector configs, as defined in the official Kafka documentation, can be used directly in the MM2 configuration without a prefix on the configuration name. As a starting point, most of the defaults work well, with the exception of tasks.max.

To distribute the workload evenly across more than one MM2 instance, set tasks.max to at least 2, or higher depending on the available hardware resources and the total number of partitions to be replicated.
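A minimal sketch of such an override; the value 8 is purely illustrative and should be sized to your workers and partition count:

```properties
# spread replication work across MM2 workers (illustrative value)
tasks.max = 8
```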

Kafka Connect Config for a Specific Connector
If needed, Kafka Connect worker-level configs can even be specified per cluster, following the format cluster_alias.config_name in the MM2 configuration. For example:

backup.ssl.truststore.location = /usr/lib/jvm/zulu-8-amd64/jre/lib/security/cacerts # SSL truststore location
backup.security.protocol = SSL # required if the target cluster expects SSL connections
MM2 Config for a Specific Connector
MM2 itself has many configs that control its behavior. To override their default values, use the format source_cluster_alias->target_cluster_alias.config_name in the MM2 configuration. For example:

backup->primary.enabled = false # set to false if one-way replication is desired
primary->backup.topics.blacklist = topics_to_blacklist
primary->backup.emit.heartbeats.enabled = false
primary->backup.sync.group.offsets = true
Producer / Consumer / Admin Config used by MM2
In many cases, customized values for the producer, consumer, or admin client configurations are needed. To override the defaults used by MM2, use the formats target_cluster_alias.producer.producer_config_name, source_cluster_alias.consumer.consumer_config_name, or cluster_alias.admin.admin_config_name in the MM2 configuration. For example:

backup.producer.compression.type = gzip
backup.producer.buffer.memory = 32768
primary.consumer.isolation.level = read_committed
primary.admin.bootstrap.servers = localhost:9092
Shared configuration
MM2 processes share configuration via their target Kafka clusters. For example, the following two processes would be racy:

process1:

A->B.enabled = true
A->B.topics = foo

process2:

A->B.enabled = true
A->B.topics = bar
In this case, the two processes will share configuration via cluster B. Depending on which process is elected "leader", the result will be that either foo or bar is replicated, but not both. For this reason, it is important to keep configuration consistent across flows to the same target cluster. In most cases, your entire organization should use a single MM2 configuration file.
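The race above can be avoided by merging both flows into one shared configuration file that every process uses, e.g.:

```properties
# single shared config for all processes replicating into cluster B
A->B.enabled = true
A->B.topics = foo, bar
```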

Remote topics
MM2 employs a naming convention to ensure that records from different clusters are not written to the same partition. By default, replicated topics are renamed based on “source cluster aliases”:

topic-1 --> source.topic-1
This can be customized by overriding the replication.policy.separator property (default is a period). If you need more control over how remote topics are defined, you can implement a custom ReplicationPolicy and override replication.policy.class (default is DefaultReplicationPolicy).
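The naming behavior described above can be sketched in a few lines. This is an illustrative model of how DefaultReplicationPolicy derives remote topic names, not the actual Java implementation:

```python
def remote_topic_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Sketch of DefaultReplicationPolicy: the source cluster alias,
    followed by the separator, is prepended to the topic name."""
    return f"{source_alias}{separator}{topic}"

print(remote_topic_name("us-west", "topic-1"))       # us-west.topic-1
print(remote_topic_name("us-west", "topic-1", "_"))  # us-west_topic-1
```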

Monitoring an MM2 process
MM2 is built on the Connect framework and inherits all of Connect’s metrics, e.g. source-record-poll-rate. In addition, MM2 produces its own metrics under the kafka.connect.mirror metric group. Metrics are tagged with the following properties:

  • target: alias of target cluster
  • source: alias of source cluster
  • topic: remote topic on target cluster
  • partition: partition being replicated

Metrics are tracked for each remote topic. The source cluster can be inferred from the topic name. For example, replicating topic1 from A->B will yield metrics like:

  • target=B
  • topic=A.topic1
  • partition=1

The following metrics are emitted:

MBean: kafka.connect.mirror:type=MirrorSourceConnector,target=([-.w]+),topic=([-.w]+),partition=([0-9]+)

record-count # number of records replicated source -> target
record-age-ms # age of records when they are replicated
record-age-ms-min
record-age-ms-max
record-age-ms-avg
replication-latency-ms # time it takes records to propagate source->target
replication-latency-ms-min
replication-latency-ms-max
replication-latency-ms-avg
byte-rate # average number of bytes/sec in replicated records

MBean: kafka.connect.mirror:type=MirrorCheckpointConnector,source=([-.w]+),target=([-.w]+)

checkpoint-latency-ms   # time it takes to replicate consumer offsets
checkpoint-latency-ms-min
checkpoint-latency-ms-max
checkpoint-latency-ms-avg

These metrics do not discern between created-at and log-append timestamps.
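When scripting against these MBeans, the name can be parsed back into its tags. The sketch below assumes the character class in the pattern above is [-.\w]+, and that the source alias can be recovered as the remote topic's prefix under DefaultReplicationPolicy with the default "." separator:

```python
import re

# Regex based on the MirrorSourceConnector MBean pattern above;
# interpreting the printed class [-.w]+ as [-.\w]+ is an assumption.
SOURCE_MBEAN = re.compile(
    r"kafka\.connect\.mirror:type=MirrorSourceConnector,"
    r"target=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)"
)

def parse_source_mbean(name: str):
    """Extract target alias, remote topic, and partition from an MBean name."""
    m = SOURCE_MBEAN.fullmatch(name)
    if m is None:
        return None
    target, topic, partition = m.groups()
    # source alias = prefix of the remote topic (DefaultReplicationPolicy, "." separator)
    source = topic.split(".", 1)[0]
    return {"source": source, "target": target,
            "topic": topic, "partition": int(partition)}

print(parse_source_mbean(
    "kafka.connect.mirror:type=MirrorSourceConnector,target=B,topic=A.topic1,partition=1"
))
```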
