Incremental MySQL Synchronization with Kafka

Goal

This post reproduces and tidies up the walkthrough in [1].

Environment

Component	Version
ZooKeeper	3.6.0
Kafka	2.5.0
MySQL	8.0.21-0ubuntu0.20.04.4

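Preparation

Create two databases, A and B, then create one table in each: person in A (the source) and kafkaperson in B (the target).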

mysql> create database A;
Query OK, 1 row affected (0.12 sec)

mysql> create database B;
Query OK, 1 row affected (0.08 sec)

mysql> use A;
Database changed
mysql> CREATE TABLE `person` (
-> `pid` int(11) NOT NULL AUTO_INCREMENT,
-> `firstname` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
-> `age` int(11) DEFAULT NULL,
-> PRIMARY KEY (`pid`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected, 3 warnings (0.82 sec)

mysql> use B;
Database changed
mysql> CREATE TABLE `kafkaperson` (
-> `pid` int(11) NOT NULL AUTO_INCREMENT,
-> `firstname` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
-> `age` int(11) DEFAULT NULL,
-> PRIMARY KEY (`pid`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Query OK, 0 rows affected, 5 warnings (0.49 sec)

Starting the Cluster

Start Hadoop, ZooKeeper, and Kafka.

Test

① Create the topic that the source connector will produce to:

$KAFKA/bin/kafka-topics.sh --create --zookeeper Desktop:2181 --replication-factor 1 --partitions 1 --topic mysql-kafka-person

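② Start Kafka Connect in standalone mode with both the source connector and the sink connector (the sink consumes the topic):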

$KAFKA/bin/connect-standalone.sh $KAFKA/config/connect-standalone.properties $KAFKA/config/quickstart-mysql.properties $KAFKA/config/quickstart-mysql-sink.properties

③ Insert a row into the person table in database A:

mysql> INSERT INTO person (pid,firstname,age) VALUES ( 1, 'zs',66);
Query OK, 1 row affected (0.07 sec)

mysql> select * from person;
+-----+-----------+------+
| pid | firstname | age  |
+-----+-----------+------+
|   1 | zs        |   66 |
+-----+-----------+------+
1 row in set (0.00 sec)

④ Check the kafkaperson table in database B:

mysql> use B;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-------------+
| Tables_in_B |
+-------------+
| kafkaperson |
+-------------+
1 row in set (0.00 sec)

mysql> select * from kafkaperson
-> ;
+-----+-----------+------+
| pid | firstname | age  |
+-----+-----------+------+
|   1 | zs        |   66 |
+-----+-----------+------+
1 row in set (0.00 sec)

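The row inserted into A.person has been delivered through Kafka to B.kafkaperson. The Kafka Connect terminal also prints related log messages while this happens.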
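The round trip only works if the sink subscribes to the topic the source actually writes to, which is topic.prefix followed by the table name. The following is a minimal Python sketch of that consistency check, not part of the original walkthrough. The helper names (parse_properties, source_topics, sink_topics) are hypothetical, and the two strings are excerpts of the property files listed in the appendix.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Excerpt of quickstart-mysql.properties (source connector)
SOURCE = """
mode=incrementing
incrementing.column.name=pid
table.whitelist=person
topic.prefix=mysql-kafka-
"""

# Excerpt of quickstart-mysql-sink.properties (sink connector)
SINK = """
topics=mysql-kafka-person
pk.mode = record_value
pk.fields = pid
table.name.format=kafkaperson
"""


def source_topics(props):
    """Topics the source writes to: topic.prefix + each whitelisted table."""
    tables = [t.strip() for t in props["table.whitelist"].split(",") if t.strip()]
    return {props["topic.prefix"] + t for t in tables}


def sink_topics(props):
    """Topics the sink subscribes to."""
    return {t.strip() for t in props["topics"].split(",") if t.strip()}


source = parse_properties(SOURCE)
sink = parse_properties(SINK)
missing = source_topics(source) - sink_topics(sink)
print("topics the sink does not consume:", sorted(missing))
```

If the set of missing topics is not empty, rows inserted into A would reach Kafka but never arrive in B.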

Appendix

quickstart-mysql.properties

name=mysql-a-source-person
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://Desktop:3306/A?user=appleyuchi&password=appleyuchi
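# incrementing: detect new rows through a strictly increasing column
mode=incrementing
# The incrementing column: pid
incrementing.column.name=pid
# Table whitelist: person
table.whitelist=person
# Topic prefix: mysql-kafka- (the topic becomes mysql-kafka-person)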
topic.prefix=mysql-kafka-
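To make mode=incrementing concrete, here is a small Python sketch of the idea, assuming an in-memory SQLite table as a stand-in for A.person. It is not the connector's code: poll_new_rows, the starting offset of -1, and the second row ('ls', 30) are hypothetical and only illustrate that each poll fetches rows whose pid is greater than the last pid already seen.

```python
import sqlite3


def poll_new_rows(conn, table, column, last_seen):
    """Return rows whose incrementing column is greater than last_seen."""
    # Table and column names come from trusted config; the offset is bound as a parameter.
    query = (
        f"SELECT pid, firstname, age FROM {table} "
        f"WHERE {column} > ? ORDER BY {column} ASC"
    )
    return conn.execute(query, (last_seen,)).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for database A
conn.execute(
    "CREATE TABLE person (pid INTEGER PRIMARY KEY, firstname TEXT, age INTEGER)"
)
conn.execute("INSERT INTO person VALUES (1, 'zs', 66)")

offset = -1  # hypothetical starting offset: nothing has been read yet
first = poll_new_rows(conn, "person", "pid", offset)
offset = first[-1][0]  # remember the largest pid seen so far

conn.execute("INSERT INTO person VALUES (2, 'ls', 30)")  # hypothetical new row
second = poll_new_rows(conn, "person", "pid", offset)
offset = second[-1][0]

# An UPDATE does not change pid, so a pid-based poll cannot see it.
conn.execute("UPDATE person SET age = 67 WHERE pid = 1")
third = poll_new_rows(conn, "person", "pid", offset)

print(first, second, third)
```

The last poll comes back empty, which reflects a known limit of incrementing mode: it picks up newly inserted rows only, not updates or deletes of existing rows.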

quickstart-mysql-sink.properties

name=mysql-a-sink-person
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
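# Kafka topic name(s) to consume
topics=mysql-kafka-person
# JDBC connection settings
connection.url=jdbc:mysql://Desktop:3306/B?user=appleyuchi&password=appleyuchi
# Do not create the table automatically; if true, the table is created automatically and named by table.name.format (the topic name by default)
auto.create=false
# upsert mode: insert new rows and update existing ones
insert.mode=upsert
# The next two settings use pid, taken from the record value, as the primary key for updates
pk.mode = record_value
pk.fields = pid
# Target table name is kafkaperson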
table.name.format=kafkaperson
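The effect of insert.mode=upsert with pk.fields=pid can be sketched in Python as follows. This is an illustration under assumptions, not the sink connector's implementation: it uses an in-memory SQLite table in place of B.kafkaperson and SQLite's ON CONFLICT clause (SQLite 3.24 or newer), whereas on MySQL the equivalent statement is INSERT ... ON DUPLICATE KEY UPDATE. The upsert helper and the changed age value are hypothetical.

```python
import sqlite3


def upsert(conn, record, pk_field="pid"):
    """Insert the record, or update the non-key columns if the key already exists."""
    cols = list(record)
    placeholders = ", ".join("?" for _ in cols)
    updates = ", ".join(f"{c} = excluded.{c}" for c in cols if c != pk_field)
    sql = (
        f"INSERT INTO kafkaperson ({', '.join(cols)}) VALUES ({placeholders}) "
        f"ON CONFLICT({pk_field}) DO UPDATE SET {updates}"
    )
    conn.execute(sql, [record[c] for c in cols])


conn = sqlite3.connect(":memory:")  # stand-in for database B
conn.execute(
    "CREATE TABLE kafkaperson (pid INTEGER PRIMARY KEY, firstname TEXT, age INTEGER)"
)

record = {"pid": 1, "firstname": "zs", "age": 66}
upsert(conn, record)
upsert(conn, record)  # the same record delivered twice does not create a duplicate
upsert(conn, {"pid": 1, "firstname": "zs", "age": 67})  # same key, new value

rows = conn.execute("SELECT pid, firstname, age FROM kafkaperson").fetchall()
print(rows)
```

Keying the write on pid makes redelivery harmless: replaying a record leaves one row per pid instead of failing on a duplicate key.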

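References:

[1] Kafka Connect 实现MySQL增量同步 (Incremental MySQL synchronization with Kafka Connect)

[2] Kafka connect快速构建数据ETL通道 (Quickly building a data ETL pipeline with Kafka Connect)

[3] Kafka Connect 日志配置 (Kafka Connect logging configuration)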