flink1.12.2+hudi0.9.0测试
1.环境准备
1.1、flink1.12.2
1.1.1 编译包下载:https://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-1.12.2/flink-1.12.2-bin-scala_2.11.tgz
1.1.2 flink的部署可参考上篇:https://blog.csdn.net/weixin_49218925/article/details/115511022
1.2、hudi0.9.0已发布,可直接下载hudi-flink-bundle_2.11-0.9.0.jar
2 flinksql写入hudi demo
Bach 模式的读写
插入数据
使用如下 DDL 语句创建 Hudi 表:
Flink SQL> create table t2(
> uuid varchar(20),
> name varchar(10),
> age int,
> ts timestamp(3),
> `partition` varchar(20)
> )
> PARTITIONED BY (`partition`)
> with (
> 'connector' = 'hudi',
> 'path' = 'oss://vvr-daily/hudi/t2'
> );
[INFO] Table has been created.
DDL 里申明了表的 path,record key 为默认值 uuid,pre-combine key 为默认值 ts 。
然后通过 VALUES 语句往表中插入数据:
Flink SQL> insert into t2 values
> ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
> ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
> ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
> ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
> ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
> ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
> ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: 59f2e528d14061f23c552a7ebf9a76bd
这里看到 Flink 的作业已经成功提交到集群,可以本地打开 web UI 观察作业的执行情况: http://localhost:8081/
查询数据
作业执行完成后,通过 SELECT 语句查询表结果:
Flink SQL> set execution.result-mode=tableau;
[INFO] Session property has been set.
Flink SQL> select * from t2;
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| +/- | uuid | name | age | ts | partition |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| + | id3 | Julian | 53 | 1970-01-01T00:00:03 | par2 |
| + | id4 | Fabian | 31 | 1970-01-01T00:00:04 | par2 |
| + | id7 | Bob | 44 | 1970-01-01T00:00:07 | par4 |
| + | id8 | Han | 56 | 1970-01-01T00:00:08 | par4 |
| + | id1 | Danny | 23 | 1970-01-01T00:00:01 | par1 |
| + | id2 | Stephen | 33 | 1970-01-01T00:00:02 | par1 |
| + | id5 | Sophia | 18 | 1970-01-01T00:00:05 | par3 |
| + | id6 | Emma | 20 | 1970-01-01T00:00:06 | par3 |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
Received a total of 8 rows
这里执行语句 set execution.result-mode=tableau;
可以让查询结果直接输出到终端。
通过在 WHERE 子句中添加 partition 路径来裁剪 partition:
Flink SQL> select * from t2 where `partition` = 'par1';
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| +/- | uuid | name | age | ts | partition |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| + | id1 | Danny | 23 | 1970-01-01T00:00:01 | par1 |
| + | id2 | Stephen | 33 | 1970-01-01T00:00:02 | par1 |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
Received a total of 2 rows
更新数据
相同的 record key 的数据会自动覆盖,通过 INSERT 相同 key 的数据可以实现数据更新:
Flink SQL> insert into t2 values
> ('id1','Danny',24,TIMESTAMP '1970-01-01 00:00:01','par1'),
> ('id2','Stephen',34,TIMESTAMP '1970-01-01 00:00:02','par1');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: 944de5a1ecbb7eeb4d1e9e748174fe4c
Flink SQL> select * from t2;
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| +/- | uuid | name | age | ts | partition |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| + | id1 | Danny | 24 | 1970-01-01T00:00:01 | par1 |
| + | id2 | Stephen | 34 | 1970-01-01T00:00:02 | par1 |
| + | id3 | Julian | 53 | 1970-01-01T00:00:03 | par2 |
| + | id4 | Fabian | 31 | 1970-01-01T00:00:04 | par2 |
| + | id5 | Sophia | 18 | 1970-01-01T00:00:05 | par3 |
| + | id6 | Emma | 20 | 1970-01-01T00:00:06 | par3 |
| + | id7 | Bob | 44 | 1970-01-01T00:00:07 | par4 |
| + | id8 | Han | 56 | 1970-01-01T00:00:08 | par4 |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
Received a total of 8 rows
可以看到 uuid 为 id1 和 id2 的数据 age 字段值发生了更新。
再次 insert 新数据观察结果:
Flink SQL> insert into t2 values
> ('id4','Fabian',32,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',19,TIMESTAMP '1970-01-01 00:00:05','par3');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: fdeb7fd9f08808e66d77220f43075720
Flink SQL> select * from t2;
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| +/- | uuid | name | age | ts | partition |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| + | id5 | Sophia | 19 | 1970-01-01T00:00:05 | par3 |
| + | id6 | Emma | 20 | 1970-01-01T00:00:06 | par3 |
| + | id3 | Julian | 53 | 1970-01-01T00:00:03 | par2 |
| + | id4 | Fabian | 32 | 1970-01-01T00:00:04 | par2 |
| + | id1 | Danny | 24 | 1970-01-01T00:00:01 | par1 |
| + | id2 | Stephen | 34 | 1970-01-01T00:00:02 | par1 |
| + | id7 | Bob | 44 | 1970-01-01T00:00:07 | par4 |
| + | id8 | Han | 56 | 1970-01-01T00:00:08 | par4 |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
Received a total of 8 rows
Streaming 读
通过如下语句创建一张新的表并注入数据:
Flink SQL> create table t1(
> uuid varchar(20),
> name varchar(10),
> age int,
> ts timestamp(3),
> `partition` varchar(20)
> )
> PARTITIONED BY (`partition`)
> with (
> 'connector' = 'hudi',
> 'path' = 'oss://vvr-daily/hudi/t1',
> 'table.type' = 'MERGE_ON_READ',
> 'read.streaming.enabled' = 'true',
> 'read.streaming.check-interval' = '4'
> );
[INFO] Table has been created.
Flink SQL> insert into t1 values
> ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
> ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
> ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
> ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
> ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
> ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
> ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: 9e1dcd37fd0f8ca77534c30c7d87be2c
这里将 table option read.streaming.enabled
设置为 true,表明通过 streaming 的方式读取表数据;opiton read.streaming.check-interval
指定了 source 监控新的 commits 的间隔为 4s;option table.type
设置表类型为 MERGE_ON_READ,目前只有 MERGE_ON_READ
表支持 streaming 读。
以上操作发生在一个 terminal 中,我们称之为 terminal_1。
从新的 terminal(我们称之为 terminal_2)再次启动 Sql Client,重新创建 t1 表并查询:
Flink SQL> set execution.result-mode=tableau;
[INFO] Session property has been set.
Flink SQL> create table t1(
> uuid varchar(20),
> name varchar(10),
> age int,
> ts timestamp(3),
> `partition` varchar(20)
> )
> PARTITIONED BY (`partition`)
> with (
> 'connector' = 'hudi',
> 'path' = 'oss://vvr-daily/hudi/t1',
> 'table.type' = 'MERGE_ON_READ',
> 'read.streaming.enabled' = 'true',
> 'read.streaming.check-interval' = '4'
> );
[INFO] Table has been created.
Flink SQL> select * from t1;
2021-03-22 18:36:37,042 INFO org.apache.hadoop.conf.Configuration.deprecation [] - mapred.job.map.memory.mb is deprecated. Instead, use mapreduce.map.memory.mb
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| +/- | uuid | name | age | ts | partition |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| + | id2 | Stephen | 33 | 1970-01-01T00:00:02 | par1 |
| + | id1 | Danny | 23 | 1970-01-01T00:00:01 | par1 |
| + | id6 | Emma | 20 | 1970-01-01T00:00:06 | par3 |
| + | id5 | Sophia | 18 | 1970-01-01T00:00:05 | par3 |
| + | id8 | Han | 56 | 1970-01-01T00:00:08 | par4 |
| + | id7 | Bob | 44 | 1970-01-01T00:00:07 | par4 |
| + | id4 | Fabian | 31 | 1970-01-01T00:00:04 | par2 |
| + | id3 | Julian | 53 | 1970-01-01T00:00:03 | par2 |
回到 terminal_1,继续执行 batch mode 的 INSERT 操作:
Flink SQL> insert into t1 values
> ('id1','Danny',27,TIMESTAMP '1970-01-01 00:00:01','par1');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: 2dad24e067b38bc48c3a8f84e793e08b
几秒之后,观察 terminal_2 的输出多了一行:
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| +/- | uuid | name | age | ts | partition |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| + | id2 | Stephen | 33 | 1970-01-01T00:00:02 | par1 |
| + | id1 | Danny | 23 | 1970-01-01T00:00:01 | par1 |
| + | id6 | Emma | 20 | 1970-01-01T00:00:06 | par3 |
| + | id5 | Sophia | 18 | 1970-01-01T00:00:05 | par3 |
| + | id8 | Han | 56 | 1970-01-01T00:00:08 | par4 |
| + | id7 | Bob | 44 | 1970-01-01T00:00:07 | par4 |
| + | id4 | Fabian | 31 | 1970-01-01T00:00:04 | par2 |
| + | id3 | Julian | 53 | 1970-01-01T00:00:03 | par2 |
| + | id1 | Danny | 27 | 1970-01-01T00:00:01 | par1 |
再次在 terminal_1 中执行 INSERT 操作:
Flink SQL> insert into t1 values
> ('id4','Fabian',32,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',19,TIMESTAMP '1970-01-01 00:00:05','par3');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: ecafffda3d294a13b0a945feb9acc8a5
观察 terminal_2 的输出变化:
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| +/- | uuid | name | age | ts | partition |
+-----+----------------------+----------------------+-------------+-------------------------+----------------------+
| + | id2 | Stephen | 33 | 1970-01-01T00:00:02 | par1 |
| + | id1 | Danny | 23 | 1970-01-01T00:00:01 | par1 |
| + | id6 | Emma | 20 | 1970-01-01T00:00:06 | par3 |
| + | id5 | Sophia | 18 | 1970-01-01T00:00:05 | par3 |
| + | id8 | Han | 56 | 1970-01-01T00:00:08 | par4 |
| + | id7 | Bob | 44 | 1970-01-01T00:00:07 | par4 |
| + | id4 | Fabian | 31 | 1970-01-01T00:00:04 | par2 |
| + | id3 | Julian | 53 | 1970-01-01T00:00:03 | par2 |
| + | id1 | Danny | 27 | 1970-01-01T00:00:01 | par1 |
| + | id5 | Sophia | 19 | 1970-01-01T00:00:05 | par3 |
| + | id4 | Fabian | 32 | 1970-01-01T00:00:04 | par2 |
flink1.12.2+hudi0.9.0测试相关推荐
- Flink-1.13集成hudi-0.10.0
把hudi的 hudi-flink-bundle_2.12-0.10.0.jar放到 flink的lib下即可从flink-sql客户端读写hudi表. 下面是完全参考hudi官网的示例 一.下载安装 ...
- flink1.12.7+hudi 问题总结
版本:CDH-6.3.2, flink-1.12.7 ,hudi -0.9.0/0.10.0 1.CDH安装flink,需要自己制作parcel,制作过程略; 2.hudi可以自己编译::https: ...
- flink1.12.0学习笔记第1篇-部署与入门
flink1.12.0学习笔记第 1 篇-部署与入门 flink1.12.0学习笔记第1篇-部署与入门 flink1.12.0学习笔记第2篇-流批一体API flink1.12.0学习笔记第3篇-高级 ...
- 测试hudi-0.7.0对接spark structure streaming
测试hudi-0.7.0对接spark structure streaming 测试环境 Hudi version :0.7.0 Spark version :2.4.0 Hive version : ...
- flink1.12.0学习笔记第2篇-流批一体API
flink1.12.0学习笔记第 2 篇-流批一体API flink1.12.0学习笔记第1篇-部署与入门 flink1.12.0学习笔记第2篇-流批一体API flink1.12.0学习笔记第3篇- ...
- Flink1.12.0简单实现wordcount
文章目录 前言 一.Flink1.12.0简单实现wordcount 二.使用步骤 1.引入pom.xml 2.主类 3.运行结果 总结 前言 Flink1.12.0简单实现wordcount 一.F ...
- 1.安装flink-1.12.2
FLINK on YARN模式 解压安装包: tar -zvxf flink-1.12.2-bin-scala_2.11.tgz /opt/ 修改yarn配置,设置application master ...
- 实践数据湖iceberg 第二十一课 flink1.13.5 + iceberg0.131 CDC(测试成功INSERT,变更操作失败)
系列文章目录 实践数据湖iceberg 第一课 入门 实践数据湖iceberg 第二课 iceberg基于hadoop的底层数据格式 实践数据湖iceberg 第三课 在sqlclient中,以sql ...
- Flink1.12 - 概述、安装部署及快速入门
1. Flink概述 1.1 Flink官方介绍 flink官网地址 1.2 Flink组件栈 一个计算框架要有长远的发展,必须打造一个完整的 Stack.只有上层有了具体的应用,并能很好的发挥计算 ...
最新文章
- 最快让你上手ReactiveCocoa之进阶篇
- 关于TableLayoutPanel里放入控件无法将Dock设为Fill的解决办法
- 二线城市IT人员如何发展
- 【web安全】第三弹:web攻防平台pentester安装及XSS部分答案解析
- 苹果手机如何减少后台流量
- 伪数据科学家 VS 真数据科学家
- google datastudio 使用教程
- Android APP程序更新报解析软件包时出现错误问题解决方法
- 北京的电竞学校的要求有哪些?
- 三级网络技术备考重点之中小型网络系统总体规划与设计
- CAD进阶练习(二)
- Java——万字总结网络编程
- 海思3559A的一些工具探索尝试
- 基于Qt5.14.2和mingw的Qt源码学习(三) — 元对象系统简介及moc工具是如何保存类属性和方法的
- FT、DTFT和DFT之间的关系
- 在C语言中使用else if判断数字是正数还是负数或是零。
- 布尔类型(boolean)常量与变量
- Spring为什么需要使用三级缓存?
- OpenGL学习笔记(二)
- Spring的那些事情(二)
热门文章
- Postman操作使用
- oauth2.0 学习案例demo_Vue3教程:用 Vue3 开发小程序,这里有一份实际的代码案例!...
- ansible最大并发_通过这7种方法来最大程度地提高Ansible技能
- 2017的中国开放_2017年开放科学如何发展
- php框架和不用框架_如何选择一个PHP框架
- android 汽车 源码_汽车级Linux,无需Google即可运行Android等
- Bootstrap插件通过noConfllict 避免冲突
- 十五.激光和惯导LIO-SLAM框架学习之惯导与雷达外参标定(1)
- ROS笔记(28) Setup Assistant
- ROS笔记(11) Qt工具箱