Iceberg Partition Evolution
Table partitioning can be evolved by adding, removing, renaming, or reordering the fields of the partition spec.
Changing the partition spec produces a new spec identified by a unique spec ID. The new spec is added to the table's list of partition specs and can be set as the table's default spec.
When evolving a spec, the change must not alter partition field IDs, because partition field IDs are used as the partition tuple field IDs in manifest files.
In v2, partition field IDs are tracked explicitly for each partition field, and new IDs are assigned from the last assigned partition ID recorded in the table metadata.
In v1, partition field IDs were not tracked; they were simply assigned sequentially starting at 1000. This scheme causes problems when manifest-based metadata tables are read across multiple specs, because partition fields that share an ID may hold different data types. For compatibility with older readers, the following rules are recommended for partition evolution in v1 tables:

  • Do not reorder partition fields.
  • Do not drop partition fields; instead, replace the field's transform with void (see the sketch after this list).
  • Only add new partition fields at the end of the previous partition spec.
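
As a rough sketch of what these rules mean in practice, the Spark SQL extensions expose partition evolution as ALTER TABLE statements. The statements below are for illustration only (only the ADD form is actually run later in this walkthrough), and the exact behavior depends on the Iceberg and Spark versions in use:

-- Rule 3: add new partition fields at the end of the current spec.
ALTER TABLE local.db.sample ADD PARTITION FIELD data;

-- Rule 2: do not drop a field outright. On a v1 table, DROP PARTITION FIELD is
-- expected to keep the field in the spec and replace its transform with void,
-- so existing field IDs and their positions stay stable.
ALTER TABLE local.db.sample DROP PARTITION FIELD category;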

Now let's walk through a concrete example:

CREATE TABLE local.db.sample (id bigint, data string, category string)
USING iceberg
PARTITIONED BY (category);

INSERT INTO local.db.sample VALUES (1, 'a', '1');
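
The append created by the INSERT can also be checked from SQL before touching any files, assuming Iceberg's metadata tables are available in this Spark setup:

-- One row per snapshot; the single append from the INSERT above should show up
-- here with its snapshot_id and the location of its manifest list.
SELECT snapshot_id, operation, manifest_list
FROM local.db.sample.snapshots;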

Inspect the table metadata file (at this point v2.metadata.json, written by the INSERT):

{"format-version" : 1,"table-uuid" : "94ad30ed-4a31-438d-b81b-36d791471d2c","location" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample","last-updated-ms" : 1642174094175,"last-column-id" : 3,"schema" : {"type" : "struct","schema-id" : 0,"fields" : [ {"id" : 1,"name" : "id","required" : false,"type" : "long"}, {"id" : 2,"name" : "data","required" : false,"type" : "string"}, {"id" : 3,"name" : "category","required" : false,"type" : "string"} ]},"current-schema-id" : 0,"schemas" : [ {"type" : "struct","schema-id" : 0,"fields" : [ {"id" : 1,"name" : "id","required" : false,"type" : "long"}, {"id" : 2,"name" : "data","required" : false,"type" : "string"}, {"id" : 3,"name" : "category","required" : false,"type" : "string"} ]} ],"partition-spec" : [ {"name" : "category","transform" : "identity","source-id" : 3,"field-id" : 1000} ],"default-spec-id" : 0,"partition-specs" : [ {"spec-id" : 0,"fields" : [ {"name" : "category","transform" : "identity","source-id" : 3,"field-id" : 1000} ]} ],"last-partition-id" : 1000,"default-sort-order-id" : 0,"sort-orders" : [ {"order-id" : 0,"fields" : [ ]} ],"properties" : {"owner" : "liliwei"},"current-snapshot-id" : 3476183237498309505,"snapshots" : [ {"snapshot-id" : 3476183237498309505,"timestamp-ms" : 1642174094175,"summary" : {"operation" : "append","spark.app.id" : "local-1642173017469","added-data-files" : "1","added-records" : "1","added-files-size" : "874","changed-partition-count" : "1","total-records" : "1","total-files-size" : "874","total-data-files" : "1","total-delete-files" : "0","total-position-deletes" : "0","total-equality-deletes" : "0"},"manifest-list" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample/metadata/snap-3476183237498309505-1-002e475b-e5b9-485e-a59d-35730a6c9f4e.avro","schema-id" : 0} ],"snapshot-log" : [ {"timestamp-ms" : 1642174094175,"snapshot-id" : 3476183237498309505} ],"metadata-log" : [ {"timestamp-ms" : 1642173226793,"metadata-file" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample/metadata/v1.metadata.json"} ]
}%

Inspect the snapshot's manifest list file with avro-tools:
java -jar ~/plat/tools/avro-tools-1.10.2.jar tojson snap-3476183237498309505-1-002e475b-e5b9-485e-a59d-35730a6c9f4e.avro

{"manifest_path": "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample/metadata/002e475b-e5b9-485e-a59d-35730a6c9f4e-m0.avro","manifest_length": 6095,"partition_spec_id": 0,"added_snapshot_id": {"long": 3476183237498309505},"added_data_files_count": {"int": 1},"existing_data_files_count": {"int": 0},"deleted_data_files_count": {"int": 0},"partitions": {"array": [{"contains_null": false,"contains_nan": {"boolean": false},"lower_bound": {"bytes": "1"},"upper_bound": {"bytes": "1"}}]},"added_rows_count": {"long": 1},"existing_rows_count": {"long": 0},"deleted_rows_count": {"long": 0}
}
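
Decoding the Avro files by hand works, but roughly the same information about data files and their partition tuples can be pulled from the files metadata table (the exact column set varies across Iceberg versions):

-- Each data file together with the partition values it was written under.
SELECT file_path, partition, record_count
FROM local.db.sample.files;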

Now evolve the partition spec by adding the data column as a partition field:

ALTER TABLE local.db.sample ADD PARTITION FIELD data;

Check the metadata directory:

(base) ➜ metadata tree -l
.
├── 002e475b-e5b9-485e-a59d-35730a6c9f4e-m0.avro
├── snap-3476183237498309505-1-002e475b-e5b9-485e-a59d-35730a6c9f4e.avro
├── v1.metadata.json
├── v2.metadata.json
├── v3.metadata.json
└── version-hint.text

0 directories, 6 files

The ALTER statement only wrote a new metadata file; no new snapshot, manifest, or data file was created. Inspect v3.metadata.json:

{"format-version" : 1,"table-uuid" : "94ad30ed-4a31-438d-b81b-36d791471d2c","location" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample","last-updated-ms" : 1642175874398,"last-column-id" : 3,"schema" : {"type" : "struct","schema-id" : 0,"fields" : [ {"id" : 1,"name" : "id","required" : false,"type" : "long"}, {"id" : 2,"name" : "data","required" : false,"type" : "string"}, {"id" : 3,"name" : "category","required" : false,"type" : "string"} ]},"current-schema-id" : 0,"schemas" : [ {"type" : "struct","schema-id" : 0,"fields" : [ {"id" : 1,"name" : "id","required" : false,"type" : "long"}, {"id" : 2,"name" : "data","required" : false,"type" : "string"}, {"id" : 3,"name" : "category","required" : false,"type" : "string"} ]} ],"partition-spec" : [ {"name" : "category","transform" : "identity","source-id" : 3,"field-id" : 1000}, {"name" : "data","transform" : "identity","source-id" : 2,"field-id" : 1001} ],"default-spec-id" : 1,"partition-specs" : [ {"spec-id" : 0,"fields" : [ {"name" : "category","transform" : "identity","source-id" : 3,"field-id" : 1000} ]}, {"spec-id" : 1,"fields" : [ {"name" : "category","transform" : "identity","source-id" : 3,"field-id" : 1000}, {"name" : "data","transform" : "identity","source-id" : 2,"field-id" : 1001} ]} ],"last-partition-id" : 1001,"default-sort-order-id" : 0,"sort-orders" : [ {"order-id" : 0,"fields" : [ ]} ],"properties" : {"owner" : "liliwei"},"current-snapshot-id" : 3476183237498309505,"snapshots" : [ {"snapshot-id" : 3476183237498309505,"timestamp-ms" : 1642174094175,"summary" : {"operation" : "append","spark.app.id" : "local-1642173017469","added-data-files" : "1","added-records" : "1","added-files-size" : "874","changed-partition-count" : "1","total-records" : "1","total-files-size" : "874","total-data-files" : "1","total-delete-files" : "0","total-position-deletes" : "0","total-equality-deletes" : "0"},"manifest-list" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample/metadata/snap-3476183237498309505-1-002e475b-e5b9-485e-a59d-35730a6c9f4e.avro","schema-id" : 0} ],"snapshot-log" : [ {"timestamp-ms" : 1642174094175,"snapshot-id" : 3476183237498309505} ],"metadata-log" : [ {"timestamp-ms" : 1642173226793,"metadata-file" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample/metadata/v1.metadata.json"}, {"timestamp-ms" : 1642174094175,"metadata-file" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample/metadata/v2.metadata.json"} ]
}%
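
Compared with the previous metadata file, a second spec (spec-id 1) now holds both category (field-id 1000) and the newly added data field (field-id 1001), default-spec-id has moved to 1, and last-partition-id has advanced to 1001; the snapshot list is unchanged because the DDL wrote no data. The new default partitioning can also be confirmed without opening the metadata file, for example with a plain DESCRIBE (the exact output layout depends on the Spark version):

-- The partitioning section of the output should now list both category and data.
DESCRIBE TABLE local.db.sample;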

Insert a row under the new spec:

INSERT INTO local.db.sample VALUES (2, 'b', '2');
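
Files written before the evolution are not rewritten: the table keeps serving both layouts, so ordinary queries see both rows, and a filter on the new partition column still works (only files written under spec 1 get partition pruning on data; older files are filtered by their other metadata):

-- Both rows are visible regardless of which spec their files were written with.
SELECT * FROM local.db.sample ORDER BY id;

-- Predicate on the newly added partition column.
SELECT * FROM local.db.sample WHERE data = 'b';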

Check the metadata directory again:

(base) ➜ metadata tree -l
.
├── 002e475b-e5b9-485e-a59d-35730a6c9f4e-m0.avro
├── ed1a1f56-56fc-4313-bf60-10df0c4e88ca-m0.avro
├── snap-2641901311316255446-1-ed1a1f56-56fc-4313-bf60-10df0c4e88ca.avro
├── snap-3476183237498309505-1-002e475b-e5b9-485e-a59d-35730a6c9f4e.avro
├── v1.metadata.json
├── v2.metadata.json
├── v3.metadata.json
├── v4.metadata.json
└── version-hint.text

0 directories, 9 files

Decode the manifest list of the new snapshot:

java -jar ~/plat/tools/avro-tools-1.10.2.jar tojson snap-2641901311316255446-1-ed1a1f56-56fc-4313-bf60-10df0c4e88ca.avro
{"manifest_path": "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample/metadata/ed1a1f56-56fc-4313-bf60-10df0c4e88ca-m0.avro","manifest_length": 6301,"partition_spec_id": 1,"added_snapshot_id": {"long": 2641901311316255446},"added_data_files_count": {"int": 1},"existing_data_files_count": {"int": 0},"deleted_data_files_count": {"int": 0},"partitions": {"array": [{"contains_null": false,"contains_nan": {"boolean": false},"lower_bound": {"bytes": "2"},"upper_bound": {"bytes": "2"}}, {"contains_null": false,"contains_nan": {"boolean": false},"lower_bound": {"bytes": "b"},"upper_bound": {"bytes": "b"}}]},"added_rows_count": {"long": 1},"existing_rows_count": {"long": 0},"deleted_rows_count": {"long": 0}
} {"manifest_path": "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/sample/metadata/002e475b-e5b9-485e-a59d-35730a6c9f4e-m0.avro","manifest_length": 6095,"partition_spec_id": 0,"added_snapshot_id": {"long": 3476183237498309505},"added_data_files_count": {"int": 1},"existing_data_files_count": {"int": 0},"deleted_data_files_count": {"int": 0},"partitions": {"array": [{"contains_null": false,"contains_nan": {"boolean": false},"lower_bound": {"bytes": "1"},"upper_bound": {"bytes": "1"}}]},"added_rows_count": {"long": 1},"existing_rows_count": {"long": 0},"deleted_rows_count": {"long": 0}
}
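
The manifest list now references two manifests with different partition_spec_id values: the new row went into a manifest under spec 1, while the original data file stays under spec 0, confirming that existing data files are not rewritten when the spec evolves. The same split can be seen from the manifests metadata table, assuming it is available in this Iceberg version:

-- One row per manifest of the current snapshot; partition_spec_id shows which
-- spec each manifest was written with (here: 0 for the old one, 1 for the new one).
SELECT path, partition_spec_id, added_data_files_count
FROM local.db.sample.manifests;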
