Loading data into regular tables

1. Using LOAD

load data [local] inpath 'source_file_path' into table target_table;

Loading from HDFS essentially moves the file into the table's directory


load data inpath '/user/student.txt' into table student;

Loading from the local filesystem essentially copies the local file to HDFS

load data local inpath '/user/student.txt' into table student;
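
To make the move-versus-copy distinction concrete, here is a minimal Python sketch that imitates the two behaviours on an ordinary filesystem. It does not talk to Hive or HDFS; the function `simulate_load`, the helper `demo`, and all file and directory names are hypothetical, chosen only for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_load(src: Path, table_dir: Path, local: bool) -> Path:
    """Imitate LOAD DATA [LOCAL] INPATH on an ordinary filesystem.

    local=True  -> copy the file (the source stays where it was).
    local=False -> move the file (the source path no longer exists).
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / src.name
    if local:
        shutil.copy(src, dest)
    else:
        shutil.move(str(src), str(dest))
    return dest


def demo() -> dict:
    """Run both variants in a throwaway directory and report what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        table_dir = root / "warehouse" / "student"  # stand-in for the table directory

        local_file = root / "local_student.txt"
        local_file.write_text("1,Tom,18\n")
        simulate_load(local_file, table_dir, local=True)

        hdfs_file = root / "hdfs_student.txt"  # stand-in for a file already on HDFS
        hdfs_file.write_text("2,Ann,17\n")
        simulate_load(hdfs_file, table_dir, local=False)

        return {
            "local_source_still_exists": local_file.exists(),
            "hdfs_source_still_exists": hdfs_file.exists(),
            "files_in_table_dir": sorted(p.name for p in table_dir.iterdir()),
        }
```

The practical consequence is that after a non-local load the original HDFS path is empty, so anything else that reads that path will no longer find the file.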

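2. Using INSERT

Insert a single row (single insert; Hive first writes the row to a temporary table and then copies it into student, which is inefficient)

insert into table student values(00011, '黄海霞', 18);

Insert multiple rows (single insert; loads a query result into student, which is inefficient)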

insert into table student select * from stu where age >=18;

Multi-insert (scans the source table only once and writes the results into several target tables; efficient and commonly used)

from stu insert into table student01 select * where age >=18 insert into table student02 select * where age <18;
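
The benefit of the multi-insert form is that the source is read once and each row is tested against every branch. The following Python sketch mirrors that idea under simple assumptions: `Student` and `multi_insert` are hypothetical names, and plain lists stand in for the tables student01 and student02.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Student(NamedTuple):
    id: int
    sname: str
    age: int


def multi_insert(stu: Iterable[Student]) -> Tuple[List[Student], List[Student]]:
    """Route rows to two targets in a single pass over the source.

    Mirrors: FROM stu
             INSERT INTO student01 SELECT * WHERE age >= 18
             INSERT INTO student02 SELECT * WHERE age < 18
    """
    student01: List[Student] = []
    student02: List[Student] = []
    for row in stu:  # the source is iterated exactly once
        if row.age >= 18:
            student01.append(row)
        if row.age < 18:
            student02.append(row)
    return student01, student02
```

Passing a one-shot iterator (for example `iter(rows)`) works, which shows that the function never needs a second scan. Two separate single inserts would each have to read the source again.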

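Loading data into partitioned tables (common)

LOAD is generally avoided here, because it does not check whether the columns of the source data match those of the target table, so errors slip in easily;
inserting single rows with INSERT is generally not recommended either;

Static partitioning, for when the partitions are few and can be enumerated:

1) Add the partitions manually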

ALTER TABLE student01 ADD if not exists PARTITION(city='beijing');

ALTER TABLE student02 ADD if not exists PARTITION(city='shanghai');

2) Insert the data

from stu insert into table student01 partition(city='beijing') select id, sname, age where city='beijing' insert into table student02 partition(city='shanghai') select id, sname, age where city='shanghai';

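Dynamic partitioning, for when there are many partitions, for example by date or age:

1) Switch the partition mode to nonstrict

-- Hive 2: the default mode is strict
set hive.exec.dynamic.partition.mode = nonstrict;

-- Hive 1: enable dynamic partitioning first (it has been on by default since Hive 0.9.0, so this line only matters if it was turned off)
set hive.exec.dynamic.partition = true;

-- Hive 1: then switch to nonstrict mode
set hive.exec.dynamic.partition.mode = nonstrict;

2) Insert the data

If city is the dynamic partition column, the select list must include city, and it must come last;
if there are two partition columns, city and age, write partition(city, age): city is the primary level and age the secondary one, and the select list must end with city, age in exactly that order;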

from stu

insert into table student01 partition(city) select id, sname, age, city

insert into table student02 partition(city) select id, sname, age, city ;
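
The ordering rule exists because Hive matches dynamic partition columns by position rather than by name: the trailing values of each selected row become the partition values. The Python sketch below illustrates that mapping; `route_to_partitions` is a hypothetical helper, and the returned keys imitate the `column=value` sub-directory names that Hive uses for partitions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def route_to_partitions(
    rows: Sequence[Tuple], partition_cols: Sequence[str]
) -> Dict[str, List[Tuple]]:
    """Group rows by the values found in their trailing columns.

    The last len(partition_cols) values of each row are treated as the
    partition values, in the same order as partition_cols.
    """
    n = len(partition_cols)
    if n == 0:
        raise ValueError("at least one partition column is required")

    partitions: Dict[str, List[Tuple]] = defaultdict(list)
    for row in rows:
        data, part_values = row[:-n], row[-n:]
        # Build a path such as "city=beijing/age=18" (primary level first).
        path = "/".join(f"{col}={val}" for col, val in zip(partition_cols, part_values))
        partitions[path].append(data)
    return dict(partitions)
```

For example, with `partition_cols = ["city", "age"]` a row `(1, "Tom", "beijing", 18)` lands under `city=beijing/age=18`. If the select list ended with age, city instead, the age value would silently be used as the city, which is why the order must not change.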

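Loading data into bucketed tables

Do not use LOAD here: it only moves or copies files and does not distribute the rows into buckets;

1) First create a bucketed table student;

CREATE TABLE if not exists student(id int ,sname string ,age int, city string) clustered BY (age) sorted BY (city) INTO 3 buckets ROW FORMAT delimited FIELDS terminated BY ',';

2) Insert the data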

insert into table student select * from stu;
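
How a row ends up in one of the 3 buckets can be sketched in Python. This is an assumption-laden illustration, not Hive's code: it assumes the classic hashing scheme in which an int column hashes to its own value and the bucket is the non-negative hash modulo the bucket count (newer Hive versions can use a different hash function). The names `bucket_for_int` and `assign_buckets` are hypothetical.

```python
from typing import Dict, Iterable, List


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Bucket index for an int column under the assumed classic scheme."""
    hash_code = value  # assumption: an int hashes to itself
    return (hash_code & 0x7FFFFFFF) % num_buckets  # clear the sign bit, then modulo


def assign_buckets(ages: Iterable[int], num_buckets: int = 3) -> Dict[int, List[int]]:
    """Group age values by the bucket they would fall into."""
    buckets: Dict[int, List[int]] = {i: [] for i in range(num_buckets)}
    for age in ages:
        buckets[bucket_for_int(age, num_buckets)].append(age)
    return buckets
```

Under these assumptions all rows with the same age land in the same bucket file, which is what makes bucketed joins and sampling on age possible.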

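When inserting, the number of reduce tasks defaults to 1, but because the table has 3 buckets, 3 reduce tasks actually run;
the maximum number of reduce tasks is 1009 (hive.exec.reducers.max);
each reducer handles 256 MB of input (hive.exec.reducers.bytes.per.reducer);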
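
These three numbers can be combined into a rough estimate of how many reduce tasks a job gets. The Python sketch below is a simplification under stated assumptions, not Hive's actual planner: `estimate_reducers` is a hypothetical function that divides the input size by the per-reducer limit, caps the result at the maximum, and uses the bucket count when the target is a bucketed table, as described above.

```python
from typing import Optional

BYTES_PER_REDUCER = 256 * 1024 * 1024  # 256 MB per reducer, as stated in the text
MAX_REDUCERS = 1009                    # upper bound, as stated in the text


def estimate_reducers(
    input_bytes: int,
    num_buckets: Optional[int] = None,
    bytes_per_reducer: int = BYTES_PER_REDUCER,
    max_reducers: int = MAX_REDUCERS,
) -> int:
    """Rough reducer count: bucket count if bucketed, else a size-based estimate."""
    if num_buckets is not None:
        # Inserting into a bucketed table: one reduce task per bucket.
        return num_buckets
    # Ceiling division without floating point.
    estimated = -(-input_bytes // bytes_per_reducer)
    return max(1, min(max_reducers, estimated))
```

For instance, 1 GiB of input gives an estimate of 4 reducers, while a very large input is capped at 1009.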

=================================================================

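Exporting data

Single export (the target path is a directory, even though it ends in .txt, and OVERWRITE replaces its existing contents)

-- to HDFS
insert overwrite directory '/user/stu01.txt' select * from student where age >=18;

-- to the local filesystem
insert overwrite local directory '/user/stu01.txt' select * from student where age >=18;

Multi-export (one scan of student, one output directory per query)

from student insert overwrite local directory '/user/stu01.txt' select * where age >=18 insert overwrite local directory '/user/stu02.txt' select * where age <18;

To export to HDFS instead, simply drop local.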
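
Since the export statements above do not specify a row format, the output files use Hive's default text serialization: columns are separated by the Ctrl-A character (`\x01`) and NULL is written as `\N`. A small Python sketch for reading such a line follows; `parse_exported_line` is a hypothetical helper, and it assumes the export really used those defaults.

```python
from typing import List, Optional


def parse_exported_line(line: str, delimiter: str = "\x01") -> List[Optional[str]]:
    """Split one exported text line into fields, mapping Hive's \\N marker to None."""
    fields = line.rstrip("\n").split(delimiter)
    return [None if field == "\\N" else field for field in fields]
```

All values come back as strings, so numeric columns such as age still need an explicit conversion after parsing.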

Hive data loading and export