5. DML Data Operations

5.1 Importing Data

5.1.1 Loading Data into a Table (Load)

1) Syntax

hive>load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

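(1) load data: loads data into a table

(2) local: load from the local filesystem into the Hive table; if omitted, load from HDFS

(3) inpath: the path of the data to load

(4) overwrite: replace the data already in the table; if omitted, the new data is appended

(5) into table: introduces the table to load into

(6) student: the name of the target table

(7) partition: load into the specified partition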
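To make the role of each optional clause concrete, here is a minimal Python sketch that assembles a LOAD DATA statement from the pieces listed above. The helper name `build_load_statement` and its parameters are hypothetical, introduced only for illustration; the sketch does no quoting or escaping, so it is not suitable for untrusted input.

```python
def build_load_statement(path, table, local=False, overwrite=False, partition=None):
    """Assemble a HiveQL LOAD DATA statement from its optional clauses.

    partition is an optional dict such as {"month": "201809"}.
    Hypothetical helper: it only concatenates text and never talks to Hive.
    """
    parts = ["load data"]
    if local:
        parts.append("local")        # read from the local filesystem instead of HDFS
    parts.append(f"inpath '{path}'")
    if overwrite:
        parts.append("overwrite")    # replace existing data instead of appending
    parts.append(f"into table {table}")
    if partition:
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"partition ({spec})")
    return " ".join(parts) + ";"


print(build_load_statement("/opt/module/datas/student.txt", "student", local=True))
print(build_load_statement("/user/itstar/hive/student.txt", "student",
                           overwrite=True, partition={"month": "201809"}))
```

The clauses are appended in the same order in which the syntax line lists them, which is why `overwrite` sits between the path and `into table`.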

2) Hands-on example

(0) Create a table

hive (default)> create table student(id string, name string) row format delimited fields terminated by '\t';

(1) Load a local file into Hive

hive (default)> load data local inpath '/opt/module/datas/student.txt' into table default.student;

(2) Load an HDFS file into Hive

Upload the file to HDFS:

hive (default)> dfs -put /opt/module/datas/student.txt /user/itstar/hive;

Load the data from HDFS:

hive (default)> load data inpath '/user/itstar/hive/student.txt' into table default.student;

(3) Load data and overwrite the data already in the table

Upload the file to HDFS:

hive (default)> dfs -put /opt/module/datas/student.txt /user/itstar/hive;

Load the data, overwriting what the table already holds:

hive (default)> load data inpath '/user/itstar/hive/student.txt' overwrite into table default.student;

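Note: loading from HDFS is equivalent to moving (mv) the file into another directory, so the file disappears from its original location. Loading with local copies the file instead, so the local original stays in place.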
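The move-versus-copy behaviour can be pictured with a small Python sketch. This is only an analogy on the local filesystem: the temporary directories stand in for a local disk, HDFS, and the warehouse, and the function names `simulate_load` and `demo` are hypothetical. It does not call Hive or HDFS.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_load(source: Path, table_dir: Path, local: bool) -> None:
    """Mimic what happens to the source file during LOAD DATA.

    local=True  -> the file is copied, so the source stays in place
    local=False -> the file is moved, so the source disappears
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    target = table_dir / source.name
    if local:
        shutil.copy(source, target)
    else:
        shutil.move(str(source), str(target))


def demo() -> dict:
    """Run both variants in a throwaway directory and report what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        local_src = root / "local" / "student.txt"   # stand-in for a local file
        hdfs_src = root / "hdfs" / "student.txt"     # stand-in for a file on HDFS
        for src in (local_src, hdfs_src):
            src.parent.mkdir(parents=True)
            src.write_text("1001\tzhangsan\n")       # made-up sample row

        simulate_load(local_src, root / "warehouse" / "student_a", local=True)
        simulate_load(hdfs_src, root / "warehouse" / "student_b", local=False)

        return {
            "local_source_exists": local_src.exists(),
            "hdfs_source_exists": hdfs_src.exists(),
            "copied_target_exists": (root / "warehouse" / "student_a" / "student.txt").exists(),
            "moved_target_exists": (root / "warehouse" / "student_b" / "student.txt").exists(),
        }


result = demo()
print(result)
```

In a real session the same check can be made by listing the source directory with dfs -ls before and after the load.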

5.1.2 Inserting Data into a Table with a Query (Insert)

1) Create a partitioned table (this reuses the name student, so drop the table from 5.1.1 first if it still exists)

hive (default)> create table student(id int, name string) partitioned by (month string) row format delimited fields terminated by '\t';

2) Basic insert

hive (default)> insert into table student partition(month='201809') values(1,'wangwu');

3) Basic-mode insert (from the result of a query on a single table)

hive (default)> insert overwrite table student partition(month='201808')
             select id, name from student where month='201809';

4) Multi-insert mode (the source table is named once in the from clause and feeds several inserts)

hive (default)> from student
              insert overwrite table student partition(month='201807')
              select id, name where month='201809'
              insert overwrite table student partition(month='201806')
              select id, name where month='201809';

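5.1.3 Creating a Table and Loading Data from a Query (As Select)

See Section 4.5.1, Creating Tables, for details.

Create a table from a query result (the rows returned by the query are written into the newly created table):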

 create table if not exists student3  as select id, name  from student;

5.1.4 Specifying the Data Path with Location When Creating a Table

1) Create the table and specify its location on HDFS

hive (default)> create table if not exists student5(id int, name string)
              row format delimited fields terminated by '\t'
              location '/user/hive/warehouse/student5';

2) Upload the data to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/hive/warehouse/student5;

3) Query the data

hive (default)> select * from student5;

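5.1.5 Importing Data into a Specified Hive Table

Note: export the table with export first, then import the exported data. Because the source and the target are both on HDFS, this is a copy-level operation.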

  hive (default)> export table default.student to  '/user/hive/warehouse/export/student';

  hive (default)> import table student2  partition(month='201809') from '/user/hive/warehouse/export/student';

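5.2 Exporting Data

5.2.1 Exporting with Insert

1) Export query results to the local filesystem without specifying a format. The fields look as if they run together because Hive's default field delimiter is the non-printing character \001 (Ctrl-A).

hive (default)> insert overwrite local directory '/opt/module/datas/export/student' select * from student;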

2) Export query results to the local filesystem in a formatted layout, with fields separated by "\t"

hive (default)> insert overwrite local directory '/opt/module/datas/export/student1'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
             select * from student;
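The difference between the two exports shows up when the files are read back. When no row format is given, Hive separates fields with the non-printing character \001 (Ctrl-A); with the row format above, fields are separated by tabs. The following Python sketch splits one line of each kind. The sample lines are made up for illustration, and `parse_export_line` is a hypothetical helper name.

```python
def parse_export_line(line: str, delimiter: str = "\x01") -> list:
    """Split one exported line into fields.

    The default delimiter is \\x01 (Ctrl-A), Hive's default field separator;
    pass "\\t" for files exported with FIELDS TERMINATED BY '\\t'.
    """
    return line.rstrip("\n").split(delimiter)


# Made-up sample lines: one per export style.
default_line = "1\x01wangwu\x01201809\n"   # exported without a row format
tab_line = "1\twangwu\t201809\n"           # exported with '\t' as the delimiter

print(parse_export_line(default_line))
print(parse_export_line(tab_line, delimiter="\t"))
```

Both calls yield the same three fields; only the separator in the file differs.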

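3) Export query results to HDFS (no local keyword)

hive (default)> insert overwrite directory '/user/itstar/student2'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
             select * from student;

Note: although the source and the target are both on HDFS, this is not a copy operation; the output files are written by running the query.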

5.2.2 Exporting to the Local Filesystem with a Hadoop Command

 hive (default)> dfs -get /user/hive/warehouse/student/month=201809/000000_0 /opt/module/datas/export/student3.txt;

5.2.3 Exporting with a Hive Shell Command

Basic syntax: hive -e 'statement' > file, or hive -f script > file

$ bin/hive -e 'select *  from default.student;' > /opt/module/datas/export/student4.txt;
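The same redirect can be driven from a script. Below is a minimal Python sketch that runs hive -e and writes its standard output to a file. It assumes a Hive installation reachable as bin/hive relative to the working directory, as in the command above; the function names `hive_export_argv` and `export_query` are hypothetical.

```python
import subprocess


def hive_export_argv(query: str, hive_bin: str = "bin/hive") -> list:
    """Build the argument list for running one statement with hive -e."""
    return [hive_bin, "-e", query]


def export_query(query: str, out_path: str, hive_bin: str = "bin/hive") -> None:
    """Run the query and redirect standard output to out_path.

    Assumes hive_bin points at a working Hive CLI; raises
    subprocess.CalledProcessError if Hive exits with a non-zero status.
    """
    with open(out_path, "w") as out:
        subprocess.run(hive_export_argv(query, hive_bin), stdout=out, check=True)


print(hive_export_argv("select * from default.student;"))
```

Calling export_query("select * from default.student;", "/opt/module/datas/export/student4.txt") would then mirror the shell command. Passing the query as a list element avoids shell quoting issues, since no shell is involved.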

5.2.4 Exporting to HDFS with Export

 hive (default)> export table default.student to  '/user/hive/warehouse/export/student';

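5.2.5 Exporting with Sqoop

5.3 Clearing Data from a Table (Truncate)

Note: Truncate can only delete the data in managed tables; it cannot delete the data in external tables.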

hive (default)> truncate table student;
