Hive is a data warehouse tool for big data environments. It runs on Hadoop and lets you express MapReduce jobs in SQL, which makes it well suited for full-scan query analysis over large datasets.

This article walks through how to import and export data in the Hive CLI.

Importing Data

Method 1: load data directly from the local file system

On my local machine there is a file test1.txt with three columns, each separated by '\t':

[root@localhost conf]# cat /usr/tmp/test1.txt
1   a1  b1
2   a2  b2
3   a3  b3
4   a4  b4

Create the table:

hive> create table test1(a string, b string, c string)
    > row format delimited
    > fields terminated by '\t'
    > stored as textfile;

Load the data:

load data local inpath '/usr/tmp/test1.txt' overwrite into table test1;

Here local inpath indicates that the path refers to the local file system, and overwrite means the loaded data replaces whatever the table already contains.
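If you want to append rather than replace the table's contents, omit the overwrite keyword; a minimal sketch of the append form (same file and table as above):

load data local inpath '/usr/tmp/test1.txt' into table test1;

With local, Hive copies the source file, so /usr/tmp/test1.txt stays in place after the load.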

Method 2: load data from a file in HDFS

First, upload the data to HDFS:

hadoop fs -put /usr/tmp/test1.txt /test1.txt

View test1.txt from within Hive:

hive> dfs -cat /test1.txt;
1   a1  b1
2   a2  b2
3   a3  b3
4   a4  b4

Create the table (here named test2) exactly as before. The load command differs slightly:

load data inpath '/test1.txt' overwrite into table test2;
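Unlike the local variant, loading from HDFS moves the file into the table's warehouse directory rather than copying it. A quick way to confirm this (the warehouse path here is an assumption, based on the test.db paths that appear later in this post):

hive> dfs -ls /test1.txt;
hive> dfs -ls /user/hive/warehouse/test.db/test2/;

After the load, the first command should report that the file no longer exists, while the second should show it under the table's directory.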

Method 3: load via insert ... select

First define the table; here we create a partitioned table directly:

hive> create table test3(a string,b string,c string) partitioned by (d string) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.109 seconds
hive> describe test3;
OK
a                       string
b                       string
c                       string
d                       string

# Partition Information
# col_name              data_type               comment

d                       string
Time taken: 0.071 seconds, Fetched: 9 row(s)

Insert data into a fixed partition via a query:

hive> insert into table test3 partition(d='aaaaaa') select * from test2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20160823212718_9cfdbea4-42fa-4267-ac46-9ac2c357f944
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-08-23 21:27:21,621 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local1550375778_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:8020/user/hive/warehouse/test.db/test3/d=aaaaaa/.hive-staging_hive_2016-08-23_21-27-18_739_4058721562930266873-1/-ext-10000
Loading data to table test.test3 partition (d=aaaaaa)
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 248 HDFS Write: 175 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 3.647 seconds

Check the result with a query:

hive> select * from test3;
OK
1   a1  b1  aaaaaa
2   a2  b2  aaaaaa
3   a3  b3  aaaaaa
4   a4  b4  aaaaaa
Time taken: 0.264 seconds, Fetched: 4 row(s)
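As an aside, insert into appends to the partition; to replace the partition's existing contents instead, insert overwrite should work (a sketch, not from the original transcript):

insert overwrite table test3 partition(d='aaaaaa') select * from test2;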

PS: you can also insert using dynamic partitioning:

insert into table test4 partition(c) select * from test2;
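For this to work, test4 must already exist with a matching schema, and dynamic partitioning usually has to be enabled first. A sketch of that setup; the table definition is my assumption, since the original post never shows it (the last column of test2 supplies the partition key c):

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> create table test4(a string, b string) partitioned by (c string)
    > row format delimited fields terminated by '\t' stored as textfile;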

Partitions are stored as directories named after the partition value:

hive> dfs -ls /user/hive/warehouse/test.db/test4/;
Found 4 items
drwxr-xr-x   - root supergroup          0 2016-08-23 21:33 /user/hive/warehouse/test.db/test4/c=b1
drwxr-xr-x   - root supergroup          0 2016-08-23 21:33 /user/hive/warehouse/test.db/test4/c=b2
drwxr-xr-x   - root supergroup          0 2016-08-23 21:33 /user/hive/warehouse/test.db/test4/c=b3
drwxr-xr-x   - root supergroup          0 2016-08-23 21:33 /user/hive/warehouse/test.db/test4/c=b4
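Note that the partition column itself is stored only in the directory names, not in the data files. You can verify this by cat-ing one of the partition files, which should contain just the a and b columns:

hive> dfs -cat /user/hive/warehouse/test.db/test4/c=b1/000000_0;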

Method 4: create a table directly from a query (CTAS)

Create the table straight from a select:

hive> create table test5 as select * from test4;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20160823213944_03672168-bc56-43d7-aefb-cac03a6558bf
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-08-23 21:39:46,030 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local855333165_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:8020/user/hive/warehouse/test.db/.hive-staging_hive_2016-08-23_21-39-44_259_5484795730585321098-1/-ext-10002
Moving data to directory hdfs://localhost:8020/user/hive/warehouse/test.db/test5
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 600 HDFS Write: 466 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.184 seconds

Check the result:

hive> select * from test5;
OK
1   a1  b1
2   a2  b2
3   a3  b3
4   a4  b4
Time taken: 0.147 seconds, Fetched: 4 row(s)
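CTAS can also switch the storage format or trim the column set in the same step; a sketch (the table name and ORC format are my own choices, not from the post):

create table test6 stored as orc as select a, b from test4;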

Exporting Data

Export to a local file

Run the export command:

hive> insert overwrite local directory '/usr/tmp/export' select * from test1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20160823221655_05b05863-6273-4bdd-aad2-e80d4982425d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-08-23 22:16:57,028 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local8632460_0005
Moving data to local directory /usr/tmp/export
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 794 HDFS Write: 498 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.569 seconds
hive>

Inspect the contents locally. The columns look fused together because, with no row format specified, Hive writes the non-printing \001 character as the field delimiter:

[root@localhost export]# ll
total 4
-rw-r--r--. 1 root root 32 Aug 23 22:16 000000_0
[root@localhost export]# cat 000000_0
1a1b1
2a2b2
3a3b3
4a4b4
[root@localhost export]# pwd
/usr/tmp/export
[root@localhost export]#
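To get a human-readable delimiter, you can specify a row format on the export itself; a sketch (to the best of my knowledge, this clause on insert overwrite directory requires Hive 0.11 or later):

hive> insert overwrite local directory '/usr/tmp/export'
    > row format delimited fields terminated by '\t'
    > select * from test1;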

Export to HDFS

hive> insert overwrite directory '/usr/tmp/test' select * from test1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20160823214217_e8c71bb9-a147-4518-8353-81f9adc54183
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-08-23 21:42:19,257 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local628523792_0004
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:8020/usr/tmp/test/.hive-staging_hive_2016-08-23_21-42-17_778_6818164305996247644-1/-ext-10000
Moving data to directory /usr/tmp/test
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 730 HDFS Write: 498 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.594 seconds

The export succeeded. Inspect the exported file on HDFS (note that dfs -cat on the directory itself fails; you have to cat the file inside it):

hive> dfs -cat /usr/tmp/test;
cat: `/usr/tmp/test': Is a directory
Command failed with exit code = 1
Query returned non-zero code: 1, cause: null
hive> dfs -ls /usr/tmp/test;
Found 1 items
-rwxr-xr-x   3 root supergroup         32 2016-08-23 21:42 /usr/tmp/test/000000_0
hive> dfs -cat /usr/tmp/test/000000_0;
1a1b1
2a2b2
3a3b3
4a4b4
hive>
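If you then need the exported file on the local file system, hadoop fs -get will pull it down (a sketch):

hadoop fs -get /usr/tmp/test/000000_0 /usr/tmp/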

Export to another table

This works the same way as the query-based import shown earlier. Note that because test3 is partitioned, the insert needs a partition clause, for example:

insert into table test3 partition(d='aaaaaa') select * from test1;
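As a related aside, Hive also provides dedicated EXPORT and IMPORT statements that copy a table's data along with its metadata, which is handy for moving tables between clusters; a sketch (the paths and new table name are my own):

export table test1 to '/tmp/test1_export';
import table test1_copy from '/tmp/test1_export';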

Reposted from: https://www.cnblogs.com/xing901022/p/5801061.html
