Several Ways to Import and Export Hive Data
Part 1: Several ways to import data into Hive
First, let's go over the data and Hive tables used to demonstrate the import methods below.

Import:

Importing a local file into a Hive table;
Importing from one Hive table into another Hive table;
Importing an HDFS file into a Hive table;
Importing from another table while creating a table;
Importing a MySQL database into a Hive table via Sqoop; see the Sqoop sketch after the export list, as well as the articles "Importing and exporting between MySQL and Hive via Sqoop" and "Periodically syncing Hive data from a big-data platform to Oracle"
Export:

Exporting a Hive table to the local file system;
Exporting a Hive table to HDFS;
Exporting a Hive table to a MySQL database via Sqoop (see the sketch below);
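
For the Sqoop-based items, the referenced articles cover the details; the following is only a minimal sketch. The connection URL, credentials, and MySQL table name (mysql_host, testdb, user, passwd) are hypothetical placeholders, not from this article; the export directory reuses the warehouse path shown later in this post.

# Import a MySQL table into Hive (hypothetical connection details):
sqoop import --connect jdbc:mysql://mysql_host:3306/testdb \
  --username user --password passwd \
  --table testA --hive-import --hive-table testA

# Export a Hive table's warehouse files back to MySQL:
sqoop export --connect jdbc:mysql://mysql_host:3306/testdb \
  --username user --password passwd \
  --table testA --export-dir /home/hadoop/hivedata/warehouse/testa \
  --input-fields-terminated-by ','
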
Hive tables:

Create testA:

CREATE TABLE testA (
  id INT,
  name string,
  area string
) PARTITIONED BY (create_time string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Create testB:

CREATE TABLE testB (
  id INT,
  name string,
  area string,
  code string
) PARTITIONED BY (create_time string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Data file (sourceA.txt):

1,fish1,SZ
2,fish2,SH
3,fish3,HZ
4,fish4,QD
5,fish5,SR

Data file (sourceB.txt):

1,zy1,SZ,1001
2,zy2,SH,1002
3,zy3,HZ,1003
4,zy4,QD,1004
5,zy5,SR,1005

(1) Importing a local file into a Hive table

hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION(create_time='2015-07-08');
Copying data from file:/home/hadoop/sourceA.txt
Copying file: file:/home/hadoop/sourceA.txt
Loading data to table default.testa partition (create_time=2015-07-08)
Partition default.testa{create_time=2015-07-08} stats: [numFiles=1, numRows=0, totalSize=58, rawDataSize=0]
OK
Time taken: 0.237 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION(create_time='2015-07-09');
Copying data from file:/home/hadoop/sourceB.txt
Copying file: file:/home/hadoop/sourceB.txt
Loading data to table default.testb partition (create_time=2015-07-09)
Partition default.testb{create_time=2015-07-09} stats: [numFiles=1, numRows=0, totalSize=73, rawDataSize=0]
OK
Time taken: 0.212 seconds
hive> select * from testA;
OK
1   fish1   SZ  2015-07-08
2   fish2   SH  2015-07-08
3   fish3   HZ  2015-07-08
4   fish4   QD  2015-07-08
5   fish5   SR  2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> select * from testB;
OK
1   zy1 SZ  1001    2015-07-09
2   zy2 SH  1002    2015-07-09
3   zy3 HZ  1003    2015-07-09
4   zy4 QD  1004    2015-07-09
5   zy5 SR  1005    2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)
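
Note that repeating the same INTO TABLE load appends another copy of the file to the partition rather than replacing it. To replace the partition's existing contents, the OVERWRITE keyword can be used; a minimal sketch:

hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceA.txt' OVERWRITE INTO TABLE testA PARTITION(create_time='2015-07-08');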

(2) Importing from one Hive table into another Hive table

Import data from testB into testA:

hive> INSERT INTO TABLE testA PARTITION(create_time='2015-07-11') select id, name, area from testB where id = 1;
...
OK
Time taken: 14.744 seconds
hive> INSERT INTO TABLE testA PARTITION(create_time) select id, name, area, code from testB where id = 2;
<pre name="code" class="java">...(省略)
OKTime taken: 19.852 secondshive> select * from testA;OK2 zy2 SH 10021 fish1 SZ 2015-07-082 fish2 SH 2015-07-083 fish3 HZ 2015-07-084 fish4 QD 2015-07-085 fish5 SR 2015-07-081 zy1 SZ 2015-07-11Time taken: 0.032 seconds, Fetched: 7 row(s)

Notes:

1. The first statement inserts the row with id=1 from testB into testA, under the static partition create_time='2015-07-11'.

2. The second statement inserts the row with id=2 from testB into testA, with the partition value create_time taken dynamically from that row's code value (1002).
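
A fully dynamic partition insert like the second statement is rejected under Hive's strict-mode default; if you hit that error, enable dynamic partitioning first. A minimal sketch using the standard Hive settings (these set commands are not from the original session):

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;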

(3) Importing an HDFS file into a Hive table

Upload sourceA.txt and sourceB.txt to HDFS, at the paths /home/hadoop/sourceA.txt and /home/hadoop/sourceB.txt respectively (one way to do this is shown below).
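
A sketch of the upload, assuming the files are still present locally (any method of copying into HDFS works):

[hadoop@hadoopcluster78 bin]$ ./hadoop fs -put /home/hadoop/sourceA.txt /home/hadoop/sourceA.txt
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -put /home/hadoop/sourceB.txt /home/hadoop/sourceB.txt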

hive> LOAD DATA INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION(create_time='2015-07-08');
...(output omitted)
OK
Time taken: 0.237 seconds
hive> LOAD DATA INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION(create_time='2015-07-09');
<pre name="code" class="java">...(省略)
OK
Time taken: 0.212 seconds
hive> select * from testA;
OK
1   fish1   SZ  2015-07-08
2   fish2   SH  2015-07-08
3   fish3   HZ  2015-07-08
4   fish4   QD  2015-07-08
5   fish5   SR  2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> select * from testB;
OK
1   zy1 SZ  1001    2015-07-09
2   zy2 SH  1002    2015-07-09
3   zy3 HZ  1003    2015-07-09
4   zy4 QD  1004    2015-07-09
5   zy5 SR  1005    2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)

This loads '/home/hadoop/sourceA.txt' into testA and '/home/hadoop/sourceB.txt' into testB. Note that, unlike the LOCAL variant, LOAD DATA INPATH moves the HDFS source file into the table's warehouse directory rather than copying it.

(4) Importing from another table while creating a table


hive> create table testC as select name, code from testB;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1449746265797_0106, Tracking URL = http://hadoopcluster79:8088/proxy/application_1449746265797_0106/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job  -kill job_1449746265797_0106
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-24 16:40:17,981 Stage-1 map = 0%,  reduce = 0%
2015-12-24 16:40:23,115 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.11 sec
MapReduce Total cumulative CPU time: 1 seconds 110 msec
Ended Job = job_1449746265797_0106
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/tmp/hive-root/hive_2015-12-24_16-40-09_983_6048680148773453194-1/-ext-10001
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/warehouse/testc
Table default.testc stats: [numFiles=1, numRows=0, totalSize=45, rawDataSize=0]
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.11 sec   HDFS Read: 297 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 110 msec
OK
Time taken: 14.292 seconds
hive> desc testC;
OK
name                    string
code                    string
Time taken: 0.032 seconds, Fetched: 2 row(s)
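
Note that a table created with CREATE TABLE ... AS SELECT contains only the selected columns and does not inherit testB's partitioning or storage clauses. To copy a table's full schema (including partition columns) without copying any data, CREATE TABLE ... LIKE can be used; a minimal sketch (testD is a hypothetical table name):

hive> CREATE TABLE testD LIKE testA;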

Part 2: Several ways to export Hive data
(1) Exporting to the local file system

hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output' ROW FORMAT DELIMITED FIELDS TERMINATED by ',' select * from testA;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0001, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0001/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job  -kill job_1451024007879_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:04:30,447 Stage-1 map = 0%,  reduce = 0%
2015-12-25 17:04:35,616 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.16 sec
MapReduce Total cumulative CPU time: 1 seconds 160 msec
Ended Job = job_1451024007879_0001
Copying data to local directory /home/hadoop/output
Copying data to local directory /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.16 sec   HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 160 msec
OK
Time taken: 16.701 seconds

View the result:

[hadoop@hadoopcluster78 output]$ cat /home/hadoop/output/000000_0
1,fish1,SZ,2015-07-08
2,fish2,SH,2015-07-08
3,fish3,HZ,2015-07-08
4,fish4,QD,2015-07-08
5,fish5,SR,2015-07-08

INSERT OVERWRITE LOCAL DIRECTORY exports the data of Hive table testA to the /home/hadoop/output directory. As is well known, HQL statements run as MapReduce jobs; /home/hadoop/output is in effect the MapReduce output path, and the result ends up in a file named 000000_0.

(2) Exporting to HDFS

Exporting to HDFS is similar to exporting to the local file system: just drop the LOCAL keyword from the HQL statement.

hive> INSERT OVERWRITE DIRECTORY '/home/hadoop/output' select * from testA;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0002, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0002/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job  -kill job_1451024007879_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:08:51,034 Stage-1 map = 0%,  reduce = 0%
2015-12-25 17:08:59,313 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.4 sec
MapReduce Total cumulative CPU time: 1 seconds 400 msec
Ended Job = job_1451024007879_0002
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/hive-hadoop/hive_2015-12-25_17-08-43_733_1768532778392261937-1/-ext-10000
Moving data to: /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.4 sec   HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 400 msec
OK
Time taken: 16.667 seconds

View the HDFS output file:

[hadoop@hadoopcluster78 bin]$ ./hadoop fs -cat /home/hadoop/output/000000_0
1fish1SZ2015-07-08
2fish2SH2015-07-08
3fish3HZ2015-07-08
4fish4QD2015-07-08
5fish5SR2015-07-08
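
The fields appear run together because, with no ROW FORMAT clause, Hive writes its default field delimiter \001 (Ctrl-A), which is invisible in the terminal. On Hive 0.11 and later, a delimiter can be specified for directory exports as well; a minimal sketch:

hive> INSERT OVERWRITE DIRECTORY '/home/hadoop/output' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from testA;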

Other methods
Data can also be exported using hive's -e and -f options.

Usage with the -e option: the option is followed by the SQL statement, and the path after >> is the output file.

[hadoop@hadoopcluster78 bin]$ ./hive -e "select * from testA" >> /home/hadoop/output/testA.txt
15/12/25 17:15:07 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.128 seconds, Fetched: 5 row(s)
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testA.txt
1   fish1   SZ  2015-07-08
2   fish2   SH  2015-07-08
3   fish3   HZ  2015-07-08
4   fish4   QD  2015-07-08
5   fish5   SR  2015-07-08

Usage with the -f option: the option is followed by a file containing the SQL statement, and the path after >> is the output file.

The SQL statement file:

[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/sql.sql
select * from testA

Run it with the -f option:

[hadoop@hadoopcluster78 bin]$ ./hive -f /home/hadoop/output/sql.sql >> /home/hadoop/output/testB.txt
15/12/25 17:20:52 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.1 seconds, Fetched: 5 row(s)

View the result:

[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testB.txt
1   fish1   SZ  2015-07-08
2   fish2   SH  2015-07-08
3   fish3   HZ  2015-07-08
4   fish4   QD  2015-07-08
5   fish5   SR  2015-07-08
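
To keep the CLI's informational messages (such as the deprecation warning and "Time taken" lines above) off the console, the -S (silent) flag can be added; a minimal sketch:

[hadoop@hadoopcluster78 bin]$ ./hive -S -e "select * from testA" >> /home/hadoop/output/testA.txt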
