https://blog.csdn.net/wiborgite/article/details/78813342

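Background:

This walkthrough uses the CDH QuickStart VM, a single VM that bundles HDFS, YARN, HBase, Hive, Impala, and other components.

A text data file is loaded from HDFS into Hive; after the metadata is synchronized, the data is queried in Impala.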

----------------------------------------Linux shell operations----------------------------------------

1. Upload the data files from the local PC to the /home/data directory in the VM

    [root@quickstart data]# pwd
    /home/data
    [root@quickstart data]# ls
    p10pco2a.dat stock_data2.csv
    [root@quickstart data]# head p10pco2a.dat
    WOCE_P10,1993,279.479,-16.442,172.219,24.9544,34.8887,1.0035,363.551,2
    WOCE_P10,1993,279.480,-16.440,172.214,24.9554,34.8873,1.0035,363.736,2
    WOCE_P10,1993,279.480,-16.439,172.213,24.9564,34.8868,1.0033,363.585,2
    WOCE_P10,1993,279.481,-16.438,172.209,24.9583,34.8859,1.0035,363.459,2
    WOCE_P10,1993,279.481,-16.437,172.207,24.9594,34.8859,1.0033,363.543,2
    WOCE_P10,1993,279.481,-16.436,172.205,24.9604,34.8858,1.0035,363.432,2
    WOCE_P10,1993,279.489,-16.417,172.164,24.9743,34.8867,1.0036,362.967,2
    WOCE_P10,1993,279.490,-16.414,172.158,24.9742,34.8859,1.0035,362.960,2
    WOCE_P10,1993,279.491,-16.412,172.153,24.9747,34.8864,1.0033,362.998,2
    WOCE_P10,1993,279.492,-16.411,172.148,24.9734,34.8868,1.0031,363.022,2

2. Upload /home/data/p10pco2a.dat to HDFS

    [root@quickstart data]# hdfs dfs -put p10pco2a.dat /tmp/
    [root@quickstart data]# hdfs dfs -ls /tmp
    -rw-r--r-- 1 root supergroup 281014 2017-12-14 18:47 /tmp/p10pco2a.dat

----------------------------------------Hive operations----------------------------------------

1. Start the Hive CLI

# hive

2. Create a database in Hive

CREATE DATABASE weather;

3. Create a table in Hive

    create table weather.weather_everydate_detail
    (
        section string,
        year bigint,
        date double,
        latim double,
        longit double,
        sur_tmp double,
        sur_sal double,
        atm_per double,
        xco2a double,
        qf bigint
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
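
To make the mapping between the comma-delimited file and the table columns concrete, here is a minimal Python sketch that splits one of the sample rows the way `FIELDS TERMINATED BY ','` does and converts each field to the declared column type. The `COLUMNS` list and the `parse_row` helper are illustrative names, not part of Hive. Hive's text format generally yields NULL for a value that cannot be converted, which the sketch imitates with `None`; treat that as an approximation of Hive's behavior rather than a specification.

```python
# Column names and types copied from the CREATE TABLE statement above.
# string -> str, bigint -> int, double -> float
COLUMNS = [
    ("section", str), ("year", int), ("date", float), ("latim", float),
    ("longit", float), ("sur_tmp", float), ("sur_sal", float),
    ("atm_per", float), ("xco2a", float), ("qf", int),
]


def parse_row(line, delimiter=","):
    """Split one text line and convert each field to its column type."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for (name, cast), raw in zip(COLUMNS, fields):
        try:
            row[name] = cast(raw)
        except ValueError:
            row[name] = None  # stand-in for NULL on an unparseable value
    return row


# First data line shown by `head p10pco2a.dat`.
sample = "WOCE_P10,1993,279.479,-16.442,172.219,24.9544,34.8887,1.0035,363.551,2"
row = parse_row(sample)
print(row["section"], row["year"], row["sur_sal"], row["qf"])
```

Running a check like this over the file before loading is one way to spot rows with bad values early.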

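4. Load the data from HDFS into the Hive table created above (note that LOAD DATA INPATH moves the file out of /tmp into the table's directory rather than copying it)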

    LOAD DATA INPATH '/tmp/p10pco2a.dat' INTO TABLE weather.weather_everydate_detail;

    hive> LOAD DATA INPATH '/tmp/p10pco2a.dat' INTO TABLE weather.weather_everydate_detail;
    Loading data to table weather.weather_everydate_detail
    Table weather.weather_everydate_detail stats: [numFiles=1, totalSize=281014]
    OK
    Time taken: 1.983 seconds
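
One quick sanity check at this point is that the byte size listed by `hdfs dfs -ls` equals the `totalSize` that Hive reports after the load. The sketch below does that comparison on the two lines copied from the transcripts above; the helper names `size_from_ls` and `total_size_from_stats` are made up for illustration.

```python
import re

# Lines copied verbatim from the shell and Hive transcripts above.
ls_line = "-rw-r--r-- 1 root supergroup 281014 2017-12-14 18:47 /tmp/p10pco2a.dat"
stats_line = "Table weather.weather_everydate_detail stats: [numFiles=1, totalSize=281014]"


def size_from_ls(line):
    """Return the byte size: the fifth whitespace-separated field of an ls line."""
    return int(line.split()[4])


def total_size_from_stats(line):
    """Return totalSize from a Hive 'stats: [...]' line, or None if it is absent."""
    match = re.search(r"totalSize=(\d+)", line)
    return int(match.group(1)) if match else None


hdfs_size = size_from_ls(ls_line)
hive_size = total_size_from_stats(stats_line)
print(hdfs_size, hive_size, hdfs_size == hive_size)
```

Matching sizes only show that the whole file arrived; the row count in the next step is the stronger check.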

5. Check the Hive table to confirm that the data was loaded

    use weather;
    select * from weather.weather_everydate_detail limit 10;
    select count(*) from weather.weather_everydate_detail;

    hive> select * from weather.weather_everydate_detail limit 10;
    OK
    WOCE_P10 1993 279.479 -16.442 172.219 24.9544 34.8887 1.0035 363.551 2
    WOCE_P10 1993 279.48 -16.44 172.214 24.9554 34.8873 1.0035 363.736 2
    WOCE_P10 1993 279.48 -16.439 172.213 24.9564 34.8868 1.0033 363.585 2
    WOCE_P10 1993 279.481 -16.438 172.209 24.9583 34.8859 1.0035 363.459 2
    WOCE_P10 1993 279.481 -16.437 172.207 24.9594 34.8859 1.0033 363.543 2
    WOCE_P10 1993 279.481 -16.436 172.205 24.9604 34.8858 1.0035 363.432 2
    WOCE_P10 1993 279.489 -16.417 172.164 24.9743 34.8867 1.0036 362.967 2
    WOCE_P10 1993 279.49 -16.414 172.158 24.9742 34.8859 1.0035 362.96 2
    WOCE_P10 1993 279.491 -16.412 172.153 24.9747 34.8864 1.0033 362.998 2
    WOCE_P10 1993 279.492 -16.411 172.148 24.9734 34.8868 1.0031 363.022 2
    Time taken: 0.815 seconds, Fetched: 10 row(s)
    hive> select count(*) from weather.weather_everydate_detail;
    Query ID = root_20171214185454_c783708d-ad4b-46cc-9341-885c16a286fe
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1512525269046_0001, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1512525269046_0001/
    Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1512525269046_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2017-12-14 18:55:27,386 Stage-1 map = 0%, reduce = 0%
    2017-12-14 18:56:11,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 39.36 sec
    2017-12-14 18:56:18,711 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 41.88 sec
    MapReduce Total cumulative CPU time: 41 seconds 880 msec
    Ended Job = job_1512525269046_0001
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 41.88 sec HDFS Read: 288541 HDFS Write: 5 SUCCESS
    Total MapReduce CPU Time Spent: 41 seconds 880 msec
    OK
    4018
    Time taken: 101.82 seconds, Fetched: 1 row(s)

6. Run an ordinary query:

    hive> select * from weather_everydate_detail where sur_sal=34.8105;
    OK
    WOCE_P10 1993 312.148 34.602 141.951 24.0804 34.8105 1.0081 361.29 2
    WOCE_P10 1993 312.155 34.602 141.954 24.0638 34.8105 1.0079 360.386 2
    Time taken: 0.138 seconds, Fetched: 2 row(s)

    hive> select * from weather_everydate_detail where sur_sal=34.8105;
    OK
    WOCE_P10 1993 312.148 34.602 141.951 24.0804 34.8105 1.0081 361.29 2
    WOCE_P10 1993 312.155 34.602 141.954 24.0638 34.8105 1.0079 360.386 2
    Time taken: 1.449 seconds, Fetched: 2 row(s)

----------------------------------------Impala operations----------------------------------------

1. Start the Impala CLI

# impala-shell 

2. Synchronize the metadata in Impala

    [quickstart.cloudera:21000] > INVALIDATE METADATA;
    Query: invalidate METADATA
    Query submitted at: 2017-12-14 19:01:12 (Coordinator: http://quickstart.cloudera:25000)
    Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=43460ace5d3a9971:9a50f46600000000
    Fetched 0 row(s) in 3.25s

3. View the structure of the Hive table from Impala

    [quickstart.cloudera:21000] > use weather;
    Query: use weather
    [quickstart.cloudera:21000] > desc weather.weather_everydate_detail;
    Query: describe weather.weather_everydate_detail
    +---------+--------+---------+
    | name    | type   | comment |
    +---------+--------+---------+
    | section | string |         |
    | year    | bigint |         |
    | date    | double |         |
    | latim   | double |         |
    | longit  | double |         |
    | sur_tmp | double |         |
    | sur_sal | double |         |
    | atm_per | double |         |
    | xco2a   | double |         |
    | qf      | bigint |         |
    +---------+--------+---------+
    Fetched 10 row(s) in 3.70s

4. Query the row count

    [quickstart.cloudera:21000] > select count(*) from weather.weather_everydate_detail;
    Query: select count(*) from weather.weather_everydate_detail
    Query submitted at: 2017-12-14 19:03:11 (Coordinator: http://quickstart.cloudera:25000)
    Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=5542894eeb80e509:1f9ce37f00000000
    +----------+
    | count(*) |
    +----------+
    | 4018     |
    +----------+
    Fetched 1 row(s) in 2.51s

Note: comparing the count query in Impala and in Hive (2.51 s versus 101.82 s), Impala's advantage is obvious.
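
As a rough illustration of that gap, the ratio can be worked out from the two elapsed times in the transcripts: 101.82 s reported by the Hive CLI and 2.51 s reported by impala-shell. Each figure comes from a single run on a single-node VM, so the result is only indicative; the variable names are illustrative.

```python
hive_seconds = 101.82   # "Time taken" for count(*) in the Hive CLI
impala_seconds = 2.51   # "Fetched 1 row(s) in 2.51s" in impala-shell

speedup = hive_seconds / impala_seconds
print(f"Impala was about {speedup:.1f}x faster on this count(*)")
```

On these numbers the ratio is roughly 40x, most of which is the time Hive spends launching and running the MapReduce job.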

5. Run an ordinary query

    [quickstart.cloudera:21000] > select * from weather_everydate_detail where sur_sal=34.8105;
    Query: select * from weather_everydate_detail where sur_sal=34.8105
    Query submitted at: 2017-12-14 19:20:27 (Coordinator: http://quickstart.cloudera:25000)
    Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=c14660ed0bda471f:d92fcf0e00000000
    +----------+------+---------+--------+---------+---------+---------+---------+---------+----+
    | section  | year | date    | latim  | longit  | sur_tmp | sur_sal | atm_per | xco2a   | qf |
    +----------+------+---------+--------+---------+---------+---------+---------+---------+----+
    | WOCE_P10 | 1993 | 312.148 | 34.602 | 141.951 | 24.0804 | 34.8105 | 1.0081  | 361.29  | 2  |
    | WOCE_P10 | 1993 | 312.155 | 34.602 | 141.954 | 24.0638 | 34.8105 | 1.0079  | 360.386 | 2  |
    +----------+------+---------+--------+---------+---------+---------+---------+---------+----+
    Fetched 2 row(s) in 0.25s

    [quickstart.cloudera:21000] > select * from weather_everydate_detail where sur_tmp=24.0804;
    Query: select * from weather_everydate_detail where sur_tmp=24.0804
    Query submitted at: 2017-12-14 23:15:32 (Coordinator: http://quickstart.cloudera:25000)
    Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=774e2b3b81f4eed7:8952b5b400000000
    +----------+------+---------+--------+---------+---------+---------+---------+--------+----+
    | section  | year | date    | latim  | longit  | sur_tmp | sur_sal | atm_per | xco2a  | qf |
    +----------+------+---------+--------+---------+---------+---------+---------+--------+----+
    | WOCE_P10 | 1993 | 312.148 | 34.602 | 141.951 | 24.0804 | 34.8105 | 1.0081  | 361.29 | 2  |
    +----------+------+---------+--------+---------+---------+---------+---------+--------+----+
    Fetched 1 row(s) in 3.86s

6. Conclusion

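For SQL that Hive has to compile into MapReduce jobs, running it in Impala gives a clear speed advantage. However, Hive does not compile every query into MapReduce. For queries of that kind, such as the simple filter queries above (0.138 s and 1.449 s in Hive versus 0.25 s and 3.86 s in Impala), Impala has no real advantage over Hive.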
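
The filter-query timings recorded in the transcripts illustrate the second half of that conclusion. The sketch below simply tabulates the elapsed times; the `timings` dictionary is an illustrative layout. The comparison is loose, since the second Impala run filtered on sur_tmp rather than sur_sal and each figure is a single measurement.

```python
# Elapsed seconds copied from the Hive and Impala transcripts.
timings = {
    "hive": [0.138, 1.449],   # where sur_sal=34.8105, run twice
    "impala": [0.25, 3.86],   # where sur_sal=34.8105, then where sur_tmp=24.0804
}

for engine, runs in timings.items():
    print(f"{engine:<7} best={min(runs):.3f}s worst={max(runs):.3f}s")

# The Hive transcripts for these queries show no MapReduce job being
# launched, so there is no job start-up cost for Impala to save.
hive_best = min(timings["hive"])
impala_best = min(timings["impala"])
print("Impala faster on best run:", impala_best < hive_best)
```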

Reposted from: https://www.cnblogs.com/wincai/p/10431165.html
