hive系列-hive教程

hive教程

备注：hive本身是不存储数据的，它可以把Mysql、Hdfs、Hbase等当然数据源，然后做数据处理，表面上，Hive用的是SQL，但是Hive的底层是MapReduce，它只是一个把SQL转变成MapReduce的工具，当然，有些没有查询条件的查询它只是做了数据的映射（并不执行MapReduce），这也是它做的优化。

1、建表

默认是内部表

create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

建分区表

create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';

建外部表

create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';

2、普通表和分区表区别：有大量数据增加的需要建分区表

create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';

3、内部表和外部表的区别

a、内部表一般是原来没有数据，需要自己加载数据，而且删除表时，数据会被一起删除

b、外部表一般是已经有了业务数据的，而且存在别的系统已经在使用这些数据，删除表时，数据不会被删除

4、加载数据

a、普通表数据加载

load data local inpath '/home/hadoop/ip.txt' into table tab_ext;

b、分区表数据加载

load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');

load data local inpath '/root/data.am' into table beauty partition (nation="USA");

5、查询数据

a、分区表查询所有数据

select * from tab_ip_part;

b、分区表查询分区为part2的数据

select * from tab_ip_part where part_flag='part2';

6、更新表

alter table tab_ip change id id_alter string;

alter table tab_cts add partition (partCol = 'dt') location '/external/hive/dt';

show partitions tab_ip_part;

7、数组类型（array）

create table tab_array(a array<int>,b array<string>)

row format delimited

fields terminated by '\t'

collection items terminated by ',';

示例数据

tobenbrone,laihama,woshishui 13866987898,13287654321

abc,iloveyou,itcast 13866987898,13287654321

select a[0] from tab_array;

select * from tab_array where array_contains(b,'word');

insert into table tab_array select array(0),array(name,ip) from tab_ext t;

8、map类型

create table tab_map(name string,info map<string,string>)

row format delimited

fields terminated by '\t'

collection items terminated by ';'

map keys terminated by ':';

示例数据：

fengjie age:18;size:36A;addr:usa

furong age:28;size:39C;addr:beijing;weight:180KG

9、insert from select 用于向临时表中追加中间结果数据

create table tab_ip_like like tab_ip;

insert overwrite table tab_ip_like select * from tab_ip;

10、写入hdfs

insert overwrite local directory '/home/hadoop/hivetemp/test.txt' select * from tab_ip_part where part_flag='part1';

insert overwrite directory '/hiveout.txt' select * from tab_ip_part where part_flag='part1';

11、CLUSTER（桶，均衡的把数据插入文件中），相对高级一点，你可以放在有精力的时候才去学习，用于做抽样分析等

create table tab_ip_cluster(id int,name string,ip string,country string) clustered by(id) into 3 buckets;

load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_cluster;

set hive.enforce.bucketing=true;

insert into table tab_ip_cluster select * from tab_ip;

select * from tab_ip_cluster tablesample(bucket 2 out of 3 on id);

12、自定义函数，UDF

a、java类要继承org.apache.hadoop.hive.ql.exec.UDF类实现evaluate

b、添加jar包（把jar包打包好上传到hive的机子上，在hive命令行里面执行）

hive> add jar /root/NUDF.jar;

c、创建临时函数

hive> create temporary function getNation as 'cn.itcast.hive.udf.NationUDF';

d、调用

hive> select id, name, getNation(nation) from beauty;

注意：

1、hdfs的mapreduce如果是用hdfs用户来启动任务的，那么启动hive也要切换到hdfs，不然会出现权限的错误

hive系列-hive教程相关推荐

《Hive系列》Hive详细入门教程
目录 1 Hive基本概念 1.1 什么是Hive Hive简介 Hive:由FaceBook开源用于解决海量结构化日志的数据统计工具 Hive:基于Hadoop的一个数据仓库工具,可以将结构化的数据 ...
Hive系列（一）metastore的认证和授权
2019独角兽企业重金招聘Python工程师标准>>> 1 Hive系列目录 Metastore认证和验权 HiveServer2认证和验权 sentry的权限设计 HiveServ ...
Hive系列 (一)：Hive搭建
文章目录 Hive系列文章一.环境信息二.hive下载安装三.mysql下载安装及配置四.Hive配置五.启动服务六.beeline配置七.beeline一键启动脚本八.注意事项 Hi ...
菜鸟先飞之初识Hive、安装教程及常见问题
一.初识Hive 1.什么是Hive Hive是基于Haddop的数据仓库工具,可以将结构化的数据文件映射为一张数据库表并提供SQL查询功能,可以将SQL语句转换 MapReduce任务运行. Hiv ...
新的比较详细的hive安装教程，包含：hadoop的安装配置，hive安装详细教程，大数据下hive详细配置和简单使用和hive元数据迁移相关安装配置操作
简介:这是大数据里安装hive比较详细的教程,包括安装hive时hadoop的相关配置,hive安装,hive 之后的mysql安装,hive的元素配置到mysql相关配置,hive常见属性的相关配置 ...
Hive系列 (六)：Hive数据类型转换
文章目录 Hive系列文章数据类型转换 Cast显示转换数据类型转换表日期类型转换说明转换示例 Hive系列文章 Hadoop完全分布式搭建(腾讯云服务器+阿里云服务器) Hive系列 (一) ...
Hive保姆级教程(万字长文，建议收藏)
Hive教程 1.1 HiveHiveHive的介绍 1.2 HiveHiveHive的设计原理 1.3 HiveHiveHive的架构图 1.4 HiveHiveHive的手动安装部署与配置 1.5 ...
Hive系列（四）函数
Hive函数一.常用内置函数内置函数查看关系运算数学运算逻辑运算数值计算日期函数条件函数字符串函数集合统计函数二.lateral view 与 explode以及reflect和 ...
1.Hive系列之简介
1. Hive简介 1.1 Hive是什么 Hive是一个基于Hadoop的数据仓库工具,它提供了类似于SQL的查询语言HiveQL,以及用于将查询转换为MapReduce任务的引擎.Hive的主要目 ...
《zw版·Halcon-delphi系列原创教程》 Halcon分类函数011,ocr，字符识别
<zw版·Halcon-delphi系列原创教程> Halcon分类函数011,ocr,字符识别为方便阅读,在不影响说明的前提下,笔者对函数进行了简化: :: 用符号"**&q ...

hive系列-hive教程

hive系列-hive教程相关推荐

最新文章

热门文章