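Table of Contents

Hive Programming Guide 01
- Command-line interface operations
  - 1. "One-shot" commands in Hive
  - 2. Executing Hive queries from a file
  - 3. The hiverc file
  - 4. Other Hive CLI features
    - (1) Autocomplete
    - (2) Command history
  - 5. Executing shell commands
  - 6. Using Hadoop dfs commands inside Hive
  - 7. Comments in Hive scripts
  - 8. Displaying column headers
- Data types and file formats
  - 1. Primitive data types
  - 2. Collection data types
  - 3. Text file data encoding
  - 4. Schema on read
- HiveQL: Data Definition
  - 1. Introduction to HQL
  - 2. Databases in Hive
  - 3. Altering a database
  - 4. Creating tables
  - 5. Managed tables
  - 6. External tables
  - 7. Partitioned managed tables
  - 8. External partitioned tables
  - 9. Dropping tables
  - 10. Altering tables
    - (1) Renaming a table
    - (2) Adding, modifying, and dropping table partitions
    - (3) Changing column information
    - (4) Adding columns
    - (5) Deleting or replacing columns
    - (6) Altering table properties
    - (7) Altering storage properties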

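Hive Programming Guide 01

Command-Line Interface Operations

1. "One-shot" commands in Hive

(1) Sometimes a user wants to run one or more queries (separated by semicolons) and have the Hive CLI exit as soon as they finish. The -e option does exactly this.

[root@hadoop01 ~]# hive -e "SELECT * FROM emp LIMIT 3";
Logging initialized using configuration in file:/opt/modules/apache/hive-1.2.1/conf/hive-log4j.properties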
OK
emp.empno   emp.ename   emp.job emp.mgr emp.hiredate    emp.sal emp.comm    emp.deptno
1234    JACK    CLERK   7856    1998-2-23   1300.0  NULL    50
7369    SMITH   CLERK   7902    1980-12-17  800.0   NULL    20
7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
Time taken: 2.014 seconds, Fetched: 3 row(s)
[root@hadoop01 ~]#

(2) The -S option turns on silent mode, which removes lines such as "OK" and "Time taken" from the output, along with other inessential messages.

[root@hadoop01 ~]# hive -S -e "SELECT * FROM emp LIMIT 5";
emp.empno   emp.ename   emp.job emp.mgr emp.hiredate    emp.sal emp.comm    emp.deptno
1234    JACK    CLERK   7856    1998-2-23   1300.0  NULL    50
7369    SMITH   CLERK   7902    1980-12-17  800.0   NULL    20
7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2    2975.0  NULL    20
[root@hadoop01 ~]# hive -S -e "SELECT * FROM emp LIMIT 5" >> /root/temp.txt
[root@hadoop01 ~]# cat temp.txt
emp.empno   emp.ename   emp.job emp.mgr emp.hiredate    emp.sal emp.comm    emp.deptno
1234    JACK    CLERK   7856    1998-2-23   1300.0  NULL    50
7369    SMITH   CLERK   7902    1980-12-17  800.0   NULL    20
7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2    2975.0  NULL    20

(3) A handy trick for looking up a property name or its value with set

[root@hadoop01 ~]# hive -S -e "set" | grep warehouse
hive.metastore.warehouse.dir=/user/hive/warehouse
hive.warehouse.subdir.inherit.perms=true
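The same lookup can be done in a script once the output of hive -S -e "set" has been captured as text. The following is a minimal Python sketch, assuming the output consists of key=value lines like the two shown above. The function name find_properties and the sample text are hypothetical; unlike grep, this version matches on the property name only, not on the value.

```python
def find_properties(set_output: str, needle: str) -> dict:
    """Return the key=value pairs whose property name contains `needle`."""
    matches = {}
    for line in set_output.splitlines():
        key, sep, value = line.partition("=")
        # Skip lines that are not key=value pairs.
        if sep and needle in key:
            matches[key.strip()] = value.strip()
    return matches


# Hypothetical captured output of: hive -S -e "set"
sample = """hive.cli.print.header=false
hive.metastore.warehouse.dir=/user/hive/warehouse
hive.warehouse.subdir.inherit.perms=true"""

print(find_properties(sample, "warehouse"))
```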

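2. Executing Hive queries from a file

(1) Hive can execute one or more queries stored in a file with the -f filename option. By convention, such Hive query files are saved with a .q or .hql extension.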

[root@hadoop01 ~]# cat test.hql
INSERT INTO TABLE emp VALUES(7564,'NIKE','SALESMAN',7432,'1999-12-20',1500.00,600.00,30);
INSERT INTO TABLE emp VALUES(7897,'JACKSON','SALESMAN',7750,'1995-4-24',13500.00,500.00,30);
SELECT * FROM emp;
[root@hadoop01 ~]# hive -S -f /root/test.hql
_col0   _col1   _col2   _col3   _col4   _col5   _col6   _col7
_col0   _col1   _col2   _col3   _col4   _col5   _col6   _col7
emp.empno   emp.ename   emp.job emp.mgr emp.hiredate    emp.sal emp.comm    emp.deptno
1234    JACK    CLERK   7856    1998-2-23   1300.0  NULL    50
7369    JACK    SALESMAN    7698    1981-2-20   1600.0  300.0   30
7724    WILLAM  SALESMAN    7750    1991-3-11   1750.0  300.0   30
7564    NIKE    SALESMAN    7432    1999-12-20  1500.0  600.0   30
7897    JACKSON SALESMAN    7750    1995-4-24   13500.0 500.0   30
7369    SMITH   CLERK   7902    1980-12-17  800.0   NULL    20
7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2    2975.0  NULL    20
7654    MARTIN  SALESMAN    7698    1981-9-28   1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1    2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9    2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19   3000.0  NULL    20
7839    KING    PRESIDENT   NULL    1981-11-17  5000.0  NULL    10
7844    TURNER  SALESMAN    7698    1981-9-8    1500.0  0.0 30
7876    ADAMS   CLERK   7788    1987-5-23   1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3   950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3   3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23   1300.0  NULL    10

(2) Inside the Hive shell, the SOURCE command executes a script file.

[root@hadoop01 ~]# cat test.hql
SELECT * FROM emp LIMIT 5;
[root@hadoop01 ~]# hive
Logging initialized using configuration in file:/opt/modules/apache/hive-1.2.1/conf/hive-log4j.properties
hive (default)> SOURCE /root/test.hql;
OK
emp.empno   emp.ename   emp.job emp.mgr emp.hiredate    emp.sal emp.comm    emp.deptno
1234    JACK    CLERK   7856    1998-2-23   1300.0  NULL    50
7369    JACK    SALESMAN    7698    1981-2-20   1600.0  300.0   30
7724    WILLAM  SALESMAN    7750    1991-3-11   1750.0  300.0   30
7564    NIKE    SALESMAN    7432    1999-12-20  1500.0  600.0   30
7897    JACKSON SALESMAN    7750    1995-4-24   13500.0 500.0   30
Time taken: 1.082 seconds, Fetched: 5 row(s)

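3. The hiverc file

The -i filename option lets the user specify a file that the CLI runs at startup, before the prompt appears. Hive also automatically looks for a file named .hiverc in the HOME directory (the current user's home directory) and runs the commands it contains.

This file is convenient for commands that are run frequently, for example setting system properties or adding Java archives (JAR files) of custom Hive extensions to Hadoop's distributed cache.

$HOME/.hiverc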

ADD JAR /path/to/custom_hive_extensions.jar;
set hive.cli.print.current.db=true;
set hive.exec.mode.local.auto=true;

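4. Other Hive CLI features

(1) Autocomplete

If you press the Tab key while typing, the CLI autocompletes possible keywords or function names.

(2) Command history

Hive saves the most recent 10000 lines of commands to the file $HOME/.hivehistory.

Tips: most navigation keystrokes using Control+letter work the same as in the bash shell:

Control+A: move the cursor to the beginning of the line
Control+E: move the cursor to the end of the line

5. Executing shell commands

Users can run simple bash shell commands without leaving the Hive CLI: prefix the command with ! and end it with a semicolon (;).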

hive (default)> !ls;
temp.txt
test.hql
hive (default)> !pwd;
/root

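Interactive commands that require user input cannot be used in the Hive CLI, and shell pipes and file-name globs are not supported.

6. Using Hadoop dfs commands inside Hive

Users can run Hadoop dfs… commands from the Hive CLI: drop the hdfs keyword from the hadoop command and end the command with a semicolon.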

hive (default)> dfs -ls /user/hive/warehouse;
Found 11 items
drwxrwxr-x   - root supergroup          0 2019-06-14 02:49 /user/hive/warehouse/db_emp.db
drwxrwxr-x   - root supergroup          0 2019-06-16 01:09 /user/hive/warehouse/dept
drwxrwxr-x   - root supergroup          0 2019-06-14 02:39 /user/hive/warehouse/dept_exter
drwxrwxr-x   - root supergroup          0 2019-06-16 12:59 /user/hive/warehouse/emp
drwxrwxr-x   - root supergroup          0 2019-06-11 04:03 /user/hive/warehouse/emp_bu
drwxrwxr-x   - root supergroup          0 2019-06-14 03:38 /user/hive/warehouse/emp_bu01
drwxrwxr-x   - root supergroup          0 2019-06-14 03:39 /user/hive/warehouse/emp_buck
drwxrwxr-x   - root supergroup          0 2019-06-09 14:40 /user/hive/warehouse/ha01
drwxrwxr-x   - root supergroup          0 2019-06-09 16:12 /user/hive/warehouse/ha02
drwxrwxr-x   - root supergroup          0 2019-06-10 23:32 /user/hive/warehouse/student
drwxrwxr-x   - root supergroup          0 2019-06-10 23:42 /user/hive/warehouse/student01

[root@hadoop01 hive-1.2.1]# hdfs dfs -ls /user/hive/warehouse
Found 11 items
drwxrwxr-x   - root supergroup          0 2019-06-14 02:49 /user/hive/warehouse/db_emp.db
drwxrwxr-x   - root supergroup          0 2019-06-16 01:09 /user/hive/warehouse/dept
drwxrwxr-x   - root supergroup          0 2019-06-14 02:39 /user/hive/warehouse/dept_exter
drwxrwxr-x   - root supergroup          0 2019-06-16 12:59 /user/hive/warehouse/emp
drwxrwxr-x   - root supergroup          0 2019-06-11 04:03 /user/hive/warehouse/emp_bu
drwxrwxr-x   - root supergroup          0 2019-06-14 03:38 /user/hive/warehouse/emp_bu01
drwxrwxr-x   - root supergroup          0 2019-06-14 03:39 /user/hive/warehouse/emp_buck
drwxrwxr-x   - root supergroup          0 2019-06-09 14:40 /user/hive/warehouse/ha01
drwxrwxr-x   - root supergroup          0 2019-06-09 16:12 /user/hive/warehouse/ha02
drwxrwxr-x   - root supergroup          0 2019-06-10 23:32 /user/hive/warehouse/student
drwxrwxr-x   - root supergroup          0 2019-06-10 23:42 /user/hive/warehouse/student01

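Running hadoop commands this way is actually more efficient than the equivalent hdfs dfs… command in the bash shell, because the latter starts a new JVM instance every time, whereas Hive runs these commands in the same process.

7. Comments in Hive scripts

Lines that begin with -- are treated as comments.

--hql query
--This is the best Hive script evar!!
SELECT * FROM emp LIMIT 5;

8. Displaying column headers

Set the hiveconf property hive.cli.print.header to true to enable this feature.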

[root@hadoop01 ~]# cat .hiverc
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;

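Data Types and File Formats

1. Primitive data types

Data type	Size	Example
TINYINT	1-byte signed integer	20
SMALLINT	2-byte signed integer	20
INT	4-byte signed integer	20
BIGINT	8-byte signed integer	20
BOOLEAN	Boolean, true or false	TRUE
FLOAT	Single-precision floating point	3.14159
DOUBLE	Double-precision floating point	3.14159
STRING	Sequence of characters. The character set can be specified. Single or double quotes can be used.	'alex', "johnson"
TIMESTAMP	Integer, float, or string	13854848 (Unix epoch seconds), 1584125.415645 (Unix epoch seconds with nanoseconds), '2019-07-16 12:34:24.123456789' (JDBC-compliant java.sql.Timestamp format)
BINARY	Array of bytes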

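2. Collection data types

Data type	Description	Example
STRUCT	Analogous to a C struct or an "object". Fields are accessed with dot notation. For example, if a column has type STRUCT{first STRING, last STRING}, the first field can be referenced as column_name.first.	struct('alex','mike')
MAP	A collection of key-value pairs whose elements are accessed with array notation (for example ['key']).	map('first','john','last','white')
ARRAY	An ordered sequence of elements of the same type, indexed from 0.	Array('John','Joe')

CREATE TABLE employees(name STRING,salary DOUBLE,subordinates ARRAY<STRING>,deductions MAP<STRING,FLOAT>,address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>);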

hive (default)> show create table employees;
OK
createtab_stmt
CREATE TABLE `employees`(`name` string, `salary` double, `subordinates` array<string>, `deductions` map<string,float>, `address` struct<street:string,city:string,state:string,zip:int>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION'hdfs://hadoop01:8020/user/hive/warehouse/employees'
TBLPROPERTIES ('transient_lastDdlTime'='1560666188')
Time taken: 0.149 seconds, Fetched: 16 row(s)

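3. Text file data encoding

Delimiter	Description
\n	For text files, each line is one record, so the newline character separates records.
^A (Ctrl+V, Ctrl+A)	Separates fields (columns). In a CREATE TABLE statement it can be written with the octal code \001.
^B (Ctrl+V, Ctrl+B)	Separates the elements of an ARRAY or STRUCT, or the key-value pairs of a MAP. In a CREATE TABLE statement it can be written with the octal code \002.
^C (Ctrl+V, Ctrl+C)	Separates a key from its value within a MAP. In a CREATE TABLE statement it can be written with the octal code \003.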

John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BStateTaxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600
Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BStateTaxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601
Todd Jones^A70000.0^A^AFederal Taxes^C.15^BStateTaxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700
Bill King^A60000.0^A^AFederal Taxes^C.15^BStateTaxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100

CREATE TABLE employees(name STRING,salary DOUBLE,subordinates ARRAY<STRING>,deductions MAP<STRING,FLOAT>,address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

hive (default)> desc formatted employees;
OK
col_name    data_type   comment
# col_name              data_type               comment

name                    string
salary                  double
subordinates            array<string>
deductions              map<string,float>
address                 struct<street:string,city:string,state:string,zip:int>

# Detailed Table Information
Database:               default
Owner:                  root
CreateTime:             Sun Jun 16 14:58:07 CST 2019
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://hadoop01:8020/user/hive/warehouse/employees
Table Type:             MANAGED_TABLE
Table Parameters:
    transient_lastDdlTime   1560668287

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
    colelction.delim        \u0002
    field.delim             \u0001
    line.delim              \n
    mapkey.delim            \u0003
    serialization.format    \u0001
Time taken: 0.066 seconds, Fetched: 34 row(s)

hive (default)> load data local inpath '/root/emp01.txt' into table employees;
Loading data to table default.employees
Table default.employees stats: [numFiles=1, numRows=0, totalSize=425, rawDataSize=0]
OK
Time taken: 0.23 seconds
hive (default)> select * from employees;
OK
employees.name  employees.salary    employees.subordinates  employees.deductions    employees.address
John Doe    100000.0    ["Mary Smith","Todd Jones"] {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}   {"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}
Mary Smith  80000.0 ["Bill King"] {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}   {"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}
Todd Jones  70000.0 []  {"Federal Taxes":0.15,"StateTaxes":0.03,"Insurance":0.1}  {"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}
Bill King   60000.0 []  {"Federal Taxes":0.15,"StateTaxes":0.03,"Insurance":0.1}  {"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}
Time taken: 0.036 seconds, Fetched: 4 row(s)
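To make the role of the three delimiters concrete, here is a minimal Python sketch that splits one line of the sample file into the same nested structure the SELECT above prints. It is only an illustration of the encoding, not how Hive's SerDe is implemented. The function name parse_employee is hypothetical, and the sample line writes the control characters as the visible text ^A, ^B, ^C before substituting the real bytes.

```python
FIELD, ITEM, KEY = "\x01", "\x02", "\x03"  # ^A, ^B, ^C


def parse_employee(line: str) -> dict:
    """Split one employees record using the default text-file delimiters."""
    name, salary, subs, deds, addr = line.rstrip("\n").split(FIELD)

    # ARRAY<STRING>: elements separated by ^B; an empty field is an empty array.
    subordinates = subs.split(ITEM) if subs else []

    # MAP<STRING,FLOAT>: pairs separated by ^B, key and value separated by ^C.
    deductions = {}
    if deds:
        for pair in deds.split(ITEM):
            key, value = pair.split(KEY)
            deductions[key] = float(value)

    # STRUCT<street,city,state,zip>: members separated by ^B, in declared order.
    street, city, state, zip_code = addr.split(ITEM)

    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subordinates,
        "deductions": deductions,
        "address": {"street": street, "city": city,
                    "state": state, "zip": int(zip_code)},
    }


raw = ("John Doe^A100000.0^AMary Smith^BTodd Jones^A"
       "Federal Taxes^C.2^BStateTaxes^C.05^BInsurance^C.1^A"
       "1 Michigan Ave.^BChicago^BIL^B60600")
line = raw.replace("^A", FIELD).replace("^B", ITEM).replace("^C", KEY)

record = parse_employee(line)
print(record["subordinates"])
print(record["deductions"])
print(record["address"])
```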

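4. Schema on read

When a user writes data to a traditional database, whether by loading external data, by writing the output of a query, or by issuing an UPDATE statement, the database has full control over the storage. The database is the gatekeeper. Traditional databases are schema on write: the schema is checked as the data is written.

Hive has no such control over the underlying storage. Hive does not validate data when it is loaded; it validates at query time instead, which is schema on read.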
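A small Python sketch can illustrate what schema on read means in practice. It assumes the commonly described Hive behavior for text files: a value that does not fit the declared column type, or a field that is missing from the record, is returned as NULL at query time rather than being rejected at load time. The function read_row and the three-column schema are hypothetical and only model the idea.

```python
def read_row(line, schema, delimiter="\x01"):
    """Apply `schema`, a list of (column, caster) pairs, to one line at read time."""
    fields = line.split(delimiter)
    row = {}
    for index, (column, caster) in enumerate(schema):
        if index >= len(fields):
            row[column] = None       # fewer fields than columns: NULL
            continue
        try:
            row[column] = caster(fields[index])
        except ValueError:
            row[column] = None       # value does not fit the type: NULL
    return row                       # extra trailing fields are simply ignored


schema = [("name", str), ("salary", float), ("deptno", int)]

print(read_row("SMITH\x01800.0\x0120", schema))
print(read_row("ALLEN\x01not-a-number", schema))
```

Nothing in this model stops a malformed file from being loaded; the mismatch only becomes visible as NULL values when the rows are read.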

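HiveQL: Data Definition

1. Introduction to HQL

HiveQL is the Hive query language. Like every SQL dialect in widespread use, it does not fully conform to any particular revision of the ANSI SQL standard. In early Hive releases, HQL did not support row-level inserts, updates, or deletes, and Hive did not support transactions. Since Hive 0.14, INSERT ... VALUES is available (as used in the test.hql example earlier), and UPDATE and DELETE work on tables configured as transactional.

2. Databases in Hive

A database in Hive is conceptually just a catalog or namespace of tables. It helps avoid table name collisions, and databases are commonly used to organize production tables into logical groups.

(1) If the user does not explicitly specify a database, the default database is used;

-- Create the database test01.
hive (default)> create database test01;
OK
Time taken: 0.09 seconds
-- Create the database test01; the IF NOT EXISTS clause suppresses the error raised when the database already exists.
hive (default)> CREATE DATABASE IF NOT EXISTS test01;
OK
Time taken: 0.036 seconds
-- List all databases.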
hive (default)> show databases;
OK
database_name
db_emp
default
test01
Time taken: 0.011 seconds, Fetched: 3 row(s)

(2) In all database-related commands, the SCHEMA keyword can be used in place of DATABASE;

hive (default)> CREATE SCHEMA test02;
OK
Time taken: 0.018 seconds
hive (default)> SHOW DATABASES;
OK
database_name
db_emp
default
test01
test02
Time taken: 0.01 seconds, Fetched: 4 row(s)
hive (default)> SHOW SCHEMAS;
OK
database_name
db_emp
default
test01
test02
Time taken: 0.009 seconds, Fetched: 4 row(s)

(3) A regular expression can be used to filter for the database names you need;

-- Pattern matching: LIKE & RLIKE
hive (default)> SHOW DATABASES LIKE 't.*';
OK
database_name
test01
test02
Time taken: 0.013 seconds, Fetched: 2 row(s)

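Hive creates a directory for each database. The tables in a database are stored as subdirectories of that database directory. The directory of the default database is the one specified by the property hive.metastore.warehouse.dir, that is, /user/hive/warehouse. The directories of other databases are normally located under this directory, and their names end in .db.

(4) The LOCATION keyword lets the user specify the directory in which a database is stored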

hive (default)> dfs -mkdir -p  /hive/db;
hive (default)> CREATE DATABASE test03 LOCATION '/hive/test03.db';
OK
Time taken: 0.021 seconds

(5) The COMMENT keyword adds a description to a database; the description can be viewed with the DESCRIBE DATABASE db_name command.

hive (default)> CREATE DATABASE test04 COMMENT 'Holds All Test Tables';
OK
Time taken: 0.082 seconds
hive (default)> DESCRIBE DATABASE test04;
OK
db_name comment location    owner_name  owner_type  parameters
test04  Holds All Test Tables   hdfs://hadoop01:8020/user/hive/warehouse/test04.db  root    USER
Time taken: 0.017 seconds, Fetched: 1 row(s)

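(6) Dropping a database

hive (default)> DROP DATABASE IF EXISTS test04;
OK
Time taken: 0.026 seconds
-- The IF EXISTS clause is optional; it suppresses the error raised when the database does not exist.

By default, Hive does not allow a database that contains tables to be dropped. Either drop the tables first and then the database, or append the CASCADE keyword to the command, which makes Hive drop the tables in the database first;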

Time taken: 0.012 seconds
hive (db_emp)> use db_emp;
OK
Time taken: 0.008 seconds
hive (db_emp)> show tables;
OK
tab_name
emp
Time taken: 0.012 seconds, Fetched: 1 row(s)
hive (db_emp)> drop database if exists db_emp;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_emp is not empty. One or more tables exist.)
hive (db_emp)> drop database if exists db_emp cascade;
OK
Time taken: 0.108 seconds

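When a database is dropped, its directory is deleted as well.

3. Altering a database

The ALTER DATABASE command can set key-value pairs in a database's DBPROPERTIES to describe the database. The database's other metadata cannot be changed.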

hive (db_emp)> ALTER DATABASE test01 set DBPROPERTIES('edited-by'='Nike Hu','Date'='2019-06-16');
OK
Time taken: 0.067 seconds
hive (db_emp)> DESCRIBE SCHEMA EXTENDED test01;
OK
db_name comment location    owner_name  owner_type  parameters
test01      hdfs://hadoop01:8020/user/hive/warehouse/test01.db  root    USER    {edited-by=Nike Hu, Date=2019-06-16}
Time taken: 0.022 seconds, Fetched: 1 row(s)

4.创建表

CREATE TABLE test01.employees(name STRING COMMENT 'employee name',salary DOUBLE COMMENT 'employee salary',subordinates ARRAY<STRING> COMMENT 'Names of subordinates',deductions MAP<STRING,FLOAT> COMMENT 'Keys are deductions names,Vals of percentages',address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT> COMMENT 'Home address')
COMMENT 'Description of the table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/hive/db/employees'
TBLPROPERTIES ('creator'='me','create_at'='2019-16-16');
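-- 1. If the current database is not the target database, the table name can be prefixed with a database name (test01.employees) to specify it.
-- 2. A comment can be added to each column after its type.
-- 3. A comment can be added to the table itself: COMMENT 'Description of the table'
-- 4. TBLPROPERTIES mainly serves to attach extra documentation to the table as key-value pairs; view it with SHOW TBLPROPERTIES t_name.
-- 5. The LOCATION keyword specifies a storage path for the table's data.

Creating a table by copying the schema of an existing table:

CREATE TABLE IF NOT EXISTS test01.employees2 LIKE test01.employees;
-- A LOCATION clause may be specified; all other properties, including the schema, come from the source table and cannot be redefined.

Listing the tables of a given database with the IN keyword: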

hive (test01)> SHOW TABLES IN test02;
OK
tab_name
employees
Time taken: 0.021 seconds, Fetched: 1 row(s)

Filtering for the table names we need with a regular expression:

hive (test01)> SHOW TABLES IN default 'emp.*';
OK
tab_name
emp_bu
emp_bu01
emp_buck
employees
Time taken: 0.018 seconds, Fetched: 4 row(s)

Viewing detailed table information with the EXTENDED keyword:

hive (test01)> DESCRIBE EXTENDED employees;
OK
col_name    data_type   comment
name                    string                  employee name
salary                  double                  employee salary
subordinates            array<string>         Names of subordinates
deductions              map<string,float>     Keys are deductions names,Vals of percentages
address                 struct<street:string,city:string,state:string,zip:int>    Home address        Detailed Table Information  Table(tableName:employees, dbName:test01, owner:root, createTime:1560674946, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:employee name), FieldSchema(name:salary, type:double, comment:employee salary), FieldSchema(name:subordinates, type:array<string>, comment:Names of subordinates), FieldSchema(name:deductions, type:map<string,float>, comment:Keys are deductions names,Vals of percentages), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:Home address)], location:hdfs://hadoop01:8020/hive/db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{line.delim=
, field.delim=, colelction.delim=, mapkey.delim=, serialization.format=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{creator=me, totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE=true, numFiles=0, transient_lastDdlTime=1560675388, comment=Description of the table, create_at=2019-16-16}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.064 seconds, Fetched: 8 row(s)
-- DESCRIBE can be abbreviated to DESC
hive (test01)> DESC EXTENDED employees;
OK
col_name    data_type   comment
name                    string                  employee name
salary                  double                  employee salary
subordinates            array<string>         Names of subordinates
deductions              map<string,float>     Keys are deductions names,Vals of percentages
address                 struct<street:string,city:string,state:string,zip:int>    Home address        Detailed Table Information  Table(tableName:employees, dbName:test01, owner:root, createTime:1560674946, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:employee name), FieldSchema(name:salary, type:double, comment:employee salary), FieldSchema(name:subordinates, type:array<string>, comment:Names of subordinates), FieldSchema(name:deductions, type:map<string,float>, comment:Keys are deductions names,Vals of percentages), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:Home address)], location:hdfs://hadoop01:8020/hive/db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{line.delim=
, field.delim=, colelction.delim=, mapkey.delim=, serialization.format=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{creator=me, totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE=true, numFiles=0, transient_lastDdlTime=1560675388, comment=Description of the table, create_at=2019-16-16}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.074 seconds, Fetched: 8 row(s)
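-- Using the FORMATTED keyword instead of EXTENDED gives more readable, though more verbose, output

To see the information for a single column, append the column name to the table name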

hive (test01)> DESC employees.name;
OK
col_name    data_type   comment
name                    string                  from deserializer
Time taken: 0.041 seconds, Fetched: 1 row(s)

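5. Managed tables

Managed tables (internal tables): Hive controls the lifecycle of the data in these tables. When a managed table is dropped, Hive deletes the table's metadata and also deletes the data in the table.

Managed tables are inconvenient for sharing data with other tools.

6. External tables

An external table is created with the EXTERNAL keyword; the LOCATION keyword tells Hive which directory holds the data.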

CREATE EXTERNAL TABLE IF NOT EXISTS stocks(exch STRING,symbol STRING,ymd STRING,price_open FLOAT,price_high FLOAT,price_low FLOAT,price_close FLOAT,volume INT,price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';

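With an external table, Hive does not consider itself the full owner of the data, so dropping the table does not delete the data; only the external table's metadata is removed.

If the data is shared among multiple tools, creating an external table makes the ownership of the data explicit.

An external table can also be created by copying the schema of an existing table

hive (test01)> CREATE EXTERNAL TABLE IF NOT EXISTS test03.employees LIKE test01.employees LOCATION '/hive/db/employees';
OK
Time taken: 0.056 seconds
-- If the source table is external, the new table is also external even when the EXTERNAL keyword is omitted.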

7. Partitioned managed tables

Partitioning is commonly used to distribute load horizontally, to move data physically closer to its most frequent users, and for other purposes. Partitioned tables have important performance benefits, and they also help organize data in a logical fashion, for example hierarchically.

CREATE TABLE test04.employees(name STRING COMMENT 'employee name',salary DOUBLE COMMENT 'employee salary',subordinates ARRAY<STRING> COMMENT 'Names of subordinates',deductions MAP<STRING,FLOAT> COMMENT 'Keys are deductions names,Vals of percentages',address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT> COMMENT 'Home address')
PARTITIONED BY (country STRING,state STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n';

hive (test04)> select * from employees;
OK
employees.name  employees.salary    employees.subordinates  employees.deductions    employees.address   employees.country   employees.state
Time taken: 0.052 seconds

Creating partitions

ALTER TABLE employees ADD PARTITION(country='USA',state='Illinois');
ALTER TABLE employees ADD PARTITION(country='China',state='Xian');

The directory structure of the partitioned table looks like this

[root@hadoop01 ~]# hdfs dfs -ls -R /user/hive/warehouse/test04.db/employees
drwxrwxr-x   - root supergroup          0 2019-06-16 20:28 /user/hive/warehouse/test04.db/employees/country=China
drwxrwxr-x   - root supergroup          0 2019-06-16 20:28 /user/hive/warehouse/test04.db/employees/country=China/state=Xian
drwxrwxr-x   - root supergroup          0 2019-06-16 20:27 /user/hive/warehouse/test04.db/employees/country=USA
drwxrwxr-x   - root supergroup          0 2019-06-16 20:27 /user/hive/warehouse/test04.db/employees/country=USA/state=Illinois
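The listing shows that each partition column becomes one directory level named key=value, nested in the order the columns are declared in PARTITIONED BY. A minimal Python sketch of that naming rule follows; the helper name partition_path is hypothetical, and the table directory is the one from the listing above.

```python
def partition_path(table_dir, partition_spec):
    """Build the directory of a partition from ordered (column, value) pairs."""
    levels = [f"{column}={value}" for column, value in partition_spec]
    return "/".join([table_dir.rstrip("/")] + levels)


table_dir = "/user/hive/warehouse/test04.db/employees"

print(partition_path(table_dir, [("country", "USA"), ("state", "Illinois")]))
print(partition_path(table_dir, [("country", "China"), ("state", "Xian")]))
```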

Viewing the partitions of the table

hive (test04)> SHOW PARTITIONS employees;
OK
partition
country=China/state=Xian
country=USA/state=Illinois
Time taken: 0.236 seconds, Fetched: 2 row(s)

Loading data into the partitioned table

LOAD DATA LOCAL INPATH '/root/emp02.text' INTO TABLE employees PARTITION(country='USA',state='Illinois');
LOAD DATA LOCAL INPATH '/root/emp03.text' INTO TABLE employees PARTITION(country='China',state='Xian');

Querying the data in the partitioned table

hive (test04)> SELECT * FROM employees;
OK
employees.name  employees.salary    employees.subordinates  employees.deductions    employees.address   employees.country   employees.state
Nike Hu 100000.0    ["Bob White","Jack Black"]  {"Federal Taxes":0.1,"StateTaxes":0.05,"Insurance":0.1}   {"street":"100 GaoXin St.","city":"Xian","state":"SX","zip":60609}    China   Xian
Bob White   90000.0 []  {"Federal Taxes":0.1,"StateTaxes":0.05,"Insurance":0.1}   {"street":"100 GaoXin St.","city":"Xian","state":"SX","zip":60611}    China   Xian
Jack Black  90000.0 []  {"Federal Taxes":0.1,"StateTaxes":0.05,"Insurance":0.1}   {"street":"101 GaoXin St.","city":"Xian","state":"SX","zip":60612}    China   Xian
John Doe    100000.0    ["Mary Smith","Todd Jones"] {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}   {"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}    USA Illinois
Mary Smith  80000.0 ["Bill King"] {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}   {"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}    USA Illinois
Todd Jones  70000.0 []  {"Federal Taxes":0.15,"StateTaxes":0.03,"Insurance":0.1}  {"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}  USA Illinois
Bill King   60000.0 []  {"Federal Taxes":0.15,"StateTaxes":0.03,"Insurance":0.1}  {"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}   USA Illinois
Time taken: 0.105 seconds, Fetched: 7 row(s)
hive (test04)> SELECT * FROM employees WHERE country='China' AND state='Xian';
OK
employees.name  employees.salary    employees.subordinates  employees.deductions    employees.address   employees.country   employees.state
Nike Hu 100000.0    ["Bob White","Jack Black"]  {"Federal Taxes":0.1,"StateTaxes":0.05,"Insurance":0.1}   {"street":"100 GaoXin St.","city":"Xian","state":"SX","zip":60609}    China   Xian
Bob White   90000.0 []  {"Federal Taxes":0.1,"StateTaxes":0.05,"Insurance":0.1}   {"street":"100 GaoXin St.","city":"Xian","state":"SX","zip":60611}    China   Xian
Jack Black  90000.0 []  {"Federal Taxes":0.1,"StateTaxes":0.05,"Insurance":0.1}   {"street":"101 GaoXin St.","city":"Xian","state":"SX","zip":60612}    China   Xian
Time taken: 0.702 seconds, Fetched: 3 row(s)
hive (test04)> SELECT * FROM employees WHERE country='USA' AND state='Illinois';
OK
employees.name  employees.salary    employees.subordinates  employees.deductions    employees.address   employees.country   employees.state
John Doe    100000.0    ["Mary Smith","Todd Jones"] {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}   {"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}    USA Illinois
Mary Smith  80000.0 ["Bill King"] {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}   {"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}    USA Illinois
Todd Jones  70000.0 []  {"Federal Taxes":0.15,"StateTaxes":0.03,"Insurance":0.1}  {"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}  USA Illinois
Bill King   60000.0 []  {"Federal Taxes":0.15,"StateTaxes":0.03,"Insurance":0.1}  {"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}   USA Illinois
Time taken: 0.065 seconds, Fetched: 4 row(s)

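Once created, partition columns behave just like ordinary columns.

In fact, unless query performance needs to be optimized, users of these tables do not need to care whether a "column" is a partition column.

To see only the partitions for particular partition keys, add a PARTITION clause with one or more specific partition column values to filter the listing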

hive (test04)> SHOW PARTITIONS employees;
OK
partition
country=China/state=JiangSu
country=China/state=JiangXi
country=China/state=Xian
country=USA/state=Illinois
Time taken: 0.054 seconds, Fetched: 4 row(s)
hive (test04)> SHOW PARTITIONS employees PARTITION(country='China');
OK
partition
country=China/state=Xian
country=China/state=JiangXi
country=China/state=JiangSu
Time taken: 0.06 seconds, Fetched: 3 row(s)

hive (test04)> DESC FORMATTED employees;
OK
col_name    data_type   comment
# col_name              data_type               comment

name                    string                  employee name
salary                  double                  employee salary
subordinates            array<string>         Names of subordinates
deductions              map<string,float>     Keys are deductions names,Vals of percentages
address                 struct<street:string,city:string,state:string,zip:int>    Home address

# Partition Information
# col_name              data_type               comment

country                 string
state                   string
...

8. External partitioned tables

External tables can be partitioned as well.

CREATE EXTERNAL TABLE IF NOT EXISTS log_message(hms INT,severity STRING,server STRING,process_id INT,message STRING)
PARTITIONED BY(year INT,month INT,day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

hive (test04)> SELECT * FROM log_message;
OK
log_message.hms log_message.severity    log_message.server  log_message.process_id  log_message.message log_message.year    log_message.month   log_message.day
Time taken: 0.046 seconds

The directory structure that holds the data of the external partitioned table

[root@hadoop01 ~]# hdfs dfs -ls -R /hive/db/logs
drwxr-xr-x   - root supergroup          0 2019-06-16 22:03 /hive/db/logs/2018
drwxr-xr-x   - root supergroup          0 2019-06-16 22:03 /hive/db/logs/2018/09
drwxr-xr-x   - root supergroup          0 2019-06-16 22:09 /hive/db/logs/2018/09/05
-rw-r--r--   1 root supergroup        390 2019-06-16 22:09 /hive/db/logs/2018/09/05/2018_09_05
drwxr-xr-x   - root supergroup          0 2019-06-16 22:08 /hive/db/logs/2018/09/16
-rw-r--r--   1 root supergroup        390 2019-06-16 22:08 /hive/db/logs/2018/09/16/2018_09_16
drwxr-xr-x   - root supergroup          0 2019-06-16 22:04 /hive/db/logs/2019
drwxr-xr-x   - root supergroup          0 2019-06-16 22:04 /hive/db/logs/2019/04
drwxr-xr-x   - root supergroup          0 2019-06-16 22:10 /hive/db/logs/2019/04/28
-rw-r--r--   1 root supergroup        398 2019-06-16 22:10 /hive/db/logs/2019/04/28/2019_04_28
drwxr-xr-x   - root supergroup          0 2019-06-16 22:04 /hive/db/logs/2019/05
drwxr-xr-x   - root supergroup          0 2019-06-16 22:10 /hive/db/logs/2019/05/10
-rw-r--r--   1 root supergroup        390 2019-06-16 22:10 /hive/db/logs/2019/05/10/2019_05_10
drwxr-xr-x   - root supergroup          0 2019-06-16 22:10 /hive/db/logs/2019/05/15
-rw-r--r--   1 root supergroup        390 2019-06-16 22:10 /hive/db/logs/2019/05/15/2019_05_15
drwxr-xr-x   - root supergroup          0 2019-06-16 22:10 /hive/db/logs/2019/05/27
-rw-r--r--   1 root supergroup        390 2019-06-16 22:10 /hive/db/logs/2019/05/27/2019_05_27
drwxr-xr-x   - root supergroup          0 2019-06-16 22:04 /hive/db/logs/2019/06
drwxr-xr-x   - root supergroup          0 2019-06-16 22:11 /hive/db/logs/2019/06/22
-rw-r--r--   1 root supergroup        390 2019-06-16 22:11 /hive/db/logs/2019/06/22/2019_06_22

Adding data to the external partitioned table by registering each directory as a partition

hive (test04)> ALTER TABLE log_message ADD PARTITION(year=2018,month=09,day=05) LOCATION '/hive/db/logs/2018/09/05';
OK
Time taken: 0.084 seconds
hive (test04)> ALTER TABLE log_message ADD PARTITION(year=2018,month=09,day=16) LOCATION '/hive/db/logs/2018/09/16';
OK
Time taken: 0.096 seconds
hive (test04)> ALTER TABLE log_message ADD PARTITION(year=2019,month=04,day=28) LOCATION '/hive/db/logs/2019/04/28';
OK
Time taken: 0.05 seconds
hive (test04)> ALTER TABLE log_message ADD PARTITION(year=2019,month=05,day=10) LOCATION '/hive/db/logs/2019/05/10';
OK
Time taken: 0.058 seconds
hive (test04)> ALTER TABLE log_message ADD PARTITION(year=2019,month=05,day=15) LOCATION '/hive/db/logs/2019/05/15';
OK
Time taken: 0.043 seconds
hive (test04)> ALTER TABLE log_message ADD PARTITION(year=2019,month=05,day=27) LOCATION '/hive/db/logs/2019/05/27';
OK
Time taken: 0.051 seconds
hive (test04)> ALTER TABLE log_message ADD PARTITION(year=2019,month=06,day=22) LOCATION '/hive/db/logs/2019/06/22';
OK
Time taken: 0.065 seconds
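
Typing one ALTER TABLE per directory gets tedious as the number of days grows. The following is a minimal sketch, not part of the original session, that prints a single ADD IF NOT EXISTS statement covering several partitions. The table name and base path follow the session above; the hand-written date list and the function name `gen_add_partitions` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print one ALTER TABLE statement that registers several partitions.
# The table name and base path follow the session above; the date list is
# typed in by hand for illustration.
table="log_message"
base="/hive/db/logs"
dates="2018/09/05 2018/09/16 2019/04/28"

gen_add_partitions() {
  printf 'ALTER TABLE %s ADD IF NOT EXISTS\n' "$table"
  for d in $dates; do
    # Split yyyy/mm/dd into year, month and day.
    year=${d%%/*}
    rest=${d#*/}
    month=${rest%%/*}
    day=${rest#*/}
    printf "PARTITION (year=%s,month=%s,day=%s) LOCATION '%s/%s'\n" \
      "$year" "$month" "$day" "$base" "$d"
  done
  printf ';\n'
}

gen_add_partitions
```

The output can be redirected to a file and executed with hive -f, as described in the section on running queries from a file.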

Listing the partitions of the external partitioned table

hive (test04)> SHOW PARTITIONS log_message;
OK
partition
year=2018/month=9/day=16
year=2018/month=9/day=5
year=2019/month=4/day=28
year=2019/month=5/day=10
year=2019/month=5/day=15
year=2019/month=5/day=27
year=2019/month=6/day=22
Time taken: 0.169 seconds, Fetched: 7 row(s)

Querying data in the partitioned table

hive (test04)> SELECT * FROM log_message WHERE year=2018;
OK
log_message.hms log_message.severity    log_message.server  log_message.process_id  log_message.message log_message.year    log_message.month   log_message.day
48456   WARNING 192.168.121.40  4568    SQL WARNING 2018    9   16
59638   Error   192.168.159.121 5968    SQL Error   2018    9   16
59998   WARNING 192.168.121.150 6325    SQL WARNING 2018    9   16
63954   Info    192.168.136.45  25369   SQL Info    2018    9   16
66985   WARNING 192.168.121.40  4568    SQL WARNING 2018    9   16
70269   error   192.168.159.121 25698   SQL Error   2018    9   16
85123   WARNING 192.168.121.40  4568    SQL WARNING 2018    9   16
48456   WARNING 192.168.121.40  4568    SQL WARNING 2018    9   5
59638   Error   192.168.159.121 5968    SQL Error   2018    9   5
59998   WARNING 192.168.121.150 6325    SQL WARNING 2018    9   5
63954   Info    192.168.136.45  25369   SQL Info    2018    9   5
66985   WARNING 192.168.121.40  4568    SQL WARNING 2018    9   5
70269   error   192.168.159.121 25698   SQL Error   2018    9   5
85123   WARNING 192.168.121.40  4568    SQL WARNING 2018    9   5
Time taken: 0.108 seconds, Fetched: 14 row(s)
hive (test04)> SELECT * FROM log_message WHERE year=2019 AND month=06;
OK
log_message.hms log_message.severity    log_message.server  log_message.process_id  log_message.message log_message.year    log_message.month   log_message.day
48456   WARNING 192.228.121.40  4568    SQL WARNING 2019    6   22
59638   Error   192.228.229.121 5968    SQL Error   2019    6   22
59998   WARNING 192.228.121.220 6325    SQL WARNING 2019    6   22
63954   Info    192.228.136.45  25369   SQL Info    2019    6   22
66985   WARNING 192.228.121.40  4568    SQL WARNING 2019    6   22
70269   error   192.228.229.121 25698   SQL Error   2019    6   22
85123   WARNING 192.228.121.40  4568    SQL WARNING 2019    6   22
Time taken: 0.056 seconds, Fetched: 7 row(s)

9. Dropping a table

DROP TABLE IF EXISTS tname;

0: jdbc:hive2://192.168.142.92:10000/default> DROP TABLE IF EXISTS person;
No rows affected (1.099 seconds)
0: jdbc:hive2://192.168.142.92:10000/default> SHOW TABLES;
+-----------+
| tab_name  |
+-----------+
| students  |
+-----------+
1 row selected (0.086 seconds)

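For a managed table, both the table's metadata and the data in the table are deleted.

For an external table, the metadata is deleted, but the data in the table is left in place.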

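10. Altering a table

Most table properties can be changed with ALTER TABLE statements. These operations change the metadata but not the data itself.

(1) Renaming a table

ALTER TABLE old_tname RENAME TO new_tname;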

0: jdbc:hive2://192.168.142.92:10000/default> show tables;
+-----------+
| tab_name  |
+-----------+
| students  |
+-----------+
1 row selected (0.032 seconds)
0: jdbc:hive2://192.168.142.92:10000/default> ALTER TABLE students RENAME TO stus;
No rows affected (0.447 seconds)
0: jdbc:hive2://192.168.142.92:10000/default> show tables;
+-----------+
| tab_name  |
+-----------+
| stus      |
+-----------+
1 row selected (0.055 seconds)

(2) Adding, modifying, and dropping table partitions

Adding new partitions

ALTER TABLE tname ADD IF NOT EXISTS
PARTITION (field1=a1,field2=b1,field3=c1) LOCATION '/path/to/hdfs/xxx/a1/b1/c1'
PARTITION (field1=a2,field2=b2,field3=c2) LOCATION '/path/to/hdfs/xxx/a2/b2/c2'
PARTITION (field1=a3,field2=b3,field3=c3) LOCATION '/path/to/hdfs/xxx/a3/b3/c3'
...;

Changing the data location of a partition

ALTER TABLE tname PARTITION (field1=a1,field2=b1,field3=c1) SET LOCATION '/path/to/hdfs/xxx/a1/b1/c1';

Dropping a partition

ALTER TABLE tname DROP IF EXISTS PARTITION(field1=a1,field2=b1,field3=c1);
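
When several days have to be removed at once, the DROP statements can be generated the same way they would be typed. A minimal sketch follows, assuming the log_message table with its year/month/day partition columns from the earlier example; the list of days and the function name `gen_drop_partitions` are made up for illustration. For an external table, dropping a partition removes only the metadata, so the files under the partition directory remain in HDFS.

```shell
#!/usr/bin/env bash
# Sketch: emit one DROP PARTITION statement per day listed in $days.
# The table name comes from the earlier example; the day list is illustrative.
table="log_message"
days="2018-09-05 2018-09-16"

gen_drop_partitions() {
  for d in $days; do
    # Split yyyy-mm-dd into year, month and day.
    year=${d%%-*}
    rest=${d#*-}
    month=${rest%%-*}
    day=${rest#*-}
    printf 'ALTER TABLE %s DROP IF EXISTS PARTITION (year=%s,month=%s,day=%s);\n' \
      "$table" "$year" "$month" "$day"
  done
}

gen_drop_partitions
```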

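(3) Changing column information

A column can be renamed, and its position, type, or comment can be changed. Use FIRST to move the column to the front, or AFTER to place it behind another column; Hive has no BEFORE keyword.

ALTER TABLE tname
CHANGE COLUMN old_fname new_fname type
COMMENT 'comment text'
FIRST | AFTER other_field;

Note: the COLUMN keyword, the COMMENT clause, and the FIRST/AFTER clause are all optional.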
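
As a concrete illustration of the template, the sketch below writes a CHANGE COLUMN statement for the log_message table into a script file that could later be run with hive -f. The assumptions here are that the hms column is an INT; the new column name, the comment text, and the file name `change_column.hql` are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: write a CHANGE COLUMN statement to a script file for use with `hive -f`.
# Assumptions: hms is an INT column of log_message; the new name, the comment
# text and the file name are made up for this example.
script="change_column.hql"

cat > "$script" <<'EOF'
-- Rename hms and attach a comment; the type is restated unchanged.
-- Only the metadata changes, the files in the partition directories do not.
ALTER TABLE log_message
CHANGE COLUMN hms hours_minutes_seconds INT
COMMENT 'hours, minutes and seconds of the timestamp';
EOF

cat "$script"
```

The type has to be restated even when only the name or the comment changes, which is why INT appears in the statement.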

(4) Adding columns

ALTER TABLE tname ADD COLUMNS (
field1 type1 COMMENT 'info',
field2 type2 COMMENT 'info',
...);

(5) Dropping or replacing columns

ALTER TABLE tname REPLACE COLUMNS(
field1 type1 COMMENT 'info',
field2 type2 COMMENT 'info',
...);

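This statement removes all existing columns from the table definition and specifies a new set of columns in their place.

(6) Changing table properties

Users can add new table properties or modify existing ones, but SET TBLPROPERTIES cannot remove a property: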

ALTER TABLE tname SET TBLPROPERTIES(
'properties1' = '.........................',
...);

(7) Changing storage properties

ALTER TABLE tname
PARTITION (field1=a1,field2=b1,field3=c1)
SET FILEFORMAT SEQUENCEFILE;