
In this lesson, we will get started with Apache Hive by installing it on an Ubuntu machine and verifying the installation with a few Hive DDL commands. Installing and running Apache Hive can be tricky, so we will keep this lesson as simple and informative as possible.

In this installation guide, we will use an Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:

Ubuntu Version


Prerequisites for Hive Installation

Before we can proceed to Hive Installation on our machine, we need to have some other things installed as well:


  • Java must be installed
  • Hadoop must be installed and a cluster must be configured

Java Setup

Before we can start installing Hive, we need to update Ubuntu with the latest software patches available:


sudo apt-get update && sudo apt-get -y dist-upgrade

Next, we need to install Java on the machine, as Java is the main prerequisite for running Hive and Hadoop. Hive 2.x requires Java 7 or newer. Let's install Java 8 for this lesson:

sudo apt-get -y install openjdk-8-jdk-headless
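To confirm that the installation succeeded, the Java version can be printed. This is a minimal sketch; the `java_version` variable name is only illustrative, and the check prints a message instead of failing when Java is missing:

```shell
# java -version writes to stderr, so redirect it to stdout before capturing.
if command -v java >/dev/null 2>&1; then
    java_version=$(java -version 2>&1 | head -n 1)
else
    java_version="java not found on PATH"
fi
echo "$java_version"
```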

Getting Started with Hive Installation

Once Java and Hadoop are installed, we are ready to download Hive.


All Hive installation files are available in the Apache Hive archives. Now, run the following set of commands to make a new directory and download the Hive installation archive from the mirror site:

mkdir hive
cd hive
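The download step might look like the sketch below. It assumes the 2.3.3 release used in the rest of this lesson and fetches it from the Apache archive at archive.apache.org rather than from a mirror; the variable names and the `download_hive` function are illustrative, not part of Hive.

```shell
# Assumed release; matches the file name used in the rest of the lesson.
HIVE_VERSION="2.3.3"
HIVE_TARBALL="apache-hive-${HIVE_VERSION}-bin.tar.gz"
# Assumption: the Apache archive keeps past releases under /dist/hive/.
HIVE_URL="https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/${HIVE_TARBALL}"

# Download the archive into the current directory (run this inside ~/hive).
download_hive() {
    wget "$HIVE_URL"
}
```

Calling `download_hive` from inside the `hive` directory fetches the archive.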

With this, a new file apache-hive-2.3.3-bin.tar.gz will be downloaded on the system:


Downloading Hive


Let us uncompress this file now:


tar -xvf apache-hive-2.3.3-bin.tar.gz

The extracted directory name carries the full version string, which is awkward to reference in path variables. To keep things simple, rename the extracted directory:

mv apache-hive-2.3.3-bin apache_hive

Once this is done, we need to add the Hive home directory to the path. Run the following command to edit the .bashrc file in your home directory:

vi ~/.bashrc

Add the following lines in the .bashrc file and save it:


export HIVE_HOME=$HOME/hive/apache_hive
export PATH=$PATH:$HIVE_HOME/bin

Now, to make the environment variables take effect, source the .bashrc file:


source ~/.bashrc
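A quick way to confirm that the variables took effect is a small check like the sketch below. The `check_hive_env` function is a hypothetical helper written for this lesson, not something shipped with Hive:

```shell
# Report whether HIVE_HOME is set and the hive launcher is reachable on PATH.
check_hive_env() {
    if [ -z "${HIVE_HOME:-}" ]; then
        echo "HIVE_HOME is not set"
        return 1
    fi
    if ! command -v hive >/dev/null 2>&1; then
        echo "hive is not on PATH"
        return 1
    fi
    echo "Hive found at $(command -v hive)"
}
```

Running `check_hive_env` in a new terminal shows whether the .bashrc changes were picked up.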

Note that the path to Hadoop is already set in our file, so the overall configuration looks like this:


# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME=$HOME/hive/apache_hive
export PATH=$PATH:$HIVE_HOME/bin

If you want to confirm that Hadoop is correctly working, just check its version:


Check Hadoop version
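The version check itself is a single `hadoop version` call. The sketch below wraps it in a guard so that it reports a missing installation instead of failing; the `hadoop_found` variable is only illustrative:

```shell
# Print the Hadoop version if the hadoop launcher is on PATH.
if command -v hadoop >/dev/null 2>&1; then
    hadoop version
    hadoop_found=yes
else
    echo "hadoop not found on PATH" >&2
    hadoop_found=no
fi
```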


Now, we need to set up the directory where Hive will store its data in the Hadoop Distributed File System (HDFS). For this, we will make a new directory:

hdfs dfs -mkdir -p /root/hive/warehouse

Once this is done, there is one last configuration step before we can launch the Hive shell. We need to tell Hive which database to use for its metastore schema. Run the following command so that Hive initializes the metastore schema in an embedded Derby database:

$HIVE_HOME/bin/schematool -initSchema -dbType derby

When we execute the command, we will see the following success output:


Hive metastore schema initialization


Starting the Hive Shell

After all this configuration is done, Hive can be launched with a single command:

hive

If everything worked correctly, you should see the Hive shell prompt appear:


Starting Hive shell


Using the Hive Shell

Now that we have a Hive shell running, we will put it to use with some basic Hive DDL commands written in the Hive Query Language (HQL).

HQL: Creating a Database

As with any other database system, we can start using Hive only after we create a database. Let's do this now:

create database journaldev;

We will see the following output:


Create Database in Hive


A safer way to create a database is to first check that it does not already exist:

create database if not exists journaldev;

We will see the same output here as well:


Create Database in Hive, if not exists


Now we can list the databases that exist in Hive:


show databases;

This will result in the following:


Show Databases using HQL


HQL: Creating Tables

We now have a database in which we can create some tables. To do this, first switch to the database you want to use:

use journaldev;

Now, create a new table inside this DB with some fields:


create table blogs(blog_id INT, blog_title STRING, blog_link STRING);

Once this table is created, we can show its schema as:


describe blogs;

We will see the following output:


Table metadata
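The DDL statements so far can also be run non-interactively by placing them in a script and passing it to `hive -f`. The sketch below is one way to do that; the temporary file and the `hql_file` variable are illustrative, and `if not exists` is added so the script can be re-run safely:

```shell
# Write the DDL statements from this lesson into a temporary script file.
hql_file="$(mktemp)"
cat > "$hql_file" <<'EOF'
create database if not exists journaldev;
use journaldev;
create table if not exists blogs(blog_id INT, blog_title STRING, blog_link STRING);
describe blogs;
EOF

# Run the script with the Hive CLI if it is available on PATH.
if command -v hive >/dev/null 2>&1; then
    hive -f "$hql_file"
else
    echo "hive not found on PATH; script left at $hql_file"
fi
```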


HQL: Inserting Data into Tables

As a final step, let us insert a record into the table we just created:


INSERT INTO TABLE blogs VALUES (1, 'Introduction to Hive', '');

We will see a long output because Hive, with the help of Hadoop, starts MapReduce jobs to insert the data into the warehouse we created. The output will be:

Insert Data into Hive


Finally, we can see the data in Hive as:


select * from blogs;

Show all data in Hive
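The same query can be run from a regular terminal without opening the Hive shell, using the CLI's `-e` option (with `-S` for silent mode). A minimal sketch, where the `query` variable is only illustrative and the database name is the one created earlier in this lesson:

```shell
# Query the blogs table from the regular shell instead of the Hive prompt.
query="select * from journaldev.blogs;"
if command -v hive >/dev/null 2>&1; then
    hive -S -e "$query"
else
    echo "hive not found on PATH"
fi
```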


Conclusion

In this lesson, we saw how to install Apache Hive on an Ubuntu server and start executing sample HQL queries in it. Read more Big Data posts to gain deeper knowledge of the available Big Data tools and processing frameworks.



