How to Run HQL in Hive: Installing Apache Hive on Ubuntu and Running HQL Queries

In this lesson, we will get started with Apache Hive by installing it on an Ubuntu machine and verifying the installation with a few basic HQL commands. Installing and running Apache Hive can be tricky, which is why we will keep this lesson as simple and informative as possible.

In this installation guide, we use an Ubuntu 17.10 (GNU/Linux 4.13.0-37-generic x86_64) machine:

[Screenshot: Ubuntu version]

Prerequisites for Hive Installation

Before installing Hive on our machine, we need the following in place:

Java must be installed
Hadoop must be installed and a cluster must be configured

Java Setup

Before we can start installing Hive, we need to update Ubuntu with the latest software patches available:

sudo apt-get update && sudo apt-get -y dist-upgrade

Next, install Java, the main prerequisite for running both Hive and Hadoop. Hive 1.2 and later require Java 7 or newer, and the Hive documentation recommends moving to Java 8, so let's install Java 8 for this lesson:

sudo apt-get -y install openjdk-8-jdk-headless
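
Because Hive 2.x needs Java 7 or newer, it is worth confirming which Java the java command actually runs, especially when several JDKs are installed. The following minimal bash sketch is an addition to the original lesson: it parses the first line of java -version output (which Java prints on stderr) and handles both the old 1.8.0_162 style and the newer 11.0.2 style of version numbers. The function names java_major_version and check_java are hypothetical.

```shell
# Minimal sketch: extract the Java major version from a `java -version` line.
# "1.8.0_162" -> 8 (old scheme), "11.0.2" -> 11 (new scheme).
java_major_version() {
  # $1: a line such as: openjdk version "1.8.0_162"
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.} ;;            # drop the legacy "1." prefix
  esac
  printf '%s\n' "${ver%%[._-]*}"     # keep only the leading number
}

# Hypothetical helper: fail unless the java on PATH is at least version 7.
check_java() {
  command -v java >/dev/null 2>&1 || { echo "java not found on PATH" >&2; return 1; }
  local major
  major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
  if [ "${major:-0}" -ge 7 ]; then
    echo "Java $major OK"
  else
    echo "Java '$major' is too old for Hive 2.x" >&2
    return 1
  fi
}
```

After the apt-get install above, check_java should report Java 8, unless another JDK is selected as the default (update-alternatives --config java lets you choose).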

Getting Started with Hive Installation

Once Java is installed as shown above, and Hadoop is installed and configured (its setup is not covered in this lesson), we are ready to download Hive.

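All Hive release files are available in the Apache Hive archives; releases that are no longer current are removed from the mirrors but remain available under archive.apache.org/dist/hive/. Run the following commands to create a new directory and download the Hive 2.3.3 binary archive (the latest release when this lesson was written) from a mirror: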

mkdir hive
cd hive
wget https://www-eu.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz

This downloads the file apache-hive-2.3.3-bin.tar.gz into the current directory:

[Screenshot: Downloading Hive]
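
Apache release artifacts are published together with checksum and PGP signature files, so it is good practice to verify the archive before extracting it. Which checksum types exist differs between releases, so this minimal bash sketch (an addition to the lesson) does not guess a checksum URL; it takes the expected SHA-256 digest as an argument, assuming one is published for the version you downloaded. verify_sha256 is a hypothetical helper name.

```shell
# Minimal sketch: compare a file's SHA-256 digest with an expected value.
# The expected digest is copied by hand from the checksum file published
# next to the release (assumption: a SHA-256 digest exists for your release).
verify_sha256() {
  local file=$1 expected actual
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')   # published digests may be upper-case
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}

# Usage:
# verify_sha256 apache-hive-2.3.3-bin.tar.gz <digest copied from the checksum file>
```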

Let us uncompress this file now:

tar -xvf apache-hive-2.3.3-bin.tar.gz

Periods and hyphens in a directory name are valid in PATH entries, so this step is optional, but renaming the extracted directory to a shorter name keeps the environment variables that follow simple:

mv apache-hive-2.3.3-bin apache_hive

Next, add the Hive home directory to the PATH. Run the following commands to open the .bashrc file in your home directory:

cd
vi .bashrc

Add the following lines to the .bashrc file and save it:

export HIVE_HOME=$HOME/hive/apache_hive
export PATH=$PATH:$HIVE_HOME/bin
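
If you prefer not to edit .bashrc by hand in vi, the same two lines can be appended from the shell. This minimal bash sketch (an addition to the lesson; append_once is a hypothetical helper) only appends a line when that exact line is not already in the file, so running it twice does not create duplicate entries.

```shell
# Minimal sketch: append a line to a file only if that exact line is not already there.
append_once() {
  local line=$1 file=$2
  touch "$file"
  # -x: match whole lines, -F: treat the line as a fixed string, not a regex
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage: single quotes keep $HOME, $PATH and $HIVE_HOME unexpanded,
# so the lines land in .bashrc exactly as shown above.
# append_once 'export HIVE_HOME=$HOME/hive/apache_hive' ~/.bashrc
# append_once 'export PATH=$PATH:$HIVE_HOME/bin' ~/.bashrc
```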

To apply the new environment variables to the current shell, source the .bashrc file:

source .bashrc

Note that the Hadoop path was already set in our .bashrc, so the complete configuration looks like this:

# Configure Hadoop and Java Home
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME=$HOME/hive/apache_hive
export PATH=$PATH:$HIVE_HOME/bin
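
After sourcing .bashrc, you can confirm that the variables took effect in the current shell. This minimal bash sketch (an addition to the lesson; path_has and check_hive_env are hypothetical names) checks that HIVE_HOME points to an existing directory and that $HIVE_HOME/bin appears as a whole entry in PATH.

```shell
# Minimal sketch: succeed if directory $1 is a whole entry of the colon-separated list $2.
path_has() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Verify the Hive variables exported in .bashrc.
check_hive_env() {
  [ -n "$HIVE_HOME" ] || { echo "HIVE_HOME is not set" >&2; return 1; }
  [ -d "$HIVE_HOME" ] || { echo "HIVE_HOME=$HIVE_HOME is not a directory" >&2; return 1; }
  path_has "$HIVE_HOME/bin" "$PATH" || { echo "$HIVE_HOME/bin is not on PATH" >&2; return 1; }
  echo "Hive environment looks OK: $HIVE_HOME"
}
```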

To confirm that Hadoop is working correctly, check its version:

hadoop version

[Screenshot: Check Hadoop version]
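
This lesson relies on several commands being available on the PATH: java, hadoop, hdfs and, once the steps above are done, hive. A quick way to catch a missing one before going further is command -v, which this minimal bash sketch (an addition to the lesson) wraps in a hypothetical missing_commands helper.

```shell
# Minimal sketch: print each given command that is not on PATH, one per line.
# Returns success only if every command was found.
missing_commands() {
  local rc=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  return $rc
}

# Usage:
# missing_commands java hadoop hdfs hive || echo "install these or add them to PATH"
```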

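Next, we need a directory in the Hadoop Distributed File System (HDFS) where Hive can store table data. Hive's default warehouse location is /user/hive/warehouse, controlled by the hive.metastore.warehouse.dir property; a custom directory like the one below is only used if that property in $HIVE_HOME/conf/hive-site.xml points to it. Create the directory: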

hdfs dfs -mkdir -p /root/hive/warehouse
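
Apache's Hive Getting Started guide also creates /tmp in HDFS and makes both /tmp and the warehouse directory group-writable. The minimal bash sketch below is an addition to the lesson: it applies those steps to the /root/hive/warehouse directory used here and writes a minimal hive-site.xml so that hive.metastore.warehouse.dir actually points at it. WAREHOUSE_DIR, HDFS_CMD and both function names are hypothetical; HDFS_CMD exists only so the HDFS commands can be previewed with HDFS_CMD=echo. If you already maintain a hive-site.xml, add the property to it instead of overwriting the file.

```shell
# Minimal sketch: prepare HDFS directories for Hive and point Hive at the warehouse.
WAREHOUSE_DIR=${WAREHOUSE_DIR:-/root/hive/warehouse}

prepare_hdfs_dirs() {
  local hdfs
  hdfs=${HDFS_CMD:-hdfs dfs}
  # $hdfs is unquoted on purpose so "hdfs dfs" splits into command + subcommand.
  $hdfs -mkdir -p /tmp "$WAREHOUSE_DIR"
  $hdfs -chmod g+w /tmp "$WAREHOUSE_DIR"   # let group members write to both directories
}

# Write a minimal hive-site.xml that sets hive.metastore.warehouse.dir.
# $1: output file, e.g. "$HIVE_HOME/conf/hive-site.xml" (an existing file is overwritten).
write_hive_site() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>${WAREHOUSE_DIR}</value>
  </property>
</configuration>
EOF
}

# Usage:
# prepare_hdfs_dirs && write_hive_site "$HIVE_HOME/conf/hive-site.xml"
```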

One configuration step remains before we can launch the Hive shell: Hive needs a database for its metastore, where it keeps the schema definitions of its databases and tables. Run the following command to initialize the metastore schema in an embedded Apache Derby database:

$HIVE_HOME/bin/schematool -initSchema -dbType derby

When we execute the command, we will see the following success output:

[Screenshot: Hive metastore schema initialization]
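
With -dbType derby, the metastore lives in an embedded Apache Derby database. With Hive's default connection settings it is created as a metastore_db directory in the directory you ran schematool from, and the Hive CLI looks for it relative to its own working directory, so starting hive from elsewhere leaves it without the initialized schema. Embedded Derby also allows only one Hive session at a time. This minimal bash sketch (an addition to the lesson; has_local_metastore, start_hive_here and the HIVE_CMD override are hypothetical) refuses to start the CLI when the current directory has no metastore_db.

```shell
# Minimal sketch: start the Hive CLI only from the directory that holds the
# embedded Derby metastore. Assumes the default relative connection URL,
# which places metastore_db in the current working directory.
has_local_metastore() {
  [ -d "${1:-.}/metastore_db" ]
}

# HIVE_CMD is a hypothetical override (e.g. HIVE_CMD=echo) for previewing.
start_hive_here() {
  if has_local_metastore .; then
    ${HIVE_CMD:-hive}
  else
    echo "No metastore_db in $(pwd); cd to the directory where schematool was run" >&2
    return 1
  fi
}
```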

Starting the Hive Shell

With the configuration complete, launch Hive with a single command:

hive

If everything worked correctly, the Hive shell prompt appears:

[Screenshot: Starting Hive shell]

Using the Hive Shell

Now that the Hive shell is running, let's use it to run some basic commands in Hive Query Language (HQL).

HQL: Creating a Database

Hive comes with a database named default, but it is good practice to create a dedicated database for your own tables. Let's create one now:

CREATE DATABASE journaldev;

We will see the following output:

[Screenshot: Create Database in Hive]

A safer way is to create the database only if it does not already exist; without IF NOT EXISTS, CREATE DATABASE fails when a database with that name already exists:

CREATE DATABASE IF NOT EXISTS journaldev;

We will see the same output here as well:

[Screenshot: Create Database in Hive, if not exists]

Now we can list the databases that exist in Hive:

show databases;

This will result in the following:

[Screenshot: Show Databases using HQL]
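
The same check can be scripted without opening the interactive shell: the Hive CLI's -e option runs a quoted HQL string and exits, and -S (silent mode) suppresses the informational messages. This minimal bash sketch (an addition to the lesson; db_in_list is a hypothetical helper) assumes SHOW DATABASES prints one database name per line and searches that output for a given name.

```shell
# Minimal sketch: succeed if database name $1 appears as a whole line on stdin.
# Assumes SHOW DATABASES output lists one database name per line.
db_in_list() {
  # -i: Hive database names are case-insensitive; -x: whole line; -F: fixed string
  grep -qixF -- "$1"
}

# Usage (requires the working Hive installation from this lesson):
# hive -S -e 'SHOW DATABASES;' | db_in_list journaldev && echo "journaldev exists"
```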

HQL: Creating Tables

Now that we have a database, we can create tables in it. First, switch to the database you want to use:

use journaldev;

Now, create a new table in this database with a few columns:

create table blogs(blog_id INT, blog_title STRING, blog_link STRING);

Once the table is created, display its schema with:

describe blogs;

We will see the following output:

[Screenshot: Table metadata]

HQL: Inserting Data into Tables

As a final step, let's insert a record into the table we just created:

INSERT INTO TABLE blogs VALUES (1, 'Introduction to Hive', 'https://www.journaldev.com/20353/installing-apache-hive-on-ubuntu-and-sample-queries');

This produces a long output, because Hive launches a Hadoop MapReduce job to write the new row into the table's directory in the Hive warehouse. The output looks like this:

[Screenshot: Insert Data into Hive]
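
Each INSERT ... VALUES statement runs as a separate job, which is why even a single-row insert takes noticeable time. Since INSERT ... VALUES was introduced in Hive 0.14, one statement can carry several rows, so batching rows is cheaper than inserting them one at a time. This minimal bash sketch (an addition to the lesson; tsv_to_insert is a hypothetical helper) builds such a statement for the blogs table from tab-separated id, title and link lines. It performs no escaping, so it assumes values contain no single quotes or tabs.

```shell
# Minimal sketch: turn "id<TAB>title<TAB>link" lines into one multi-row INSERT
# for the blogs table. Lines without exactly three fields are skipped.
# Inside the single-quoted awk script, '\'' produces a literal single quote.
tsv_to_insert() {
  awk -F '\t' '
    NF == 3 {
      row = sprintf("(%d, '\''%s'\'', '\''%s'\'')", $1, $2, $3)
      rows = (n++ ? rows ",\n  " : "") row
    }
    END { if (n) print "INSERT INTO TABLE blogs VALUES\n  " rows ";" }'
}

# Usage (example.com is a placeholder URL):
# printf '2\tAnother post\thttps://example.com/post\n' | tsv_to_insert > insert.hql
# hive --database journaldev -f insert.hql
```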

Finally, query the table to see the inserted data:

select * from blogs;

[Screenshot: Show all data in Hive]
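
Besides the interactive shell, the Hive CLI can run HQL non-interactively, which is the usual way to run saved queries: hive -e '...' runs a quoted string of statements, and hive -f runs a script file. This minimal bash sketch (an addition to the lesson; the function names, the journaldev.hql file name and the HIVE_CMD override are hypothetical) saves this lesson's statements to a file and runs it with -f. IF NOT EXISTS keeps the DDL re-runnable, although every run inserts the sample row again.

```shell
# Minimal sketch: save this lesson's HQL in a script file and run it with `hive -f`.
write_demo_hql() {
  cat > "$1" <<'EOF'
CREATE DATABASE IF NOT EXISTS journaldev;
USE journaldev;
CREATE TABLE IF NOT EXISTS blogs(blog_id INT, blog_title STRING, blog_link STRING);
INSERT INTO TABLE blogs VALUES (1, 'Introduction to Hive', 'https://www.journaldev.com/20353/installing-apache-hive-on-ubuntu-and-sample-queries');
SELECT * FROM blogs;
EOF
}

# HIVE_CMD is a hypothetical override (e.g. HIVE_CMD=echo) to preview the command.
run_hql_file() {
  ${HIVE_CMD:-hive} -f "$1"
}

# Usage:
# write_demo_hql journaldev.hql && run_hql_file journaldev.hql
# One-off query without a file:
# hive -e 'SELECT * FROM journaldev.blogs;'
```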

Conclusion

In this lesson, we installed Apache Hive on an Ubuntu server and ran some sample HQL queries in it. Read more Big Data posts to learn about other Big Data tools and processing frameworks.

Translated from: https://www.journaldev.com/20353/install-apache-hive-ubuntu-hql-queries
