1. Install Hive
To create a database user and grant it privileges, see: http://blog.csdn.net/tototuzuoquan/article/details/52785504

2. Copy the configured hive-site.xml, core-site.xml, and hdfs-site.xml into $SPARK_HOME/conf

[root@hadoop1 conf]# cd /home/tuzq/software/hive/apache-hive-1.2.1-bin
[root@hadoop1 conf]# cp hive-site.xml $SPARK_HOME/conf
[root@hadoop1 spark-1.6.2-bin-hadoop2.6]# cd $HADOOP_HOME
[root@hadoop1 hadoop]# cp core-site.xml $SPARK_HOME/conf
[root@hadoop1 hadoop]# cp hdfs-site.xml $SPARK_HOME/conf

Then sync the conf directory to the other nodes of the Spark cluster:
[root@hadoop1 conf]# scp -r * root@hadoop2:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop3:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop4:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop5:$PWD
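The four scp commands above can be collapsed into one loop. This is a sketch assuming the worker hostnames hadoop2 through hadoop5 and the $SPARK_HOME path used in this setup; it only prints the commands (a dry run), so it is safe to try anywhere before running the real sync:

```shell
# Dry run: build and print the sync command for each Spark worker.
# The hostnames and the SPARK_HOME default below come from this tutorial's
# cluster; adjust them for your environment, then remove the echo to execute.
SPARK_HOME="${SPARK_HOME:-/home/tuzq/software/spark-1.6.2-bin-hadoop2.6}"
workers="hadoop2 hadoop3 hadoop4 hadoop5"
for host in $workers; do
  echo "scp -r $SPARK_HOME/conf/* root@$host:$SPARK_HOME/conf"
done
```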

After the files are in place, restart the Spark cluster. For starting and stopping the cluster, see:

http://blog.csdn.net/tototuzuoquan/article/details/74481570

Also change the log level in Spark's log4j configuration to ERROR, so the shell output stays readable.

3. Start spark-shell, specifying the location of the MySQL JDBC driver

bin/spark-shell --master spark://hadoop1:7077,hadoop2:7077 --executor-memory 1g --total-executor-cores 2 --driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar

If spark-shell fails to start with a connection-refused error, troubleshoot it against the Hadoop guide at:
https://wiki.apache.org/hadoop/ConnectionRefused
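A ConnectionRefused error usually means the target daemon is down or the port is blocked. As a quick first check, you can probe a port with bash's built-in /dev/tcp redirection; the host/port pair below is only an example, so substitute the address from your exception message (e.g. the NameNode RPC port or the Spark master port 7077):

```shell
#!/bin/bash
# Probe a TCP port using bash's /dev/tcp redirection (no extra tools needed).
check_port() {
  local host=$1 port=$2
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port refused"
  fi
}
# Example probe; replace with the host:port from the stack trace.
check_port 127.0.0.1 7077
```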

4. Run HQL through sqlContext.sql
Before using it, start Hive and create the person table:

hive> create table person(id bigint,name string,age int) row format delimited fields terminated by " " ;
OK
Time taken: 2.152 seconds
hive> show tables;
OK
func
person
wyp
Time taken: 0.269 seconds, Fetched: 3 row(s)
hive>

View the person.txt data file in HDFS:

[root@hadoop3 ~]# hdfs dfs -cat /person.txt
1 zhangsan 19
2 lisi 20
3 wangwu 28
4 zhaoliu 26
5 tianqi 24
6 chengnong 55
7 zhouxingchi 58
8 mayun 50
9 yangliying 30
10 lilianjie 51
11 zhanghuimei 35
12 lian 53
13 zhangyimou 54
[root@hadoop3 ~]# hdfs dfs -cat hdfs://mycluster/person.txt
1 zhangsan 19
2 lisi 20
3 wangwu 28
4 zhaoliu 26
5 tianqi 24
6 chengnong 55
7 zhouxingchi 58
8 mayun 50
9 yangliying 30
10 lilianjie 51
11 zhanghuimei 35
12 lian 53
13 zhangyimou 54
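The table was declared with fields terminated by a single space, so every line of person.txt must have exactly three space-separated fields; otherwise Hive fills the missing columns with NULL. A small local sanity check (recreating a few sample rows rather than reading from HDFS):

```shell
# Recreate a few sample rows locally to verify the delimiter format.
cat > /tmp/person_sample.txt <<'EOF'
1 zhangsan 19
2 lisi 20
3 wangwu 28
EOF
# Each row must have exactly 3 space-separated fields to match the
# person(id, name, age) schema declared in Hive.
awk 'NF != 3 { bad++ } END { if (bad) print "bad rows: " bad; else print "format ok" }' /tmp/person_sample.txt
```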

Load the data into the person table:

hive> load data inpath '/person.txt' into table person;
Loading data to table default.person
Table default.person stats: [numFiles=1, totalSize=193]
OK
Time taken: 1.634 seconds
hive> select * from person;
OK
1   zhangsan    19
2   lisi    20
3   wangwu  28
4   zhaoliu 26
5   tianqi  24
6   chengnong   55
7   zhouxingchi 58
8   mayun   50
9   yangliying  30
10  lilianjie   51
11  zhanghuimei 35
12  lian    53
13  zhangyimou  54
Time taken: 0.164 seconds, Fetched: 13 row(s)
hive>
On spark-2.1.1-bin-hadoop2.7 the shell does not pre-define sqlContext, so create it first: val sqlContext = new org.apache.spark.sql.SQLContext(sc)
On spark-1.6.2-bin-hadoop2.6 this step is unnecessary; sqlContext is already defined for you.
scala> sqlContext.sql("select * from person limit 2").show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  1|zhangsan| 19|
|  2|    lisi| 20|
+---+--------+---+

scala>

Alternatively, use org.apache.spark.sql.hive.HiveContext (still inside the same spark-shell session):

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val hiveContext = new HiveContext(sc)
Wed Jul 12 12:43:36 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6d9a46d7

scala> hiveContext.sql("select * from person")
res2: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> hiveContext.sql("select * from person").show
+---+-----------+---+
| id|       name|age|
+---+-----------+---+
|  1|   zhangsan| 19|
|  2|       lisi| 20|
|  3|     wangwu| 28|
|  4|    zhaoliu| 26|
|  5|     tianqi| 24|
|  6|  chengnong| 55|
|  7|zhouxingchi| 58|
|  8|      mayun| 50|
|  9| yangliying| 30|
| 10|  lilianjie| 51|
| 11|zhanghuimei| 35|
| 12|       lian| 53|
| 13| zhangyimou| 54|
+---+-----------+---+

scala>

The same queries can also be run from the spark-sql CLI, started with the same driver classpath:

bin/spark-sql \
--master spark://hadoop1:7077,hadoop2:7077 \
--executor-memory 1g \
--total-executor-cores 2 \
--driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar

5. Start spark-shell, specifying the location of the MySQL JDBC driver

bin/spark-shell --master spark://hadoop1:7077,hadoop2:7077 --executor-memory 1g --total-executor-cores 2 --driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar

5.1 Run HQL through sqlContext.sql (these commands are executed in spark-shell)

scala> sqlContext.sql("select * from person limit 2")
res0: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> sqlContext.sql("select * from person limit 2").show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  1|zhangsan| 19|
|  2|    lisi| 20|
+---+--------+---+

scala>

Or use org.apache.spark.sql.hive.HiveContext:

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val hiveContext = new HiveContext(sc)
(log output omitted)

scala> hiveContext.sql("select * from person")
res2: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> hiveContext.sql("select * from person").show
+---+-----------+---+
| id|       name|age|
+---+-----------+---+
|  1|   zhangsan| 19|
|  2|       lisi| 20|
|  3|     wangwu| 28|
|  4|    zhaoliu| 26|
|  5|     tianqi| 24|
|  6|  chengnong| 55|
|  7|zhouxingchi| 58|
|  8|      mayun| 50|
|  9| yangliying| 30|
| 10|  lilianjie| 51|
| 11|zhanghuimei| 35|
| 12|       lian| 53|
| 13| zhangyimou| 54|
+---+-----------+---+

scala>
