1. Copy the hive-site.xml file into Spark's conf directory.
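For example (a sketch, assuming HIVE_HOME and SPARK_HOME are set in the environment):

[hadoop@hadoop002 ~]$ cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/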
2.[hadoop@hadoop002 bin]$ ./spark-shell --master local[2] --jars ~/software/mysql-connector-java-5.1.47.jar 
    Note: use a 5.x version of mysql-connector-java.

scala> spark.sql("show databases").show
+------------+
|databaseName|
+------------+
|     default|
|        test|
+------------+
scala> spark.sql("select *from test.wc").show
20/02/19 08:50:05 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
+--------------------+
|            sentence|
+--------------------+
|hello  hello   hello|
|        spark hadoop|
|                hive|
+--------------------+
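The same table can also be read through the DataFrame API instead of SQL; for instance, in the same spark-shell session:

scala> spark.table("test.wc").show()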

3. Another way to launch it
[hadoop@hadoop002 bin]$ ./spark-sql --master local --jars ~/software/mysql-connector-java-5.1.47.jar --driver-class-path ~/software/mysql-connector-java-5.1.47.jar 
--driver-class-path indicates that the driver side also needs this jar. Another option is to drop the jar into $SPARK_HOME/jars (lib in Spark 1.x), but then every Spark program that starts will load this jar.
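Equivalently, the jar can be configured once in conf/spark-defaults.conf so it no longer has to be passed on every launch (the absolute path below is an assumption matching the ~/software location used above; ~ is not expanded in conf files):

spark.jars                    /home/hadoop/software/mysql-connector-java-5.1.47.jar
spark.driver.extraClassPath   /home/hadoop/software/mysql-connector-java-5.1.47.jar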

spark-sql (default)> desc formatted wc;
20/02/19 21:08:05 INFO metastore.HiveMetaStore: 0: get_database: test
20/02/19 21:08:05 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_database: test
20/02/19 21:08:05 INFO metastore.HiveMetaStore: 0: get_table : db=test tbl=wc
20/02/19 21:08:05 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=test tbl=wc
20/02/19 21:08:05 INFO metastore.HiveMetaStore: 0: get_table : db=test tbl=wc
20/02/19 21:08:05 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=test tbl=wc
20/02/19 21:08:05 INFO codegen.CodeGenerator: Code generated in 181.19869 ms
col_name    data_type   comment
sentence    string  NULL

# Detailed Table Information
Database    test
Table   wc
Owner   hadoop
Created Time    Sun Nov 10 16:53:07 CST 2019
Last Access Thu Jan 01 08:00:00 CST 1970
Created By  Spark 2.2 or prior
Type    MANAGED
Provider    hive
Table Properties    [transient_lastDdlTime=1573378511]
Statistics  36 bytes
Location    hdfs://hadoop002:8020/user/hive/warehouse/test.db/wc
Serde Library   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat org.apache.hadoop.mapred.TextInputFormat
OutputFormat    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties  [serialization.format=1]
Partition Provider  Catalog
Time taken: 0.511 seconds, Fetched 19 row(s)
20/02/19 21:08:05 INFO thriftserver.SparkSQLCLIDriver: Time taken: 0.511 seconds, Fetched 19 row(s)
spark-sql (default)>
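spark-sql can also run a single statement non-interactively with -e (or a script file with -f), which is handy for scripting; a sketch reusing the flags from above:

[hadoop@hadoop002 bin]$ ./spark-sql --master local --jars ~/software/mysql-connector-java-5.1.47.jar --driver-class-path ~/software/mysql-connector-java-5.1.47.jar -e "select * from test.wc"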

4. Using the thriftserver and beeline
4.1 Start the long-running service

[hadoop@hadoop002 sbin]$ ./start-thriftserver.sh --help
[hadoop@hadoop002 sbin]$ ./start-thriftserver.sh --jars ~/software/mysql-connector-java-5.1.47.jar
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/hadoop/app/spark-2.4.4-bin-2.6.0-cdh5.15.1/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop002.out
[hadoop@hadoop002 sbin]$ tail -200f /home/hadoop/app/spark-2.4.4-bin-2.6.0-cdh5.15.1/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop002.out
*****
20/02/19 09:37:16 INFO service.AbstractService: Service:ThriftBinaryCLIService is started.
20/02/19 09:37:16 INFO service.AbstractService: Service:HiveServer2 is started.
20/02/19 09:37:16 INFO thriftserver.HiveThriftServer2: HiveThriftServer2 started
20/02/19 09:37:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@72c5064f{/sqlserver,null,AVAILABLE,@Spark}
20/02/19 09:37:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a0c04ab{/sqlserver/json,null,AVAILABLE,@Spark}
20/02/19 09:37:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5d9d8ecf{/sqlserver/session,null,AVAILABLE,@Spark}
20/02/19 09:37:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@43cc7951{/sqlserver/session/json,null,AVAILABLE,@Spark}
20/02/19 09:37:17 INFO thrift.ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
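The default port 10000 can be changed with a --hiveconf flag when starting the service, e.g. (port 10001 here is just an example):

[hadoop@hadoop002 sbin]$ ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001 --jars ~/software/mysql-connector-java-5.1.47.jar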

4.2 Start beeline

[hadoop@hadoop002 spark-2.4.4-bin-2.6.0-cdh5.15.1]$ ./bin/beeline -u jdbc:hive2://hadoop002:10000
Connecting to jdbc:hive2://hadoop002:10000
20/02/19 09:48:58 INFO jdbc.Utils: Supplied authorities: hadoop002:10000
20/02/19 09:48:58 INFO jdbc.Utils: Resolved authority: hadoop002:10000
20/02/19 09:48:58 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://hadoop002:10000
Error: Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------

The anonymous user has no EXECUTE permission on /tmp in HDFS, so reconnect as the hadoop user with -n:

[hadoop@hadoop002 bin]$ ./beeline -n hadoop -u jdbc:hive2://hadoop002:10000
Connecting to jdbc:hive2://hadoop002:10000
20/02/19 10:19:18 INFO jdbc.Utils: Supplied authorities: hadoop002:10000
20/02/19 10:19:18 INFO jdbc.Utils: Resolved authority: hadoop002:10000
20/02/19 10:19:18 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://hadoop002:10000
Connected to: Spark SQL (version 2.4.4)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://hadoop002:10000> show databases;
+---------------+--+
| databaseName  |
+---------------+--+
| default       |
| test          |
+---------------+--+
2 rows selected (0.642 seconds)
0: jdbc:hive2://hadoop002:10000> use test;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.05 seconds)
0: jdbc:hive2://hadoop002:10000> select * from wc;
+----------------------+--+
|       sentence       |
+----------------------+--+
| hello  hello   hello |
| spark hadoop         |
| hive                 |
+----------------------+--+
3 rows selected (1.471 seconds)
0: jdbc:hive2://hadoop002:10000>



5. ThriftServer vs. a one-off Spark Application
The ThriftServer is a long-running service (7*24); an ordinary application is gone once its job finishes.
The former requests resources only once, at startup; the latter must request resources every time it is launched.
With the former, multiple submissions share the same resources; for example, data cached by one session can be reused by the others (see the example below).
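For instance, after one beeline session runs

0: jdbc:hive2://hadoop002:10000> cache table test.wc;

subsequent queries against test.wc from any other beeline connection are served from the in-memory cache, because every session runs inside the same long-lived SparkContext.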
6. Accessing the ThriftServer via JDBC code

import java.sql.DriverManager

/**
 * Access data through a JDBC connection to the ThriftServer.
 */
object JDBC2ThiftClientApp {
  def main(args: Array[String]): Unit = {
    // register the Hive JDBC driver
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val connection = DriverManager.getConnection("jdbc:hive2://hadoop002:10000")
    val pstm = connection.prepareStatement("select * from test.wc")
    val rs = pstm.executeQuery()
    while (rs.next()) {
      println(rs.getObject(1))
    }
  }
}
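The client above only needs the Hive JDBC driver on its classpath; with sbt that is roughly the following (the version is an assumption and should match the server side):

// build.sbt (hypothetical version)
libraryDependencies += "org.apache.hive" % "hive-jdbc" % "1.2.1"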

7. Operating Hive from Spark code

import java.util.Properties

import com.typesafe.config.ConfigFactory
import org.apache.spark.sql.{SaveMode, SparkSession}

object HiveSourceApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("HiveSourceApp")
      .enableHiveSupport()
      .getOrCreate()

    // spark.sql("show databases").show()

    // read the MySQL connection settings from application.conf
    val config = ConfigFactory.load()
    val url = config.getString("db.default.url")
    val user = config.getString("db.default.user")
    val password = config.getString("db.default.password")
    val driver = config.getString("db.default.driver")
    val database = config.getString("db.default.database")
    val table = config.getString("db.default.table")

    // read the source table from MySQL
    val connectionProperties = new Properties()
    connectionProperties.put("user", user)
    connectionProperties.put("password", password)

    // TODO business logic
    val jdbcDF = spark.read.jdbc(url, s"$database.$table", connectionProperties)
    jdbcDF.show()

    // write the data into Hive
    jdbcDF.write.mode(SaveMode.Append).saveAsTable("test.hive")

    spark.stop()
  }
}
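ConfigFactory.load() expects the db.default.* keys in src/main/resources/application.conf; a minimal sketch with placeholder values:

# application.conf (hypothetical values)
db.default.url = "jdbc:mysql://hadoop002:3306"
db.default.user = "root"
db.default.password = "123456"
db.default.driver = "com.mysql.jdbc.Driver"
db.default.database = "test"
db.default.table = "wc"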
