1 The 4 run modes of Spark

No matter which run mode is used, the Spark application code stays the same; you only need to specify the mode with the --master parameter at submission time.

  1. Local: used during development
  2. Standalone: Spark's built-in cluster manager; a Standalone cluster requires the Spark environment to be deployed on every machine in the cluster
  3. YARN: recommended for production
  4. Mesos
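Each mode corresponds to a different --master URL at submission time. A sketch of the four forms, where the host names and ports are placeholders, not values from this document:

```
# Local: run with N worker threads on the submitting machine
spark-submit --master local[2] ...

# Standalone: point at the Spark master (default port 7077)
spark-submit --master spark://<master-host>:7077 ...

# YARN: the ResourceManager address is resolved from HADOOP_CONF_DIR / YARN_CONF_DIR
spark-submit --master yarn ...

# Mesos
spark-submit --master mesos://<mesos-host>:5050 ...
```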

1.1 Overview

  • Spark supports pluggable cluster managers;
  • To YARN, a Spark application is just another client application;

1.2 Spark on YARN modes

1.2.1 client mode

  • The Driver runs on the client side (the machine that submits the Spark job)
  • The client communicates with the requested Containers to schedule and run the job, so the client must not exit;
  • Logs are printed to the console, which is convenient for testing

1.2.2 cluster mode

  • The Driver runs inside the ApplicationMaster;
  • The client can be closed as soon as the job is submitted, because the job is already running on YARN
  • Logs cannot be seen in the terminal, because they live with the Driver; they can only be fetched with yarn logs -applicationId <app ID>
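Both modes use --master yarn; which one you get is controlled by the --deploy-mode flag (client is the default). A minimal sketch using the SparkPi example bundled with Spark, assuming the same jar path as the transcripts below:

```
# client mode (default): the Driver stays on the submitting machine
spark-submit --master yarn --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  ./examples/jars/spark-examples_2.11-2.1.3.jar 5

# cluster mode: the Driver runs inside the ApplicationMaster on YARN
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  ./examples/jars/spark-examples_2.11-2.1.3.jar 5
```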

1.3 Set HADOOP_CONF_DIR or YARN_CONF_DIR

There are several ways to configure it:

  1. export HADOOP_CONF_DIR=/home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/etc/hadoop
  2. spark-env.sh
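For the spark-env.sh option, the same export is placed in $SPARK_HOME/conf/spark-env.sh so that every submission picks it up without setting it per shell session. A sketch assuming the Hadoop installation path used in this document:

```shell
# $SPARK_HOME/conf/spark-env.sh
# Tell spark-submit where to find the Hadoop/YARN client configuration
export HADOOP_CONF_DIR=/home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/etc/hadoop
```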

1.4 Test

1.4.1 Start YARN

[hadoop@node1 ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
18/11/16 20:36:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [node1]
node1: starting namenode, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-node1.out
node2: starting datanode, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-node2.out
node3: starting datanode, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-node3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-secondarynamenode-node1.out
18/11/16 20:36:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-node1.out
node2: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-node2.out
node3: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-node3.out
[hadoop@node1 ~]$

http://node1:8088/cluster
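Besides the web UI, whether the daemons came up can be double-checked with jps on each node. The expected process lists below assume the layout implied by the startup output (NameNode/SecondaryNameNode/ResourceManager on node1, DataNode/NodeManager on node2 and node3):

```
jps   # run on each node
# node1 (assumed): NameNode, SecondaryNameNode, ResourceManager, Jps
# node2 / node3 (assumed): DataNode, NodeManager, Jps
```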

1.4.2 Submit

  • client mode
[hadoop@node1 spark-2.1.3-bin-2.6.0-cdh5.7.0]$ ./bin/spark-submit \
>   --class org.apache.spark.examples.SparkPi \
>   --master yarn \
>   --executor-memory 1G \
>   --num-executors 1 \
>   ./examples/jars/spark-examples_2.11-2.1.3.jar \
>   5
18/11/16 20:49:35 INFO spark.SparkContext: Running Spark version 2.1.3
18/11/16 20:49:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/16 20:49:36 INFO spark.SecurityManager: Changing view acls to: hadoop
18/11/16 20:49:36 INFO spark.SecurityManager: Changing modify acls to: hadoop
18/11/16 20:49:36 INFO spark.SecurityManager: Changing view acls groups to:
18/11/16 20:49:36 INFO spark.SecurityManager: Changing modify acls groups to:
  • cluster mode
[hadoop@node1 spark-2.1.3-bin-2.6.0-cdh5.7.0]$ ./bin/spark-submit \
>   --class org.apache.spark.examples.SparkPi \
>   --master yarn-cluster \
>   --executor-memory 1G \
>   --num-executors 1 \
>   ./examples/jars/spark-examples_2.11-2.1.3.jar \
>   5
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
18/11/16 20:53:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/16 20:53:19 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.30.131:8032
18/11/16 20:53:19 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
............................
18/11/16 20:53:38 INFO yarn.Client: Application report for application_1542371790854_0006 (state: RUNNING)
18/11/16 20:53:39 INFO yarn.Client: Application report for application_1542371790854_0006 (state: RUNNING)
18/11/16 20:53:40 INFO yarn.Client: Application report for application_1542371790854_0006 (state: RUNNING)
18/11/16 20:53:41 INFO yarn.Client: Application report for application_1542371790854_0006 (state: RUNNING)
18/11/16 20:53:42 INFO yarn.Client: Application report for application_1542371790854_0006 (state: RUNNING)
18/11/16 20:53:43 INFO yarn.Client: Application report for application_1542371790854_0006 (state: RUNNING)
18/11/16 20:53:44 INFO yarn.Client: Application report for application_1542371790854_0006 (state: FINISHED)
18/11/16 20:53:44 INFO yarn.Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.30.133
	 ApplicationMaster RPC port: 0
	 queue: root.hadoop
	 start time: 1542372803673
	 final status: SUCCEEDED
	 tracking URL: http://node1:8088/proxy/application_1542371790854_0006/A
	 user: hadoop
18/11/16 20:53:44 INFO util.ShutdownHookManager: Shutdown hook called
18/11/16 20:53:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-619e92b6-4fb4-47ac-ab8f-4836ccf9d086

https://spark.apache.org/docs/2.1.3/running-on-yarn.html

[hadoop@node1 ~]$ yarn logs -applicationId application_1542371790854_0006
18/11/16 20:58:55 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.30.131:8032
18/11/16 20:58:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/tmp/logs/hadoop/logs/application_1542371790854_0006 does not exist.
Log aggregation has not completed or is not enabled.
[hadoop@node1 ~]$ 
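The "Log aggregation has not completed or is not enabled" message means yarn.log-aggregation-enable is still at its default of false, so yarn logs has nothing to read. Enabling it in yarn-site.xml on every node (followed by a YARN restart) makes the command work for subsequently submitted applications:

```xml
<!-- yarn-site.xml: collect container logs to HDFS after the app finishes -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```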

Spark SQL Notes (16): Spark on YARN