[Gandalf] Spark 1.3.0 Running Spark on YARN: Highlights from the Official Documentation
Reposting is welcome; please cite the source:
Property Name | Default | Meaning
---|---|---
`spark.yarn.am.memory` | 512m | Amount of memory to use for the YARN Application Master in client mode, in the same format as JVM memory strings (e.g. `512m`, `2g`). In cluster mode, use `spark.driver.memory` instead.
`spark.driver.cores` | 1 | Number of cores used by the driver in YARN cluster mode. Since the driver runs in the same JVM as the YARN Application Master in cluster mode, this also controls the cores used by the YARN AM. In client mode, use `spark.yarn.am.cores` to control the number of cores used by the YARN AM instead.
`spark.yarn.am.cores` | 1 | Number of cores to use for the YARN Application Master in client mode. In cluster mode, use `spark.driver.cores` instead.
`spark.yarn.am.waitTime` | 100000 | In yarn-cluster mode, the time in milliseconds for the Application Master to wait for the SparkContext to be initialized. In yarn-client mode, the time for the Application Master to wait for the driver to connect to it.
`spark.yarn.submit.file.replication` | The default HDFS replication (usually 3) | HDFS replication level for the files uploaded into HDFS for the application. These include the Spark jar, the app jar, and any distributed cache files/archives.
`spark.yarn.preserve.staging.files` | false | Set to true to preserve the staged files (Spark jar, app jar, distributed cache files) at the end of the job rather than deleting them.
`spark.yarn.scheduler.heartbeat.interval-ms` | 5000 | The interval in ms at which the Spark Application Master heartbeats into the YARN ResourceManager.
`spark.yarn.max.executor.failures` | numExecutors * 2, with minimum of 3 | The maximum number of executor failures before failing the application.
`spark.yarn.historyServer.address` | (none) | The address of the Spark history server (e.g. host.com:18080). The address should not contain a scheme (http://). Defaults to not being set, since the history server is an optional service. This address is given to the YARN ResourceManager when the Spark application finishes, to link the application from the ResourceManager UI to the Spark history server UI.
`spark.yarn.dist.archives` | (none) | Comma-separated list of archives to be extracted into the working directory of each executor.
`spark.yarn.dist.files` | (none) | Comma-separated list of files to be placed in the working directory of each executor.
`spark.executor.instances` | 2 | The number of executors. Note that this property is incompatible with `spark.dynamicAllocation.enabled`.
`spark.yarn.executor.memoryOverhead` | executorMemory * 0.07, with minimum of 384 | The amount of off-heap memory (in megabytes) to be allocated per executor. This memory accounts for things like VM overheads, interned strings, and other native overheads. It tends to grow with the executor size (typically 6-10%).
`spark.yarn.driver.memoryOverhead` | driverMemory * 0.07, with minimum of 384 | The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. This memory accounts for things like VM overheads, interned strings, and other native overheads. It tends to grow with the container size (typically 6-10%).
`spark.yarn.am.memoryOverhead` | AM memory * 0.07, with minimum of 384 | Same as `spark.yarn.driver.memoryOverhead`, but for the Application Master in client mode.
`spark.yarn.queue` | default | The name of the YARN queue to which the application is submitted.
`spark.yarn.jar` | (none) | The location of the Spark jar file, in case overriding the default location is desired. By default, Spark on YARN uses a Spark jar installed locally, but the Spark jar can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to a jar on HDFS, for example, set this configuration to "hdfs:///some/path".
`spark.yarn.access.namenodes` | (none) | A list of secure HDFS namenodes your Spark application is going to access. For example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`. The Spark application must have access to the namenodes listed, and Kerberos must be properly configured to be able to access them (either in the same realm or in a trusted realm). Spark acquires security tokens for each of the namenodes so that the Spark application can access those remote HDFS clusters.
`spark.yarn.appMasterEnv.[EnvironmentVariableName]` | (none) | Add the environment variable specified by EnvironmentVariableName to the Application Master process launched on YARN. The user can specify multiple of these to set multiple environment variables. In yarn-cluster mode this controls the environment of the Spark driver; in yarn-client mode it only controls the environment of the executor launcher.
`spark.yarn.containerLauncherMaxThreads` | 25 | The maximum number of threads to use in the Application Master for launching executor containers.
`spark.yarn.am.extraJavaOptions` | (none) | A string of extra JVM options to pass to the YARN Application Master in client mode. In cluster mode, use `spark.driver.extraJavaOptions` instead.
`spark.yarn.maxAppAttempts` | `yarn.resourcemanager.am.max-attempts` in YARN | The maximum number of attempts that will be made to submit the application. It should be no larger than the global number of max attempts in the YARN configuration.
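As a minimal sketch of the memory-overhead formula in the table (the function name and sample values below are illustrative, not from the official docs), the YARN container size requested for an executor is the executor memory plus `max(executorMemory * 0.07, 384)` MB:

```shell
# Sketch: compute the YARN container memory request for an executor,
# following the table's default: overhead = max(executorMemory * 0.07, 384) MB.
container_memory_mb() {
    mem=$1
    overhead=$(( mem * 7 / 100 ))            # integer approximation of mem * 0.07
    [ "$overhead" -lt 384 ] && overhead=384  # minimum overhead from the table
    echo $(( mem + overhead ))
}

container_memory_mb 4096   # 4096*0.07 ~ 286, so the 384 MB floor applies: 4480
container_memory_mb 8192   # 8192*0.07 ~ 573: 8765
```

Raising `spark.yarn.executor.memoryOverhead` explicitly is a common fix when YARN kills containers for exceeding their memory limit, since the 7% default can be too small for native-memory-heavy workloads.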
- yarn-client mode: the driver runs in the client process, and the Application Master (a component of the YARN architecture that acts as the per-application scheduler inside YARN) is used only to request resources from YARN (the client can see the program's printed output on its console).
- yarn-cluster mode: the Spark driver runs inside the Application Master process (the client console does not show the program's printed output).
```
$ ./bin/spark-submit --class my.main.Class \
    --master yarn-cluster \
    --jars my-other-jar.jar,my-other-other-jar.jar \
    my-main-jar.jar \
    app_arg1 app_arg2
```
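For comparison, a yarn-client submission can pass the client-mode AM properties from the table above via `--conf` (the class and jar names below are hypothetical placeholders; this sketch only assembles and prints the command rather than invoking spark-submit):

```shell
# Sketch: build a yarn-client submission command. In client mode the
# table's spark.yarn.am.* properties apply instead of spark.driver.*.
SUBMIT_CMD="./bin/spark-submit --master yarn-client"
SUBMIT_CMD="$SUBMIT_CMD --conf spark.yarn.am.memory=1g"
SUBMIT_CMD="$SUBMIT_CMD --conf spark.yarn.am.cores=2"
SUBMIT_CMD="$SUBMIT_CMD --class my.main.Class my-main-jar.jar app_arg1"
echo "$SUBMIT_CMD"
```

In this mode the driver stays in the submitting process, so `spark.driver.memory` sizes the client JVM while the `spark.yarn.am.*` settings size the lightweight AM container.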