I have recently been researching Spark for a project and wrote up some technical notes along the way; I am posting them here as a record. (The notes were written in English and I did not bother translating them.)

There are some aspects that the official documentation does not explain very clearly, especially certain details. Below are some additional explanations based on the experiments I ran and the source code I read over the past few days.

(The official documentation is at http://spark.apache.org/docs/latest/job-scheduling.html)

  1. There are two different schedulers in the current Spark implementation. FIFO is the default setting and the original scheduling mode Spark implemented.
  2. Both the FIFO and FAIR schedulers support running multiple jobs in parallel, provided the jobs are submitted from separate threads. (In a single thread, jobs execute in order.)
  3. With the FIFO scheduler, jobs submitted earlier have higher priority and a better chance of getting resources than later jobs. This does not mean the first job always executes first: later jobs can run before earlier ones if the cluster's resources are not fully occupied. However, the FIFO scheduler has a worst case: if the first jobs are large, later jobs may suffer significant delays.
  4. The FAIR scheduler corresponds to Hadoop's fair scheduler and is an enhancement over FIFO. In FIFO mode, only one factor, priority, is considered in the SchedulableQueue; in FAIR mode, more factors are considered, including minShare, runningTasks, and weight (see the code below if interested). Similarly, jobs do not always strictly follow the ordering produced by FairSchedulingAlgorithm, but on the whole, in my observation through concurrent JMeter tests, tuning these parameters under the FAIR scheduler greatly reduces the delays that small jobs suffered under FIFO.

```scala
private[spark] class FIFOSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val priority1 = s1.priority
    val priority2 = s2.priority
    var res = math.signum(priority1 - priority2)
    if (res == 0) {
      val stageId1 = s1.stageId
      val stageId2 = s2.stageId
      res = math.signum(stageId1 - stageId2)
    }
    if (res < 0) {
      true
    } else {
      false
    }
  }
}

private[spark] class FairSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val minShare1 = s1.minShare
    val minShare2 = s2.minShare
    val runningTasks1 = s1.runningTasks
    val runningTasks2 = s2.runningTasks
    val s1Needy = runningTasks1 < minShare1
    val s2Needy = runningTasks2 < minShare2
    val minShareRatio1 = runningTasks1.toDouble / math.max(minShare1, 1.0).toDouble
    val minShareRatio2 = runningTasks2.toDouble / math.max(minShare2, 1.0).toDouble
    val taskToWeightRatio1 = runningTasks1.toDouble / s1.weight.toDouble
    val taskToWeightRatio2 = runningTasks2.toDouble / s2.weight.toDouble
    var compare: Int = 0

    if (s1Needy && !s2Needy) {
      return true
    } else if (!s1Needy && s2Needy) {
      return false
    } else if (s1Needy && s2Needy) {
      compare = minShareRatio1.compareTo(minShareRatio2)
    } else {
      compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)
    }

    if (compare < 0) {
      true
    } else if (compare > 0) {
      false
    } else {
      s1.name < s2.name
    }
  }
}
```
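Point 2 above (parallel jobs from separate threads) can be sketched as follows. This is a minimal illustration, not a definitive pattern: it assumes Spark is on the classpath and Scala 2.12+ (for the lambda-to-Thread conversion); local mode and the job bodies are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelJobsSketch {
  def main(args: Array[String]): Unit = {
    // Local mode is used here only so the sketch is self-contained.
    val conf = new SparkConf().setAppName("parallel-jobs").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // Each thread submits its own job; the scheduler (FIFO or FAIR)
    // decides how the available cores are shared between them.
    // Submitted from a single thread instead, the two jobs would run in order.
    val threads = (1 to 2).map { i =>
      new Thread(() => {
        val sum = sc.parallelize(1 to 1000000).map(_ * i.toLong).reduce(_ + _)
        println(s"job $i finished: $sum")
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    sc.stop()
  }
}
```

SparkContext is thread-safe for job submission, which is what makes this pattern valid; the only requirement is that the threads share the same SparkContext.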
 

  5. The pools in the FIFO and FAIR schedulers
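As a hedged sketch of how pools are set up, based on the official job-scheduling documentation linked above (the pool name `production`, the weights, and the file path are illustrative only): the FAIR scheduler is enabled via `spark.scheduler.mode`, pools are declared in an allocation XML file, and each thread picks its pool with `setLocalProperty`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Enable the FAIR scheduler (spark.scheduler.mode defaults to FIFO).
val conf = new SparkConf()
  .setAppName("pools-example")
  .set("spark.scheduler.mode", "FAIR")
  // Path is a placeholder; without this setting every job shares the default pool.
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val sc = new SparkContext(conf)

// Jobs submitted from this thread go into the "production" pool;
// clearing the property (setting it to null) reverts to the default pool.
sc.setLocalProperty("spark.scheduler.pool", "production")
```

The allocation file declares one `<pool>` element per pool, each with the three FAIR factors discussed in point 4, for example:

```xml
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
```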

Reposted from: https://www.cnblogs.com/taxuexunmei/p/4991250.html
