How the Worker Launches an Executor: Source Code Walkthrough

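Once resources have been divided up for the Executors, the Executors are launched accordingly. After scheduling, scheduleExecutorsOnWorkers returns an array giving the number of cores assigned on each usable Worker. The Master then loops over the usable Workers and allocates those cores to Executors.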

val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
// Now that we've decided how many cores to allocate on each worker, let's allocate them
for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
  // Allocate this worker's resources to executors
  allocateWorkerResourceToExecutors(
    app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
}

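For each Worker in that array, allocateWorkerResourceToExecutors works out how many Executors to start and how many cores each one gets. It then launches them on that Worker one by one: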

private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  // Number of cores to give each executor
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec: ExecutorDesc = app.addExecutor(worker, coresToAssign)
    // Launch the executor on the worker
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}
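The split arithmetic can be checked in isolation with the minimal standalone sketch below. It is not Spark code: ExecutorSplitModel, ExecutorSplit, planExecutors and planAll are hypothetical names. The sketch reproduces the two expressions above and the outer loop that skips Workers assigned zero cores. When coresPerExecutor is set, a Worker's cores are divided into assignedCores / coresPerExecutor Executors. When it is not set, a single Executor takes them all.

```scala
object ExecutorSplitModel {
  // Hypothetical standalone model of allocateWorkerResourceToExecutors; not Spark code.
  final case class ExecutorSplit(numExecutors: Int, coresPerExecutor: Int)

  // Mirrors numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
  // and     coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  def planExecutors(assignedCores: Int, coresPerExecutor: Option[Int]): ExecutorSplit = {
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    ExecutorSplit(numExecutors, coresToAssign)
  }

  // Mirrors the outer loop: workers that were assigned 0 cores are skipped.
  def planAll(assignedCores: Array[Int], coresPerExecutor: Option[Int]): Seq[(Int, ExecutorSplit)] =
    for (pos <- assignedCores.indices if assignedCores(pos) > 0) yield {
      pos -> planExecutors(assignedCores(pos), coresPerExecutor)
    }
}
```

Consider three Workers assigned 4, 0 and 6 cores, with 2 cores per Executor. The plan puts two Executors on Worker 0 and three on Worker 2, and Worker 1 is skipped. Integer division silently drops any remainder. The source comment's "no remainder" presumably relies on the scheduler having handed out cores in whole multiples of coresPerExecutor.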

Once the resources are settled, the Master sends the Worker a "LaunchExecutor" message:

// Launch an Executor
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  /**
   * Get the Worker's endpoint ref and tell it to launch an Executor (how many cores, how much memory).
   * The Worker's receive method matches on the LaunchExecutor message type.
   */
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
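launchExecutor is fire-and-forget messaging: one send to the Worker's endpoint and one to the application's Driver. The sketch below models only that, using an in-memory inbox. MasterLaunchModel, Mailbox and the simplified LaunchExecutor/ExecutorAdded case classes (which have fewer fields than Spark's) are hypothetical stand-ins, not Spark's RPC classes.

```scala
import scala.collection.mutable.ArrayBuffer

object MasterLaunchModel {
  // Simplified stand-ins for the two messages (Spark's versions carry more fields).
  sealed trait Msg
  final case class LaunchExecutor(appId: String, execId: Int, cores: Int, memoryMb: Int) extends Msg
  final case class ExecutorAdded(execId: Int, workerId: String, cores: Int, memoryMb: Int) extends Msg

  // A toy endpoint ref: send() is fire-and-forget and just appends to an inbox.
  final class Mailbox {
    val inbox: ArrayBuffer[Msg] = ArrayBuffer.empty[Msg]
    def send(m: Msg): Unit = inbox += m
  }

  // Mirrors launchExecutor: tell the worker to start the executor,
  // then tell the application's driver that an executor was added.
  def launchExecutor(worker: Mailbox, workerId: String, driver: Mailbox,
                     appId: String, execId: Int, cores: Int, memoryMb: Int): Unit = {
    worker.send(LaunchExecutor(appId, execId, cores, memoryMb))
    driver.send(ExecutorAdded(execId, workerId, cores, memoryMb))
  }
}
```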

On the Worker side, the receive method matches the "LaunchExecutor" message:

// Launch an Executor
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
  } else {
    try {
      logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

      // Create the executor's working directory
      val executorDir = new File(workDir, appId + "/" + execId)
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }

      // Create local dirs for the executor. These are passed to the executor via the
      // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
      // application finishes.
      val appLocalDirs = appDirectories.getOrElse(appId, {
        val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
        val dirs = localRootDirs.flatMap { dir =>
          try {
            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
            Utils.chmod700(appDir)
            Some(appDir.getAbsolutePath())
          } catch {
            case e: IOException =>
              logWarning(s"${e.getMessage}. Ignoring this directory.")
              None
          }
        }.toSeq
        if (dirs.isEmpty) {
          throw new IOException("No subfolder can be created in " +
            s"${localRootDirs.mkString(",")}.")
        }
        dirs
      })
      appDirectories(appId) = appLocalDirs

      // Create the ExecutorRunner
      val manager = new ExecutorRunner(
        appId,
        execId,
        /**
         * appDesc contains Command("org.apache.spark.executor.CoarseGrainedExecutorBackend", ...);
         * its first argument is the Executor backend class to run.
         */
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs,
        ExecutorState.RUNNING)
      executors(appId + "/" + execId) = manager
      /**
       * Start the ExecutorRunner.
       * What actually gets started is the CoarseGrainedExecutorBackend class.
       */
      manager.start()
      coresUsed += cores_
      memoryUsed += memory_
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
  }
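Without the directory handling, the Worker's branch comes down to three rules:

1. Ignore requests from a Master that is not the active one.
2. Account for cores and memory only after the runner starts.
3. On failure, remove the half-registered runner and report FAILED.

Below is a minimal sketch under those assumptions. WorkerLaunchModel, Worker, StateChanged and the startRunner callback (which stands in for ExecutorRunner.start()) are hypothetical.

```scala
import scala.collection.mutable

object WorkerLaunchModel {
  sealed trait ExecState
  case object Running extends ExecState
  case object Failed extends ExecState

  // Simplified ExecutorStateChanged report sent back to the master.
  final case class StateChanged(appId: String, execId: Int, state: ExecState, message: Option[String])

  // `startRunner` stands in for ExecutorRunner.start(); it may throw to simulate a failed launch.
  final class Worker(activeMasterUrl: String, startRunner: String => Unit) {
    var coresUsed = 0
    var memoryUsed = 0
    val executors = mutable.Set.empty[String]              // keys of the form "appId/execId"
    val toMaster = mutable.ArrayBuffer.empty[StateChanged]

    def onLaunchExecutor(masterUrl: String, appId: String, execId: Int,
                         cores: Int, memory: Int): Unit = {
      // Requests from a master that is not the active one are ignored
      // (Spark logs a warning in that case).
      if (masterUrl == activeMasterUrl) {
        val key = s"$appId/$execId"
        try {
          executors += key          // registered before start, as in the source
          startRunner(key)
          coresUsed += cores        // accounting happens only after a successful start
          memoryUsed += memory
          toMaster += StateChanged(appId, execId, Running, None)
        } catch {
          case e: Exception =>
            executors -= key        // drop the half-registered runner
            toMaster += StateChanged(appId, execId, Failed, Some(e.toString))
        }
      }
    }
  }
}
```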

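The Worker creates an ExecutorRunner and starts it. The Command object taken from appDesc wraps CoarseGrainedExecutorBackend. Starting the runner therefore launches a separate process whose entry point is CoarseGrainedExecutorBackend's main method, and main calls run: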

// the run method
run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)

run registers the Executor's endpoint with the rpcEnv:

// Register the Executor's RPC endpoint; this will invoke CoarseGrainedExecutorBackend's onStart method
env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
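The link between setupEndpoint and onStart is a lifecycle contract: an endpoint gets one start hook before it handles any ordinary message. The toy below calls onStart synchronously inside setupEndpoint. EndpointLifecycleModel, Endpoint and ToyRpcEnv are hypothetical names, and this is not Spark's RpcEnv API. As far as I understand it, Spark instead queues an OnStart message as the first item in the endpoint's inbox. The ordering guarantee is the same, but there the call happens asynchronously.

```scala
import scala.collection.mutable

object EndpointLifecycleModel {
  // Hypothetical endpoint contract loosely modelled on Spark's RpcEndpoint:
  // onStart runs once when the endpoint is registered, receive handles later messages.
  trait Endpoint {
    def onStart(): Unit = ()
    def receive: PartialFunction[Any, Unit]
  }

  // Toy registry; NOT Spark's RpcEnv. onStart is called synchronously here.
  final class ToyRpcEnv {
    private val endpoints = mutable.Map.empty[String, Endpoint]

    def setupEndpoint(name: String, endpoint: Endpoint): Endpoint = {
      endpoints(name) = endpoint
      endpoint.onStart()   // the lifecycle hook fires as part of registration
      endpoint
    }

    // Deliver a message to a named endpoint; unmatched messages are dropped.
    def send(name: String, msg: Any): Unit =
      endpoints.get(name).foreach { ep =>
        if (ep.receive.isDefinedAt(msg)) ep.receive(msg)
      }
  }
}
```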

Registering the endpoint therefore triggers CoarseGrainedExecutorBackend#onStart:

override def onStart() {
  logInfo("Connecting to driver: " + driverUrl)
  // Get a reference to the Driver from the RPC env, then register this Executor back with the Driver
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    // Keep the reference to the Driver
    driver = Some(ref)
    /**
     * Register this Executor with the Driver. The receiver is the DriverEndpoint inside the
     * CoarseGrainedSchedulerBackend class seen earlier; its receiveAndReply method matches
     * RegisterExecutor.
     */
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    case Success(msg) =>
      // Always receive `true`. Just ignore it
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
  }(ThreadUtils.sameThread)
}
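The registration in onStart is a two-step Future chain. It first resolves the Driver's endpoint ref and then asks the Driver to register the Executor. If either step fails, the result is exitExecutor(1, ...). The sketch below models that chain with plain scala.concurrent futures. RegisterWithDriverModel, DriverRef.askRegister and the resolveDriver function are hypothetical stand-ins for asyncSetupEndpointRefByURI and ref.ask[Boolean](RegisterExecutor(...)). The two onComplete cases are expressed as map/recover, so the outcome can be inspected instead of exiting the JVM.

```scala
import scala.concurrent.{ExecutionContext, Future}

object RegisterWithDriverModel {
  // Hypothetical stand-in for the Driver's endpoint ref.
  trait DriverRef {
    def askRegister(executorId: String, cores: Int): Future[Boolean]
  }

  // Step 1: resolve the driver ref; step 2: ask it to register this executor.
  // A failure in either step falls through to recover, like the Failure case above.
  def register(resolveDriver: String => Future[DriverRef],
               driverUrl: String, executorId: String, cores: Int)
              (implicit ec: ExecutionContext): Future[String] =
    resolveDriver(driverUrl)
      .flatMap(ref => ref.askRegister(executorId, cores))
      .map(_ => "registered")   // the reply is always true, so its value is ignored
      .recover { case e: Exception =>
        s"exit(1): Cannot register with driver: $driverUrl (${e.getMessage})"
      }
}
```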

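onStart uses the Driver's URL to obtain a reference to the Driver and sends it a "RegisterExecutor" message. In other words, the Executor registers itself back with the Driver. On the Driver side, the receiving endpoint is the one created by: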

rpcEnv.setupEndpoint(ENDPOINT_NAME, createDriverEndpoint) -> new DriverEndpoint

DriverEndpoint's receiveAndReply method receives it:

// An Executor registering back with the Driver
case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
  if (executorDataMap.contains(executorId)) {
    executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
    context.reply(true)
  } else if (scheduler.nodeBlacklist != null &&
    scheduler.nodeBlacklist.contains(hostname)) {
    // If the cluster manager gives us an executor on a blacklisted node (because it
    // already started allocating those resources before we informed it of our blacklist,
    // or if it ignored our blacklist), then we reject that executor immediately.
    logInfo(s"Rejecting $executorId as it has been blacklisted.")
    executorRef.send(RegisterExecutorFailed(s"Executor is blacklisted: $executorId"))
    context.reply(true)
  } else {
    // If the executor's rpc env is not listening for incoming connections, `hostPort`
    // will be null, and the client connection should be used to contact the executor.
    val executorAddress = if (executorRef.address != null) {
        executorRef.address
      } else {
        context.senderAddress
      }
    logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
    addressToExecutorId(executorAddress) = executorId
    totalCoreCount.addAndGet(cores)
    totalRegisteredExecutors.addAndGet(1)
    val data = new ExecutorData(executorRef, executorAddress, hostname,
      cores, cores, logUrls)
    // This must be synchronized because variables mutated
    // in this block are read when requesting executors
    CoarseGrainedSchedulerBackend.this.synchronized {
      executorDataMap.put(executorId, data)
      if (currentExecutorIdCounter < executorId.toInt) {
        currentExecutorIdCounter = executorId.toInt
      }
      if (numPendingExecutors > 0) {
        numPendingExecutors -= 1
        logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
      }
    }
    /**
     * Use the Executor's endpoint ref to tell it that it has been registered.
     * CoarseGrainedExecutorBackend's receive method listens for this message; once it
     * matches, the Executor gets created.
     */
    executorRef.send(RegisteredExecutor)
    // Note: some tests expect the reply to come after we put the executor in the map
    context.reply(true)
    listenerBus.post(
      SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
    makeOffers()
  }
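The Driver-side decision follows three rules:

1. Reject duplicate IDs.
2. Reject Executors on blacklisted hosts.
3. Otherwise, update the counters and record the Executor.

A simplified sketch follows. DriverRegistryModel, Registry, Registered and Rejected are hypothetical names. For simplicity the whole method is synchronized, whereas the source above uses atomic counters plus a synchronized block around the map update.

```scala
import scala.collection.mutable

object DriverRegistryModel {
  sealed trait Reply
  case object Registered extends Reply                       // stands for sending RegisteredExecutor
  final case class Rejected(reason: String) extends Reply    // stands for RegisterExecutorFailed

  // Hypothetical, simplified bookkeeping of DriverEndpoint.receiveAndReply.
  final class Registry(blacklistedHosts: Set[String]) {
    private val executorDataMap = mutable.Map.empty[String, (String, Int)] // id -> (host, cores)
    var totalCoreCount = 0
    var totalRegisteredExecutors = 0
    var currentExecutorIdCounter = 0

    def registerExecutor(executorId: String, hostname: String, cores: Int): Reply = synchronized {
      if (executorDataMap.contains(executorId)) {
        Rejected("Duplicate executor ID: " + executorId)
      } else if (blacklistedHosts.contains(hostname)) {
        Rejected(s"Executor is blacklisted: $executorId")
      } else {
        totalCoreCount += cores
        totalRegisteredExecutors += 1
        executorDataMap.put(executorId, (hostname, cores))
        // Executor IDs are numeric strings; the counter tracks the highest one seen.
        currentExecutorIdCounter = math.max(currentExecutorIdCounter, executorId.toInt)
        Registered
      }
    }
  }
}
```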

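After recording the Executor's information (an ExecutorData entry plus the updated counters), the Driver sends "RegisteredExecutor" to the Executor. The Executor's endpoint is CoarseGrainedExecutorBackend, and its receive method gets the message: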

// Matches the Driver's message: registration was accepted, so now create the Executor
case RegisteredExecutor =>
  logInfo("Successfully registered with driver")
  try {
    // This is where the Executor is actually created; Executor holds a thread pool
    // in which tasks run (around line 89 of Executor)
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
  } catch {
    case NonFatal(e) =>
      exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
  }

When CoarseGrainedExecutorBackend receives this message, it creates a new Executor. The Executor holds a thread pool (threadPool) that tasks run on.
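The Executor's thread pool is what turns received tasks into running threads. The sketch assumes a cached pool of daemon threads, which is how I understand Spark's Executor builds its threadPool. ExecutorThreadPoolModel, ToyExecutor, launchTask and the thread-name prefix here are all hypothetical. This launchTask does not match the signature of the Spark method with that name.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object ExecutorThreadPoolModel {
  // Daemon threads with readable names; the "task-launch-worker" prefix is made up here.
  private def daemonFactory(prefix: String): ThreadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
      t.setDaemon(true)
      t
    }
  }

  // Toy Executor: every launched task body runs on a thread from a cached pool,
  // which reuses idle threads and creates new ones when all are busy.
  final class ToyExecutor {
    private val threadPool: ExecutorService =
      Executors.newCachedThreadPool(daemonFactory("task-launch-worker"))

    def launchTask(body: () => Unit): Unit =
      threadPool.execute(new Runnable { def run(): Unit = body() })

    // Stop accepting tasks and wait for running ones; true if all finished in time.
    def stop(): Boolean = {
      threadPool.shutdown()
      threadPool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }
}
```

A cached pool fits a workload where the number of concurrent tasks is bounded elsewhere, in this case by the cores the Driver offers to the Executor. The pool itself therefore does not need a fixed size.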
