Spark Study Notes: Driver and Executor Allocation in Spark Standalone Mode


I went through the resource allocation process in Spark Standalone mode and decided to write it down so I can come back to it later.

Roles in Standalone mode:

Client: the client process, responsible for submitting jobs to the Master.

Master: the controlling node in Standalone mode. It accepts jobs submitted by the Client, manages the Workers, and instructs Workers to launch Drivers and Executors.

Worker: the daemon on each slave node in Standalone mode. It manages the node's resources, sends periodic heartbeats to the Master, and launches Drivers and Executors on the Master's command.

Driver: a Spark job runs with one Driver process, which is also the job's main process. It parses the job, generates Stages, and schedules Tasks onto Executors; it contains the DAGScheduler and the TaskScheduler.

Executor: where the job actually runs. A cluster generally contains multiple Executors; each Executor receives Launch Task commands from the Driver, and one Executor can run one or more Tasks.


The "resources" in question are mainly the memory and cores that an application gets allocated on the workers.

Application: comes with the memory and CPU it needs, queues up in the Master, and is eventually dispatched to workers for execution. To start an app, the Master walks over the workers, finds the available cores, and launches executors on those workers.

Worker: one per slave node, with a default or configured number of cores and amount of memory; the remaining amounts are kept up to date in memory by adding and subtracting as executors come and go (a small sketch of this bookkeeping follows below). The Worker is also responsible for spawning the local executor backend, i.e. the executor process.

Master: accepts registrations from Workers and applications and performs the resource allocation for each app. Both the Master and the Worker are essentially processes with an Actor inside.
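As an illustration of that bookkeeping, here is a minimal hypothetical sketch of per-node resource accounting; it is not Spark source, though Spark's own WorkerInfo keeps similar counters:

  // Hypothetical per-node bookkeeping: free resources are maintained by add/subtract in memory.
  case class NodeResources(totalCores: Int, totalMemoryMB: Int) {
    private var coresUsed = 0
    private var memoryUsedMB = 0
    def coresFree: Int = totalCores - coresUsed
    def memoryFree: Int = totalMemoryMB - memoryUsedMB
    // Called when an executor (or driver) is launched on this node
    def allocate(cores: Int, memoryMB: Int): Unit = { coresUsed += cores; memoryUsedMB += memoryMB }
    // Called when an executor (or driver) finishes
    def release(cores: Int, memoryMB: Int): Unit = { coresUsed -= cores; memoryUsedMB -= memoryMB }
  }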

For the overall architecture and flow, I found this write-up quite detailed: http://www.kuqin.com/shuoit/20150213/344838.html


Here I'll focus on resource allocation (Spark 1.4).


file: Master.scala -- the master accepts an application's registration, then allocates the driver and executors.

    case RegisterApplication(description) => {
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
      } else {
        logInfo("Registering app " + description.name)
        val app = createApplication(description, sender)
        registerApplication(app)
        logInfo("Registered app " + description.name + " with ID " + app.id)
        persistenceEngine.addApplication(app)
        sender ! RegisteredApplication(app.id, masterUrl)
        schedule() // entry point into resource allocation
      }
    }


Let's walk through the actual flow, starting with function schedule():

  /**
   * Schedule the currently available resources among waiting apps. This method will be called
   * every time a new app joins or resource availability changes.
   */
  private def schedule(): Unit = {
    if (state != RecoveryState.ALIVE) { return }
    // Drivers take strict precedence over executors
    val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers; pick workers in random order
    for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
      for (driver <- waitingDrivers) { // there may be several drivers waiting to start
        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
          // check whether this worker has enough memory and cores for the driver
          launchDriver(worker, driver) // launch the driver on this worker
          waitingDrivers -= driver
        }
      }
    }
    startExecutorsOnWorkers() // entry point for computing and allocating executors
  }

Next, function startExecutorsOnWorkers():

  /**
   * Schedule and launch executors on workers
   */
  private def startExecutorsOnWorkers(): Unit = {
    // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
    // in the queue, then the second app, etc.
    for (app <- waitingApps if app.coresLeft > 0) {
      val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
      // Filter out workers that don't have enough resources to launch an executor
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE) // keep only live workers
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB && // enough free memory for the app's per-executor memory
          worker.coresFree >= coresPerExecutor.getOrElse(1)) // enough free cores for the app's per-executor cores (default 1)
        .sortBy(_.coresFree).reverse
      val assignedCores = Master.scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
      // walk over the workers and work out the allocation; examined in detail below
      // Now that we've decided how many cores to allocate on each worker, let's allocate them
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors( // claim the resources and launch executors
          app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
      }
    }
  }

Let's take a closer look at this line:

val assignedCores = Master.scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

What comes back is, for each worker that meets the resource requirements, how many cores are assigned to it. Let's look at scheduleExecutorsOnWorkers in detail:

  def scheduleExecutorsOnWorkers(
      app: ApplicationInfo,
      usableWorkers: Array[WorkerInfo],
      spreadOutApps: Boolean): Array[Int] = {
    // If the number of cores per executor is not specified, then we can just schedule
    // 1 core at a time since we expect a single executor to be launched on each worker
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
    val memoryPerExecutor = app.desc.memoryPerExecutorMB
    val numUsable = usableWorkers.length
    val assignedCores = new Array[Int](numUsable) // Number of cores to give to each qualifying worker
    val assignedMemory = new Array[Int](numUsable) // Amount of memory to give to each worker
    var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
    // Two cases here: 1) the app needs fewer cores than the workers have free; 2) the opposite.
    // See Annotation 1 below -- this is the important part!
    var freeWorkers = (0 until numUsable).toIndexedSeq
    def canLaunchExecutor(pos: Int): Boolean = {
      usableWorkers(pos).coresFree - assignedCores(pos) >= coresPerExecutor &&
        usableWorkers(pos).memoryFree - assignedMemory(pos) >= memoryPerExecutor
    }
    while (coresToAssign >= coresPerExecutor && freeWorkers.nonEmpty) {
      // walk over the workers and hand out cores; coresToAssign is how much the app still needs
      // (or how much the workers still have left to give)
      freeWorkers = freeWorkers.filter(canLaunchExecutor)
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunchExecutor(pos) && coresToAssign >= coresPerExecutor) {
          coresToAssign -= coresPerExecutor
          assignedCores(pos) += coresPerExecutor
          // If cores per executor is not set, we are assigning 1 core at a time
          // without actually meaning to launch 1 executor for each core assigned
          if (app.desc.coresPerExecutor.isDefined) {
            assignedMemory(pos) += memoryPerExecutor
          }
          // Spreading out an application means spreading out its executors across as
          // many workers as possible. If we are not spreading out, then we should keep
          // scheduling executors on this worker until we use all of its resources.
          // Otherwise, just move on to the next worker.
          if (spreadOutApps) {
            // spreadOutApps is true by default: try to launch executors on as many workers as possible.
            // If it is false, keep consuming this worker's free cores until the app's requested total is reached.
            keepScheduling = false
          }
        }
      }
    }
    assignedCores // for each usable worker, the number of cores it will contribute to this app
  }

Next, the executors are launched based on the computed allocation:

      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors( // claim the resources and launch executors
          app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
      }

  /**
   * Allocate a worker's resources to one or more executors.
   * @param app the info of the application which the executors belong to
   * @param assignedCores number of cores on this worker for this application
   * @param coresPerExecutor number of cores per executor
   * @param worker the worker info
   */
  private def allocateWorkerResourceToExecutors(
      app: ApplicationInfo,
      assignedCores: Int,
      coresPerExecutor: Option[Int],
      worker: WorkerInfo): Unit = {
    // If the number of cores per executor is specified, we divide the cores assigned
    // to this worker evenly among the executors with no remainder.
    // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
    val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
    // how many executors to launch on this worker -- see Annotation 2 below
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    for (i <- 1 to numExecutors) {
      val exec = app.addExecutor(worker, coresToAssign)
      launchExecutor(worker, exec) // launch the executor on the worker
      app.state = ApplicationState.RUNNING
    }
  }

That is roughly the resource allocation process. Now a look at the related parameters:

  --total-executor-cores NUM   Total cores for all executors.

  --executor-cores NUM            Number of cores per executor. (Default: 1 in YARN mode, or all available cores on the worker in standalone mode)
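For reference, this is how the two flags might be combined on a spark-submit command line against a standalone master (the master URL, class name, and jar below are placeholders, not taken from the original post):

  spark-submit \
    --master spark://<master-host>:7077 \
    --class com.example.MyApp \
    --executor-cores 2 \
    --total-executor-cores 6 \
    myapp.jar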

I never quite understood how these two fit together... First look at the annotations referenced in the code above, then at a few examples.

Annotation 1: the line

var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

takes the smaller of the two values, which gives two cases:
1. The app needs fewer cores than the workers have free: resources are plentiful, so the loop goes worker by worker, handing out at least the specified --executor-cores each time, until the total --total-executor-cores has been assigned.
Example: 5 workers with 10 free cores each, submitted with --executor-cores 2 --total-executor-cores 6.
Every worker qualifies, so the loop walks over them one by one; by the third worker the assigned total reaches 6 and the loop stops. One executor is launched on each of the 3 workers, 3 executors in total.
PS: if spreadOutApps = false, all 6 cores are assigned on the first worker instead; since coresPerExecutor is 2, that single worker ends up running 3 executors of 2 cores each.
2. The app needs more cores than the workers have free: there are not enough resources to hand out --total-executor-cores, so executors are launched only on the workers that can satisfy at least --executor-cores.
Example: 5 workers, 2 with 3 free cores each and 3 with 1 free core each, submitted with --executor-cores 2 --total-executor-cores 30.
The workers only have 2x3 + 3x1 = 9 cores in total, far below the requested 30. Only the 2 workers that meet the minimum coresPerExecutor = 2 cores each launch an executor, 2 executors in total.
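To replay Annotation 1's examples, here is a minimal standalone Scala sketch of the core-assignment loop in scheduleExecutorsOnWorkers. It is not Spark source: it assumes memory is never the bottleneck and that the worker filtering has already happened, so only free-core counts are passed in.

  object ScheduleSketch {
    // coresFreePerWorker: free cores of each usable worker (already filtered and sorted)
    // totalCores: --total-executor-cores, coresPerExecutor: --executor-cores
    def assign(coresFreePerWorker: Array[Int],
               totalCores: Int,
               coresPerExecutor: Int,
               spreadOut: Boolean): Array[Int] = {
      val assigned = new Array[Int](coresFreePerWorker.length)
      var coresToAssign = math.min(totalCores, coresFreePerWorker.sum)
      var freeWorkers = coresFreePerWorker.indices.toIndexedSeq
      def canLaunch(pos: Int): Boolean =
        coresFreePerWorker(pos) - assigned(pos) >= coresPerExecutor
      while (coresToAssign >= coresPerExecutor && freeWorkers.nonEmpty) {
        freeWorkers = freeWorkers.filter(canLaunch)
        freeWorkers.foreach { pos =>
          var keepScheduling = true
          while (keepScheduling && canLaunch(pos) && coresToAssign >= coresPerExecutor) {
            coresToAssign -= coresPerExecutor
            assigned(pos) += coresPerExecutor
            if (spreadOut) keepScheduling = false // spread out: move on to the next worker
          }
        }
      }
      assigned
    }

    def main(args: Array[String]): Unit = {
      // Case 1: 5 workers with 10 free cores each, --executor-cores 2 --total-executor-cores 6
      println(assign(Array(10, 10, 10, 10, 10), 6, 2, spreadOut = true).toList)  // List(2, 2, 2, 0, 0)
      println(assign(Array(10, 10, 10, 10, 10), 6, 2, spreadOut = false).toList) // List(6, 0, 0, 0, 0)
      // Case 2: only the two workers with 3 free cores survive the coresFree >= 2 filter
      println(assign(Array(3, 3), 30, 2, spreadOut = true).toList)               // List(2, 2)
    }
  }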
Annotation 2:

val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)

The number of executors to launch on a node is derived from the cores assigned to that worker, so a single node may end up running 2 or more executors.
Example: 3 workers, 1 with 10 free cores and 2 with 3 free cores each, submitted with --executor-cores 2 --total-executor-cores 8.
The loop visits each worker in turn. After the first pass every worker has received enough cores for one executor, for a total of 6 assigned cores < --total-executor-cores 8, so the loop continues over the remaining free resources. Worker 1 still has 8 free cores, so it is given another 2 cores, reaching the requested total of 8. That worker therefore launches 2 executors, while the other 2 workers launch 1 executor each, 4 executors in total.
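As a quick check of Annotation 2 (again a sketch, not Spark source): the assignment loop leaves 4, 2 and 2 cores on the three workers, and allocateWorkerResourceToExecutors then divides each worker's share by coresPerExecutor:

  object ExecutorSplitSketch extends App {
    val coresPerExecutor: Option[Int] = Some(2) // --executor-cores 2
    val assignedCoresPerWorker = Seq(4, 2, 2)   // outcome of the scheduling pass described above
    val executorsPerWorker = assignedCoresPerWorker.map { assigned =>
      coresPerExecutor.map(assigned / _).getOrElse(1) // same formula as in allocateWorkerResourceToExecutors
    }
    println(executorsPerWorker) // List(2, 1, 1): 4 executors with 2 cores each
  }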


That's all for now!
