Spark Study Notes: driver and executor allocation under Spark Standalone
I went through spark-standalone's resource allocation process; writing it down so I can come back to it later.
Roles that exist in Standalone mode:
Client: the client process, responsible for submitting jobs to the Master.
Master: the control node in Standalone mode. It accepts jobs submitted by Clients, manages the Workers, and instructs Workers to launch Drivers and Executors.
Worker: the daemon on each slave node in Standalone mode. It manages that node's resources, sends periodic heartbeats to the Master, and launches Drivers and Executors on the Master's command.
Driver: a running Spark job includes one Driver process, the job's main process. It parses the job, builds Stages, and schedules Tasks onto Executors. It contains the DAGScheduler and TaskScheduler.
Executor: where the work actually runs. A cluster generally has multiple Executors; each one receives Launch Task commands from the Driver, and a single Executor can run one or more Tasks.
"Resources" here mainly means the memory and cores allocated to an application on the workers.
Application: carries its own required amounts of memory and CPU; it queues in the Master and is eventually dispatched to workers for execution. Starting an app means walking the workers, finding available cores, and launching executors on those workers.
Worker: one per slave node, with a default or configured number of cores and amount of memory; it keeps a running in-memory tally of the remaining resources. The Worker is also responsible for spawning the local executor backend, i.e. the executor process itself.
Master: accepts registrations from Workers and apps, and performs resource allocation for the apps. Master and Worker are both essentially Actor-based processes.
For the overall architecture and flow, I found this write-up quite detailed: http://www.kuqin.com/shuoit/20150213/344838.html
Here I'll mainly cover resource allocation (Spark 1.4).
File: Master.scala, where the Master accepts an application's registration and allocates the driver and executors:
case RegisterApplication(description) => {
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, sender)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    sender ! RegisteredApplication(app.id, masterUrl)
    schedule() // entry point into resource allocation
  }
}
/**
 * Schedule the currently available resources among waiting apps. This method will be called
 * every time a new app joins or resource availability changes.
 */
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) { return }
  // Drivers take strict precedence over executors
  val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
  for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
    for (driver <- waitingDrivers) { // there may be several drivers waiting to start
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // check whether this worker's free memory and cores meet the driver's requirements
        launchDriver(worker, driver) // launch the driver on this worker
        waitingDrivers -= driver
      }
    }
  }
  startExecutorsOnWorkers() // entry point for computing and allocating executors
}
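The driver-placement half of schedule() is just a first-fit search over a shuffled worker list. A minimal standalone sketch of that check (WorkerRes and DriverReq are made-up stand-ins for Spark's WorkerInfo and DriverDescription, not real Spark types):

```scala
import scala.util.Random

// Hypothetical stand-ins for Spark's WorkerInfo / DriverDescription
case class WorkerRes(id: String, memoryFree: Int, coresFree: Int)
case class DriverReq(mem: Int, cores: Int)

// First-fit over a shuffled worker list, as in Master.schedule()
def placeDriver(workers: Seq[WorkerRes], driver: DriverReq): Option[WorkerRes] =
  Random.shuffle(workers).find { w =>
    w.memoryFree >= driver.mem && w.coresFree >= driver.cores
  }

val workers = Seq(WorkerRes("w1", 1024, 2), WorkerRes("w2", 4096, 8))
// Only w2 has 4 free cores, so the driver always lands there
val chosen = placeDriver(workers, DriverReq(mem = 2048, cores = 4))
println(chosen.map(_.id)) // Some(w2)
```

Shuffling first is what spreads drivers across the cluster instead of piling them onto the first registered worker.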
Continuing into startExecutorsOnWorkers():
/**
 * Schedule and launch executors on workers
 */
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Filter out workers that don't have enough resources to launch an executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE) // keep only live workers
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB && // enough free memory for the app's per-executor requirement
        worker.coresFree >= coresPerExecutor.getOrElse(1)) // enough free cores for the app's requirement, default 1 core
      .sortBy(_.coresFree).reverse
    val assignedCores = Master.scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
    // walks the workers and computes the allocation; examined in detail below

    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors( // claim the resources and launch executors
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
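The worker filter at the top of this method is just two predicates plus a sort by free cores, descending. A standalone sketch (W is a made-up case class, not Spark's WorkerInfo):

```scala
// Hypothetical stand-in for Spark's WorkerInfo
case class W(id: String, alive: Boolean, memoryFree: Int, coresFree: Int)

// Mirror of the filter chain in startExecutorsOnWorkers
def usable(workers: Seq[W], memoryPerExecutorMB: Int, coresPerExecutor: Int): Seq[W] =
  workers.filter(_.alive)
    .filter(w => w.memoryFree >= memoryPerExecutorMB && w.coresFree >= coresPerExecutor)
    .sortBy(_.coresFree).reverse // workers with the most free cores first

val ws = Seq(
  W("a", alive = true, memoryFree = 4096, coresFree = 2),
  W("b", alive = true, memoryFree = 8192, coresFree = 6),
  W("c", alive = false, memoryFree = 8192, coresFree = 8), // dead, filtered out
  W("d", alive = true, memoryFree = 512, coresFree = 6))   // too little memory
println(usable(ws, 1024, 2).map(_.id)) // List(b, a)
```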
Let's dig into this call:

val assignedCores = Master.scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

It returns, for each worker that meets the resource requirements, how many cores that worker is assigned. Here is scheduleExecutorsOnWorkers:
def scheduleExecutorsOnWorkers(
    app: ApplicationInfo,
    usableWorkers: Array[WorkerInfo],
    spreadOutApps: Boolean): Array[Int] = {
  // If the number of cores per executor is not specified, then we can just schedule
  // 1 core at a time since we expect a single executor to be launched on each worker
  val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
  val memoryPerExecutor = app.desc.memoryPerExecutorMB
  val numUsable = usableWorkers.length
  val assignedCores = new Array[Int](numUsable) // Number of cores to give to each worker (one slot per usable worker)
  val assignedMemory = new Array[Int](numUsable) // Amount of memory to give to each worker
  // Two cases here: 1) the app needs fewer cores than the workers have free; 2) the opposite.
  // See note 1 below, see note 1, see note 1 -- important things get said three times!
  var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

  var freeWorkers = (0 until numUsable).toIndexedSeq
  def canLaunchExecutor(pos: Int): Boolean = {
    usableWorkers(pos).coresFree - assignedCores(pos) >= coresPerExecutor &&
    usableWorkers(pos).memoryFree - assignedMemory(pos) >= memoryPerExecutor
  }
  // Walk the workers and hand out cores; coresToAssign is how much the app still needs
  // (capped by how much the workers have left to give)
  while (coresToAssign >= coresPerExecutor && freeWorkers.nonEmpty) {
    freeWorkers = freeWorkers.filter(canLaunchExecutor)
    freeWorkers.foreach { pos =>
      var keepScheduling = true
      while (keepScheduling && canLaunchExecutor(pos) && coresToAssign >= coresPerExecutor) {
        coresToAssign -= coresPerExecutor
        assignedCores(pos) += coresPerExecutor
        // If cores per executor is not set, we are assigning 1 core at a time
        // without actually meaning to launch 1 executor for each core assigned
        if (app.desc.coresPerExecutor.isDefined) {
          assignedMemory(pos) += memoryPerExecutor
        }
        // Spreading out an application means spreading out its executors across as
        // many workers as possible. If we are not spreading out, then we should keep
        // scheduling executors on this worker until we use all of its resources.
        // Otherwise, just move on to the next worker.
        if (spreadOutApps) {
          // spreadOutApps is true by default: try to start executors on as many workers as possible.
          // If it were false, we would keep taking this worker's free cores until the app's total is met.
          keepScheduling = false
        }
      }
    }
  }
  assignedCores // how many cores each still-schedulable worker contributes to the app
}
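To see concretely what spreadOutApps changes, here is a cut-down re-implementation of the loop above that tracks cores only (no memory bookkeeping); it is an illustrative sketch, not Spark's actual code:

```scala
// Simplified version of scheduleExecutorsOnWorkers: cores only, no memory check
def assignCores(coresFree: Array[Int], coresPerExecutor: Int,
                totalCores: Int, spreadOut: Boolean): Array[Int] = {
  val assigned = new Array[Int](coresFree.length)
  var toAssign = math.min(totalCores, coresFree.sum)
  def canLaunch(pos: Int) = coresFree(pos) - assigned(pos) >= coresPerExecutor
  var free = coresFree.indices.toIndexedSeq
  while (toAssign >= coresPerExecutor && free.nonEmpty) {
    free = free.filter(canLaunch)
    free.foreach { pos =>
      var keepScheduling = true
      while (keepScheduling && canLaunch(pos) && toAssign >= coresPerExecutor) {
        toAssign -= coresPerExecutor
        assigned(pos) += coresPerExecutor
        if (spreadOut) keepScheduling = false // one executor's worth, then move on
      }
    }
  }
  assigned
}

val cluster = Array(10, 10, 10, 10, 10) // 5 workers, 10 free cores each
// --executor-cores 2 --total-executor-cores 6
println(assignCores(cluster, 2, 6, spreadOut = true).mkString(","))  // 2,2,2,0,0
println(assignCores(cluster, 2, 6, spreadOut = false).mkString(",")) // 6,0,0,0,0
```

With spreading on, each pass hands out one executor's worth of cores per worker before moving on; with it off, the first worker is drained before the next one is touched.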
Next, executors are launched with the resources computed above:
for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
  allocateWorkerResourceToExecutors( // claim the resources and launch executors
    app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
}
/**
 * Allocate a worker's resources to one or more executors.
 * @param app the info of the application which the executors belong to
 * @param assignedCores number of cores on this worker for this application
 * @param coresPerExecutor number of cores per executor
 * @param worker the worker info
 */
private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1) // how many executors to start on this worker; see note 2 below!
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    launchExecutor(worker, exec) // launch an executor on this worker
    app.state = ApplicationState.RUNNING
  }
}

That is roughly the whole resource allocation process. Now let's look at the parameters involved:

--executor-cores NUM         Number of cores per executor. (Default: 1 in YARN mode,
                             or all available cores on the worker in standalone mode)
--total-executor-cores NUM   Total cores for all executors.
I could never quite figure out how these two options fit together. First some notes on the code above, then a few examples.
Note 1: the line

var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

takes the smaller of the two values, which gives two cases:
1. The app needs fewer cores than the workers have free. Resources are plentiful, so the scheduler walks the workers, assigning at least the requested --executor-cores on each in turn, until the total reaches --total-executor-cores.
Example: 5 workers with 10 free cores each, submitted with --executor-cores 2 --total-executor-cores 6.
Every worker qualifies; walking through them one by one, the total reaches 6 cores at the third worker, so scheduling stops and one executor is launched on each of 3 workers: 3 executors in total.
PS: if spreadOutApps = false, all 6 cores are allocated on the first worker straight away; with --executor-cores 2, that one worker then launches 6 / 2 = 3 executors.
2. The app needs more cores than the workers have free. There aren't enough resources to hand out --total-executor-cores, so executors are launched only on the workers that can each satisfy at least the minimum --executor-cores.
Example: 5 workers, two with 3 free cores each and three with 1 free core each, submitted with --executor-cores 2 --total-executor-cores 30. The workers only have 2×3 + 3×1 = 9 cores in total, well below the requested 30, so executors start only on the two workers that meet the minimum coresPerExecutor = 2: 2 executors in total.
Note 2: the line

val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)

determines, from the cores a worker was assigned, how many executors to launch on that node, so a single node may end up starting 2 or more executors.
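The Option arithmetic in that line is easy to check in isolation (a standalone snippet, not Spark code):

```scala
// numExecutors as computed in allocateWorkerResourceToExecutors
def numExecutors(coresPerExecutor: Option[Int], assignedCores: Int): Int =
  coresPerExecutor.map { assignedCores / _ }.getOrElse(1)

println(numExecutors(Some(2), 6)) // --executor-cores 2, 6 cores assigned -> 3 executors
println(numExecutors(None, 6))    // no --executor-cores -> 1 executor taking all 6 cores
```

So with --executor-cores unset, a single executor grabs everything the worker was assigned, which matches the standalone default of "all available cores on the worker".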
Example: 3 workers, one with 10 free cores and two with 3 free cores each, submitted with --executor-cores 2 --total-executor-cores 8.
Here the scheduler walks the workers round by round. After the first round every worker has been assigned 2 cores, enough for one executor each, for a total of 6 cores, still below --total-executor-cores 8, so it iterates again over the remaining usable resources: worker 1 still has 8 free cores, so it is assigned 2 more, reaching the target of 8. Worker 1 therefore launches 2 executors, and the other two workers launch 1 each: 4 executors in total.
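Putting note 1 and note 2 together for this example: the scheduling phase ends with assignedCores = [4, 2, 2], and the launch phase turns that into executor counts. An illustrative snippet applying the sizing rule from allocateWorkerResourceToExecutors:

```scala
// Per-worker cores from the scheduling phase of the example above
val assignedCores = Seq(4, 2, 2)
val coresPerExecutor = Some(2) // --executor-cores 2

// allocateWorkerResourceToExecutors' sizing rule, applied per worker
val executorsPerWorker = assignedCores.map { cores =>
  coresPerExecutor.map { cores / _ }.getOrElse(1)
}
println(executorsPerWorker)     // List(2, 1, 1)
println(executorsPerWorker.sum) // 4 executors in total
```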
That's it for now!