Creating the Executor


After the SparkContext is created, executors are allocated on the workers. The overall chain is: the SparkContext creates a TaskScheduler and a SchedulerBackend; the backend starts an AppClient, which registers the application with the Master; the Master asks a Worker to launch an ExecutorRunner; and the ExecutorRunner starts a CoarseGrainedExecutorBackend process, which finally creates the Executor. Each step is walked through below.

  • SparkContext has a function called createTaskScheduler(), which creates the TaskScheduler and the matching SchedulerBackend according to the type of the master URL. Its key code is as follows:

private def createTaskScheduler(
    sc: SparkContext,
    master: String): (SchedulerBackend, TaskScheduler) = {
  master match {
    case "local" =>
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalBackend(sc.getConf, scheduler, 1)
      scheduler.initialize(backend)
      (backend, scheduler)
    case LOCAL_N_REGEX(threads) => ......
    case LOCAL_N_FAILURES_REGEX(threads, maxFailures) => ......
    case SPARK_REGEX(sparkUrl) => ......
    case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) => ......
    case "yarn-standalone" | "yarn-cluster" => ......
    case "yarn-client" => ......
    case MESOS_REGEX(mesosUrl) => ......
    case SIMR_REGEX(simrUrl) => ......
    case zkUrl if zkUrl.startsWith("zk://") => ......
    case _ => ......
  }
}
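
For context, the master URL this match inspects is simply the string the driver passes when constructing the SparkContext. A minimal driver sketch (the master host and app name below are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object ExecutorCreationDemo {
  def main(args: Array[String]): Unit = {
    // "spark://..." matches the SPARK_REGEX branch above, so createTaskScheduler
    // returns a TaskSchedulerImpl paired with a SparkDeploySchedulerBackend;
    // "local" would take the first branch and use LocalBackend instead.
    val conf = new SparkConf()
      .setAppName("executor-creation-demo")   // hypothetical app name
      .setMaster("spark://master-host:7077")  // hypothetical standalone master
    val sc = new SparkContext(conf)
    // ... submit jobs here ...
    sc.stop()
  }
}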

SparkContext calls this function to create the taskScheduler and then starts it, as follows:

val (sched, ts) = SparkContext.createTaskScheduler(this, master)
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's constructor
_taskScheduler.start()

Note that the DAGScheduler reference must be set before the taskScheduler starts. This is done in DAGScheduler.scala with the following line:

taskScheduler.setDAGScheduler(this)
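
That line lives in the DAGScheduler constructor body, so merely constructing the DAGScheduler (the _dagScheduler = new DAGScheduler(this) line above) wires up the back-reference before _taskScheduler.start() runs. A simplified sketch of the relevant part of DAGScheduler.scala (Spark 1.x; most of the class and the full parameter list are elided):

class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    ...) {
  ...
  // runs as constructor code, i.e. before taskScheduler.start() is called
  taskScheduler.setDAGScheduler(this)
}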
  • TaskScheduler has a start() method, which directly calls backend.start(). The core code:
override def start() {
  backend.start()
  ......
}
Where does this backend come from? Looking back at the createTaskScheduler function in SparkContext, it is handed to the scheduler by this line:
scheduler.initialize(backend)
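
initialize() itself is small: it stores the backend in a field of TaskSchedulerImpl, which is exactly the field start() later dereferences. A simplified sketch (Spark 1.x; the method also builds the FIFO/FAIR scheduling pools, elided here):

def initialize(backend: SchedulerBackend) {
  this.backend = backend
  // the real method goes on to create the root Pool and the
  // FIFO/FAIR SchedulableBuilder before returning
  ...
}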
  • Next, let's see what the backend's start() function does (note: the backend differs across deploy modes; the code below is from SparkDeploySchedulerBackend). The key code:
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
  args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
  command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
client.start()

First, the command variable is defined and initialized; in this (standalone) deploy mode the main class passed in is org.apache.spark.executor.CoarseGrainedExecutorBackend, and it is CoarseGrainedExecutorBackend that later creates the executor. The command variable is needed to build appDesc, and appDesc in turn is needed to construct the AppClient. Finally, the AppClient is started.
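
For orientation, the Command built here is only a serializable description of how to launch the executor JVM; no process is started yet. Its definition (org.apache.spark.deploy.Command in Spark 1.x) is roughly:

private[spark] case class Command(
    mainClass: String,  // here: org.apache.spark.executor.CoarseGrainedExecutorBackend
    arguments: Seq[String],
    environment: Map[String, String],
    classPathEntries: Seq[String],
    libraryPathEntries: Seq[String],
    javaOpts: Seq[String])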

  • What is this AppClient for? It registers the Application with the Master. The key code (from AppClient.scala):

private def tryRegisterAllMasters() = {
  ...
  masterRef.send(RegisterApplication(appDescription, self))
  ...
}

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredApplication(appId_, masterRef) => { ... }
  ...
}

This function sends a RegisterApplication message to the Master. The message is handled in the receive function of Master.scala, so it is actually the Master that creates the Application. Once creation succeeds, the Master sends a confirmation back to the AppClient, which is handled by the receive function in AppClient.scala. The application-creation code:

override def receive: PartialFunction[Any, Unit] = {
  ...
  case RegisterApplication(description, driver) =>  // message sent by the AppClient
    ...
    val app = createApplication(description, driver)
    registerApplication(app)  // register the newly created application
    driver.send(RegisteredApplication(app.id, self))  // reply to the AppClient, handled by its receive()
  ...
}
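
createApplication itself is short: it timestamps the description, assigns an application id, and wraps everything in an ApplicationInfo. A simplified sketch from Master.scala (Spark 1.x):

private def createApplication(desc: ApplicationDescription, driver: RpcEndpointRef)
    : ApplicationInfo = {
  val now = System.currentTimeMillis()
  val date = new Date(now)
  new ApplicationInfo(now, newApplicationId(date), desc, date, driver, defaultCores)
}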

The Master then sends the Worker a request to launch an ExecutorRunner:

private def launchExecutor(...) = {
  ...
  worker.endpoint.send(LaunchExecutor(...))
  ...
}
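
A slightly fuller sketch of launchExecutor (Spark 1.x, simplified): besides asking the Worker to launch the executor, it also tells the application's driver that an executor has been added:

private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}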

  • When the Worker receives this message, it creates an ExecutorRunner:
override def receive: PartialFunction[Any, Unit] = {
  case LaunchExecutor(...) =>
    val manager = new ExecutorRunner(...)
  ...
}
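
In slightly more detail (Spark 1.x, simplified; the many ExecutorRunner constructor arguments are elided): the Worker records the runner, starts it, and accounts for the resources. manager.start() spawns a thread that runs the fetchAndRunExecutor() shown next:

case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  val manager = new ExecutorRunner(appId, execId, appDesc, cores_, memory_, ...)
  executors(appId + "/" + execId) = manager
  manager.start()
  coresUsed += cores_
  memoryUsed += memory_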

  • The ExecutorRunner then runs the executor according to the description in the ApplicationDescription:
private def fetchAndRunExecutor() {
  ...
  val builder = CommandUtils.buildProcessBuilder(appDesc.command, ...)
  val command = builder.command()
  ...
}
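
After building the ProcessBuilder, fetchAndRunExecutor starts the actual executor JVM and blocks until it exits, then reports the exit state back to the Worker. A simplified continuation of the same method (Spark 1.x):

process = builder.start()
...
val exitCode = process.waitFor()
state = ExecutorState.EXITED
val message = "Command exited with code " + exitCode
worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))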


Where do appDesc (in appDesc.command) and command come from? Both were created in the start() function of SparkDeploySchedulerBackend.scala (used in the standalone and local-cluster deploy modes). As mentioned earlier, command was created with org.apache.spark.executor.CoarseGrainedExecutorBackend as its main class, so the ExecutorRunner launches a CoarseGrainedExecutorBackend process, and CoarseGrainedExecutorBackend in turn creates the executor. The key code:

override def onStart() {
  ...
  ref.ask[RegisterExecutorResponse](
    RegisterExecutor(executorId, self, hostPort, cores, extractLogUrls))
  ...
}

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor(hostname) =>
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
  ...
}


At this point, the executor has been created.
