50.043 YARN

Learning Outcomes

By the end of this lesson, you will be able to

  • Articulate the purpose of Hadoop YARN
  • Identify the shortcomings of the JobTracker and TaskTracker in Hadoop V1
  • Describe how YARN addresses the above issues
  • List and differentiate the different YARN schedulers

Hadoop and Its Ecosystem

We have many tools in the big data toolbox:

  • Hadoop HDFS - Main Storage
  • Hadoop MapReduce - Low CPU / Low RAM consumption batch job
  • Spark Job - High CPU / GPU / RAM batch job
  • Spark Streaming - Real-time stream processing
  • ...

Spark and Hadoop v1 have incompatible resource managers; they do not understand each other. It is critical to have a unified resource management tool for all kinds of jobs.

Hadoop v1 Resource Management System

The Hadoop v1 resource management system has many issues.

The namenode acts as the master node and the job tracker. It is overloaded with:

  • Resource management
  • Scheduling
  • Job monitoring
  • Job lifecycle
  • Fault tolerance

It soon becomes the bottleneck of the system: it does not scale beyond 4,000 nodes and 40,000 concurrent tasks.

The datanodes act as the task trackers. They have static task slots, for instance, 2 slots for mapper tasks and 2 slots for reducer tasks. The mapper slots cannot be used for reducer tasks, and vice versa. This imposes additional constraints on task scheduling and limits load balancing.

YARN

YARN is a general resource management system shipped with Hadoop from V2 onwards. It is designed for Hadoop and its ecosystem, and it overcomes most of the limitations found in Hadoop V1. It also provides extensible APIs for non-Hadoop tasks.

Hadoop V1 vs YARN

In the following diagram we find the correspondence between the Hadoop V1 resource management components and YARN.

Correspondence between Hadoop V1 and Hadoop V2 resource management

At the topmost level, the JobTracker is replaced by an application master, which is in charge of managing one application running on the cluster. The application master does not run on the namenode; instead it runs in a container on one of the worker nodes. The resource manager, which coordinates the hardware resources among the many application masters and node managers, runs on the namenode. This addresses the "jack-of-all-trades" problem of the Hadoop V1 namenode.

The task trackers in v1 are replaced by node managers. Each worker node runs a node manager, which manages the resources and task status of that worker node.

The fixed mapper and reducer slots are replaced by containers on the worker nodes. A container has the flexibility to run any task (mapper, reducer, application master, etc.), as decided by the scheduler.
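
To make the contrast concrete, here is a minimal, illustrative sketch (the memory and vcore values are made up) of how an application master would describe a container purely in terms of resources using Hadoop's AMRMClient API. Nothing in the request says whether a mapper, a reducer or anything else will run inside the granted container.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerRequestSketch {
    public static void main(String[] args) {
        // A container is just "so much memory, so many vcores" -- the scheduler
        // does not care what kind of task will eventually run inside it.
        Resource capability = Resource.newInstance(2048, 2); // 2 GB, 2 vcores (illustrative)
        Priority priority = Priority.newInstance(1);

        // nodes/racks are null: let the scheduler place the container anywhere.
        ContainerRequest request = new ContainerRequest(capability, null, null, priority);
        System.out.println("Requesting: " + request.getCapability());
    }
}
```

Because the request carries only a resource capability and optional locality hints, the same pool of cluster capacity can serve mappers, reducers and application masters alike, which is exactly what the fixed V1 slots could not do.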

YARN Job Submission Workflow

When a client needs to submit an application to YARN, the following steps are taken.

  1. Client submits an application
  2. Application Manager (RM) allocates a container to start Application Master
  3. Application Master registers with Resource Manager
  4. Application Master requests containers from Resource Manager
  5. Application Master notifies Node Manager to launch containers
  6. Application code is executed in the container
  7. Client contacts Application Master to monitor application status
  8. Application Master unregisters from Resource Manager
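
The sketch below is a minimal, untested illustration of steps 1, 2 and 7 on the client side, using Hadoop's YarnClient API; the application name, AM launch command and resource sizes are placeholder values.

```java
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApp {
    public static void main(String[] args) throws Exception {
        // Step 1: the client connects to the Resource Manager.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the RM for a new application id and submission context.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app"); // placeholder name

        // Describe how to launch the Application Master (placeholder command).
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(java.util.Collections.singletonList("/bin/echo hello-from-am"));
        appContext.setAMContainerSpec(amContainer);

        // Resources requested for the AM's own container (step 2): 1 GB, 1 vcore.
        appContext.setResource(Resource.newInstance(1024, 1));

        // Submit; the RM allocates a container on a worker node and starts the AM in it.
        ApplicationId appId = yarnClient.submitApplication(appContext);

        // Step 7: the client polls for application status.
        ApplicationReport report = yarnClient.getApplicationReport(appId);
        System.out.println("State: " + report.getYarnApplicationState());

        yarnClient.stop();
    }
}
```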

YARN Scheduler

Next, we consider how YARN schedules jobs.

Job scheduling is a tough multi-objective optimization problem. It often needs to meet all or most of the following requirements:

  • It needs to offer capacity guarantees to the cluster users
  • It needs to fulfill the service level agreement, if the cluster operates on a subscription model
  • It needs to be fair
  • It needs to ensure a high utilization level of the cluster
  • ...

Things get trickier when application masters are allowed to request resources statically up-front or dynamically during execution.

YARN is shipped with a few scheduler templates, namely:

  • FIFO - a single queue, first in, first out
  • Capacity - optimizes capacity and resource usage across different queues
  • FairScheduler - similar to Capacity, but allows applications to move between queues during execution

Besides these, users are allowed to define their own scheduler policy for their own clusters.
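
As a small, illustrative sketch, the scheduler implementation is selected on the Resource Manager through the yarn.resourcemanager.scheduler.class property; in practice this is set in yarn-site.xml rather than in code, and the snippet below only shows which knob is involved.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerConfigSketch {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();

        // Normally configured in yarn-site.xml; set programmatically here
        // only to show the property that picks the scheduling policy.
        conf.set(YarnConfiguration.RM_SCHEDULER,
                "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");

        System.out.println("Scheduler class: " + conf.get(YarnConfiguration.RM_SCHEDULER));
    }
}
```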

However, ensuring fairness in general is a hard problem too. One popular approach is to define a standard metric over the multi-resource requests of all incoming applications so that an allocation order can be fixed, for instance Dominant Resource Fairness (DRF).

https://cs.stanford.edu/~matei/papers/2011/nsdi_drf.pdf
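
To see how DRF fixes an allocation order, the sketch below replays the running example from the paper above: a cluster with 9 CPUs and 18 GB of RAM, user A tasks demanding <1 CPU, 4 GB> and user B tasks demanding <3 CPUs, 1 GB>. A user's dominant share is the largest fraction of any single resource they hold, and DRF repeatedly serves the user with the smallest dominant share.

```java
public class DrfSketch {
    public static void main(String[] args) {
        double[] total = {9, 18};          // cluster capacity: CPUs, RAM (GB)
        double[][] demand = {{1, 4},       // user A: per-task demand
                             {3, 1}};      // user B: per-task demand
        double[][] alloc = new double[2][2];
        double[] tasks = new double[2];

        // Greedily allocate tasks while at least one user's next task still fits.
        while (fits(alloc, demand[0], total) || fits(alloc, demand[1], total)) {
            double shareA = dominantShare(alloc[0], total);
            double shareB = dominantShare(alloc[1], total);
            int next = (shareA <= shareB) ? 0 : 1;                  // lowest dominant share first
            if (!fits(alloc, demand[next], total)) next = 1 - next; // fall back to the other user
            if (!fits(alloc, demand[next], total)) break;
            for (int r = 0; r < total.length; r++) alloc[next][r] += demand[next][r];
            tasks[next]++;
        }
        System.out.printf("User A: %.0f tasks, dominant share %.2f%n", tasks[0], dominantShare(alloc[0], total));
        System.out.printf("User B: %.0f tasks, dominant share %.2f%n", tasks[1], dominantShare(alloc[1], total));
    }

    // Dominant share = max over resources of (allocated / capacity).
    static double dominantShare(double[] alloc, double[] total) {
        double s = 0;
        for (int r = 0; r < total.length; r++) s = Math.max(s, alloc[r] / total[r]);
        return s;
    }

    // Would one more task of this demand still fit in the remaining cluster capacity?
    static boolean fits(double[][] alloc, double[] demand, double[] total) {
        for (int r = 0; r < total.length; r++) {
            double used = alloc[0][r] + alloc[1][r];
            if (used + demand[r] > total[r]) return false;
        }
        return true;
    }
}
```

Running this ends with both users at a dominant share of about 2/3: user A gets 3 tasks (dominated by RAM) and user B gets 2 tasks (dominated by CPU), matching the example in the linked paper.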