What is YARN
YARN (Yet Another Resource Negotiator) takes Hadoop beyond Java MapReduce programs. It separates resource management from the processing model, which opens the cluster up to other applications such as HBase and Spark. Different YARN applications can co-exist on the same cluster, so MapReduce, HBase and Spark can all run at the same time, bringing great benefits for manageability and cluster utilization.
Components of YARN
Client: For submitting MapReduce jobs.
Resource Manager: To manage the use of resources across the cluster.
Node Manager: For launching and monitoring the compute containers on machines in the cluster.
MapReduce Application Master: Coordinates the tasks running the MapReduce job. The application master and the MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers.
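To make the roles above concrete, here is a minimal Python sketch of how the components interact. It is a toy model, not the Hadoop API: the classes `NodeManager`, `ResourceManager` and `ApplicationMaster`, their methods, and the memory-only resource model are hypothetical simplifications. A client submits an application, the resource manager finds a container for its application master, and the application master then asks for one container per task, which a node manager launches.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy Node Manager (hypothetical): owns a memory pool (MB) on one machine."""
    name: str
    capacity_mb: int
    used_mb: int = 0
    containers: list = field(default_factory=list)

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb

    def launch(self, container_id: str, memory_mb: int) -> None:
        # "Launching" is reduced to bookkeeping: reserve memory and record the container.
        self.used_mb += memory_mb
        self.containers.append(container_id)


class ResourceManager:
    """Toy Resource Manager (hypothetical): knows every node and allocates containers."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next_id = 0

    def allocate(self, app_id, memory_mb):
        # Place the container on the node with the most free memory that can still fit it.
        candidates = [n for n in self.nodes if n.free_mb() >= memory_mb]
        if not candidates:
            return None  # no room anywhere in the cluster
        node = max(candidates, key=lambda n: n.free_mb())
        self._next_id += 1
        container_id = f"{app_id}_container_{self._next_id}"
        node.launch(container_id, memory_mb)
        return node.name, container_id

    def submit(self, app_id, am_memory_mb):
        # A client submission starts with a container for the application master.
        return self.allocate(app_id, am_memory_mb)


class ApplicationMaster:
    """Toy application master (hypothetical): requests one container per task."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run_tasks(self, n_tasks, task_memory_mb):
        placements = []
        for _ in range(n_tasks):
            placement = self.rm.allocate(self.app_id, task_memory_mb)
            if placement is None:
                break  # cluster is full; a real application master would ask again later
            placements.append(placement)
        return placements


# Two kinds of application share one cluster through the same Resource Manager.
rm = ResourceManager([NodeManager("node1", 8192), NodeManager("node2", 8192)])
for app_id in ("mapreduce_job", "spark_app"):
    am_node, _ = rm.submit(app_id, am_memory_mb=1024)
    am = ApplicationMaster(app_id, rm)
    tasks = am.run_tasks(n_tasks=3, task_memory_mb=2048)
    print(app_id, "AM on", am_node, "->", len(tasks), "task containers")
```

Because every application goes through the same resource manager, the MapReduce job and the Spark application draw on one shared pool of cluster memory. The placement rule used here (most free memory first) is an arbitrary choice for the sketch; real YARN delegates placement decisions to a pluggable scheduler.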
JobTracker and TaskTracker were used in previous versions of Hadoop and were responsible for resource management and progress monitoring. Hadoop 2.0 instead uses the Resource Manager and Node Manager to overcome the shortcomings of the JobTracker and TaskTracker.
Benefits of YARN
Scalability: MapReduce 1 hits a scalability bottleneck at around 4,000 nodes and 40,000 tasks, whereas YARN is designed for 10,000 nodes and 100,000 tasks.
Utilization: The Node Manager manages a pool of resources rather than a fixed number of designated slots, which increases utilization.
Multitenancy: Different versions of MapReduce can run on YARN, which makes the process of upgrading MapReduce more manageable.
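The utilization benefit can be illustrated with a small Python sketch. It hypothetically assumes a node that offers 4 map slots and 2 reduce slots under MapReduce 1, versus the same node offering a 6,144 MB pool under YARN, with every task needing 1,024 MB. The function names are illustrative only. In real YARN, the size of a Node Manager's memory pool is configured with `yarn.nodemanager.resource.memory-mb`.

```python
def slot_based_running(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """MapReduce 1 model: map tasks can only use map slots, reduce tasks only reduce slots."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def pool_based_running(map_tasks, reduce_tasks, pool_mb, task_mb):
    """YARN model: any task can use any free memory in the Node Manager's pool."""
    capacity = pool_mb // task_mb  # how many equally sized containers fit in the pool
    return min(map_tasks + reduce_tasks, capacity)


# Hypothetical map-heavy phase: 10 map tasks waiting, no reduce tasks yet.
slots_total = 4 + 2
slot_running = slot_based_running(10, 0, map_slots=4, reduce_slots=2)
pool_running = pool_based_running(10, 0, pool_mb=6144, task_mb=1024)

print(f"slots: {slot_running}/{slots_total} busy")  # slots: 4/6 busy
print(f"pool:  {pool_running}/{6144 // 1024} busy")  # pool:  6/6 busy
```

With fixed slots, the two reduce slots sit idle during the map phase; with a pool, the same memory goes to whichever tasks are waiting.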