
What benefits did YARN bring in Hadoop 2.0 and how did it solve the issues of MapReduce v1?

1 Answer


In Hadoop v1, MapReduce performed both data processing and resource management. The processing layer had a single master process, the JobTracker, which was responsible for both resource tracking and job scheduling.

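In MapReduce v1, relying on a single JobTracker to manage jobs and computational resources was inefficient. The JobTracker became overburdened because it handled job scheduling, task monitoring, and resource management for the whole cluster. This led to problems with scalability (a single JobTracker could manage only about 4,000 nodes), availability (the JobTracker was a single point of failure), and resource utilization (each node had a fixed number of map slots and reduce slots, so idle slots of one type could not be used by tasks of the other type). In addition, applications other than MapReduce could not run on the cluster in v1.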
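To make the resource-utilization problem concrete: in MapReduce v1 each TaskTracker offered a fixed number of map slots and reduce slots. The sketch below is a hypothetical toy model (the `TaskTracker` class, `run_wave` function, and slot counts are illustrative assumptions, not Hadoop code) showing that during a map-only phase the reduce slots simply sit idle.

```python
# Toy model of MapReduce v1 slot-based allocation (hypothetical, for illustration only).
# Each TaskTracker has a fixed number of map slots and reduce slots;
# a map slot cannot run a reduce task and vice versa.
from dataclasses import dataclass


@dataclass
class TaskTracker:
    map_slots: int
    reduce_slots: int


def run_wave(trackers, pending_maps, pending_reduces):
    """Fill slots for one scheduling wave and return (busy_slots, total_slots)."""
    busy = 0
    total = 0
    for tt in trackers:
        # Map tasks may only use map slots, reduce tasks only reduce slots.
        maps_here = min(tt.map_slots, pending_maps)
        reduces_here = min(tt.reduce_slots, pending_reduces)
        pending_maps -= maps_here
        pending_reduces -= reduces_here
        busy += maps_here + reduces_here
        total += tt.map_slots + tt.reduce_slots
    return busy, total


# Four TaskTrackers, each configured with 4 map slots and 2 reduce slots.
cluster = [TaskTracker(map_slots=4, reduce_slots=2) for _ in range(4)]

# During the map phase there are many map tasks but no reduce tasks yet,
# so every reduce slot stays empty even though map work is waiting.
busy, total = run_wave(cluster, pending_maps=100, pending_reduces=0)
print(f"{busy}/{total} slots busy ({busy / total:.0%})")  # 16/24 slots busy (67%)
```

A third of the slots are wasted here purely because of how they were typed, not because the cluster lacked work.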

To overcome these issues, Hadoop 2 introduced YARN (Yet Another Resource Negotiator) as a separate resource-management layer. YARN's master daemon is the ResourceManager, which in Hadoop v2 can run in high-availability mode. A NodeManager runs on each worker machine, and each application gets its own temporary daemon, the ApplicationMaster, which lives only as long as that application. The ResourceManager only handles client connections and tracks and allocates cluster resources; scheduling and monitoring an application's individual tasks is delegated to its ApplicationMaster.
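The division of labour described above can be modelled in a few lines. This is a hypothetical toy model, not the YARN API: the classes only borrow the component names, memory is the only resource, and in real YARN the ApplicationMaster itself runs inside a container and talks to the ResourceManager over RPC.

```python
# Toy model of YARN's split of responsibilities (hypothetical classes, not the real YARN API).
from dataclasses import dataclass


@dataclass
class NodeManager:
    name: str
    free_mb: int  # memory this node can still offer for containers


@dataclass
class Container:
    node: str
    memory_mb: int


class ResourceManager:
    """Tracks cluster resources and grants containers; knows nothing about tasks."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb):
        for nm in self.nodes:
            if nm.free_mb >= memory_mb:
                nm.free_mb -= memory_mb
                return Container(nm.name, memory_mb)
        return None  # no capacity right now

    def release(self, container):
        for nm in self.nodes:
            if nm.name == container.node:
                nm.free_mb += container.memory_mb


class ApplicationMaster:
    """Per-application daemon: asks the RM for containers and places its own tasks."""

    def __init__(self, rm, tasks, memory_per_task_mb):
        self.rm = rm
        self.tasks = tasks
        self.memory_per_task_mb = memory_per_task_mb
        self.containers = []

    def launch(self):
        placements = []
        for task in self.tasks:
            container = self.rm.allocate(self.memory_per_task_mb)
            if container is None:
                break  # a real AM would keep the task pending and ask again later
            self.containers.append(container)
            placements.append((task, container.node))
        return placements

    def finish(self):
        # The AM is temporary: it returns its resources when the application ends.
        for container in self.containers:
            self.rm.release(container)
        self.containers.clear()


rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 4096)])
am = ApplicationMaster(rm, tasks=["t1", "t2", "t3"], memory_per_task_mb=2048)
placements = am.launch()
print(placements)  # [('t1', 'node1'), ('t2', 'node1'), ('t3', 'node2')]
am.finish()
```

Note that the `ResourceManager` never sees task names; only the `ApplicationMaster` does, which is what keeps the central daemon light.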

In Hadoop v2, the following features are available:

Scalability - A cluster can grow to more than 10,000 nodes and run more than 100,000 concurrent tasks, because the ResourceManager no longer tracks individual tasks.

Compatibility - MapReduce applications developed for Hadoop v1 run on YARN, largely without code changes.

Resource utilization - Instead of fixed map and reduce slots, YARN dynamically allocates cluster resources as containers, which improves resource utilization.

Multitenancy - YARN lets multiple data access engines, both open-source and proprietary, share one cluster, so batch processing, real-time analysis, and ad-hoc queries can run side by side.
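The utilization and multitenancy points can be illustrated with a toy comparison on a single 8 GB node. This is a hypothetical sketch with made-up request sizes; `allocate_fixed_slots` and `allocate_dynamic` are illustrative helpers, not Hadoop or YARN APIs, and real YARN schedulers also account for CPU (vcores), queues, and data locality.

```python
# Hypothetical comparison: MRv1-style fixed map slots vs YARN-style dynamic containers.
# Request sizes and slot counts are made-up numbers for illustration.

NODE_MEMORY_MB = 8192

# Pending resource requests from two different engines sharing one node.
requests = [
    ("mapreduce-map", 1024),
    ("mapreduce-map", 1024),
    ("spark-executor", 4096),
    ("mapreduce-map", 1024),
]


def allocate_fixed_slots(requests, map_slots, slot_mb):
    """MRv1-style: only MapReduce map tasks may use the node's fixed-size map slots."""
    granted = []
    for kind, mb in requests:
        if kind == "mapreduce-map" and len(granted) < map_slots and mb <= slot_mb:
            granted.append(kind)
    return granted, len(granted) * slot_mb  # each occupied slot reserves slot_mb


def allocate_dynamic(requests, capacity_mb):
    """YARN-style: grant any request, from any engine, that fits the remaining memory."""
    granted, used = [], 0
    for kind, mb in requests:
        if used + mb <= capacity_mb:
            granted.append(kind)
            used += mb
    return granted, used


for name, (granted, used) in [
    ("fixed slots", allocate_fixed_slots(requests, map_slots=4, slot_mb=1024)),
    ("dynamic", allocate_dynamic(requests, NODE_MEMORY_MB)),
]:
    print(f"{name}: {granted} -> {used}/{NODE_MEMORY_MB} MB in use")
```

With fixed slots only the three map tasks run (3072 MB in use) and the non-MapReduce request is turned away; with dynamic allocation all four requests fit (7168 MB in use).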
