Explain Data Locality in Hadoop?

1 Answer


Hadoop Interview Questions and Answers - Data Locality

A major drawback of Hadoop was cross-switch network traffic caused by the huge volume of data being moved. Data locality addresses this drawback. It refers to moving the computation close to the node where the data actually resides, instead of moving large amounts of data to the computation. Because less data crosses the network, data locality increases the overall throughput of the system.

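In Hadoop, HDFS stores datasets. Each dataset is divided into blocks, which are stored across the DataNodes of the Hadoop cluster. When a user runs a MapReduce job, the map tasks are sent to the DataNodes that hold the blocks the job needs. The NameNode supplies the block locations, and the job scheduler (the JobTracker in Hadoop 1, YARN in Hadoop 2 and later) uses them to place the tasks.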
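To make the idea of sending the computation to the data concrete, here is a minimal sketch in plain Python. It is a toy model, not Hadoop code: the block ids, node names, slot counts, and the function assign_map_tasks are all made up for illustration, and real schedulers are considerably more involved.

```python
# Toy model of "move the computation to the data" (not Hadoop's actual scheduler).
# Block ids and node names below are hypothetical.

# block -> DataNodes holding a replica of it (the kind of mapping the NameNode tracks)
block_locations = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node2", "node3", "node4"],
    "blk_3": ["node1", "node4", "node5"],
}


def assign_map_tasks(block_locations, free_slots):
    """Greedily place one map task per block on a node that stores that block.

    free_slots maps node name -> number of map tasks the node can still accept.
    Returns block -> chosen node, or None when no replica holder has a free slot.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, holders in block_locations.items():
        # first replica holder that still has capacity, if any
        chosen = next((n for n in holders if slots.get(n, 0) > 0), None)
        if chosen is not None:
            slots[chosen] -= 1
        assignment[block] = chosen
    return assignment


assignment = assign_map_tasks(block_locations, {"node1": 1, "node2": 1, "node4": 1})
for block, node in assignment.items():
    print(block, "->", node)
```

Every task in this run lands on a node that already stores its block, so no block has to be copied over the network before the mapper starts.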

Data locality has three categories:

1) Data local – The data is on the same node as the mapper working on it. The computation runs as close to the data as possible, so this is the most preferred scenario.

2) Intra-rack – The mapper runs on a different node from the data, but on the same rack. This happens when resource constraints make it impossible to run the mapper on the DataNode that holds the data.

3) Inter-rack – The mapper runs on a different rack from the data. This happens when resource constraints make it impossible to run the mapper on any node in the same rack as the data. This is the least preferred scenario.
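The three categories can be sketched as a small classifier plus a preference-ordered choice. This is an illustrative sketch in plain Python under assumed inputs: the node-to-rack map, the node names, and the functions locality_level and pick_node are hypothetical and are not part of any Hadoop API.

```python
# Hypothetical cluster topology: node name -> rack name.
rack_of = {
    "node1": "rackA",
    "node2": "rackA",
    "node3": "rackB",
    "node4": "rackB",
    "node5": "rackC",
}


def locality_level(task_node, replica_nodes, rack_of):
    """Classify where a mapper runs relative to the replicas of its block."""
    if task_node in replica_nodes:
        return "data-local"  # same node as the data
    replica_racks = {rack_of[n] for n in replica_nodes}
    if rack_of[task_node] in replica_racks:
        return "intra-rack"  # different node, same rack
    return "inter-rack"      # different rack


def pick_node(free_nodes, replica_nodes, rack_of):
    """Pick the free node with the best locality, or None if no node is free."""
    order = {"data-local": 0, "intra-rack": 1, "inter-rack": 2}
    return min(
        free_nodes,
        key=lambda n: order[locality_level(n, replica_nodes, rack_of)],
        default=None,
    )


replicas = ["node1", "node3"]  # nodes holding the block (assumed)
print(locality_level("node1", replicas, rack_of))
print(locality_level("node2", replicas, rack_of))
print(locality_level("node5", replicas, rack_of))
print(pick_node(["node5", "node2"], replicas, rack_of))
```

When node1 and node3 are busy, pick_node falls back to node2, which shares a rack with a replica, before it would consider node5 on another rack.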
