Jan 11, 2020 in Big Data | Hadoop
Q: What is the meaning of the term Data Locality in Hadoop?

1 Answer


In a Big Data system, data volumes are huge, so moving data across the network is expensive. In such a scenario, Hadoop instead moves the computation closer to the data.

So the data stays local to the location where it was stored, and the computation tasks are scheduled onto the DataNodes that hold that data locally.


Hadoop applies the following rules for Data Locality optimization:

1. Hadoop first tries to schedule the task on a node that holds the required HDFS block on a local disk (node-local).

2. If that is not possible, Hadoop tries to schedule the task on a node in the same rack as a node that holds the data (rack-local).

3. If that also fails, Hadoop schedules the task on a node in a different rack (off-rack), and the data is read over the network.

The above method works well with Hadoop's default replication factor of 3.
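The three-tier preference above can be sketched in a few lines. This is a minimal illustrative simulation, not the real Hadoop scheduler API; the function and variable names (`pick_node`, `rack_of`, `free_nodes`) are assumptions made for the example.

```python
def pick_node(block_replicas, rack_of, free_nodes):
    """Pick the best node to run a task reading a given HDFS block.

    block_replicas: set of nodes holding a replica of the block
    rack_of: mapping from node name to rack name
    free_nodes: list of nodes that currently have a free task slot
    """
    # Rule 1: node-local -- a free node that itself holds a replica.
    for node in free_nodes:
        if node in block_replicas:
            return node, "node-local"
    # Rule 2: rack-local -- a free node in the same rack as a replica.
    replica_racks = {rack_of[n] for n in block_replicas}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # Rule 3: off-rack -- any free node; data crosses the network.
    return free_nodes[0], "off-rack"

# Hypothetical cluster: two replicas of a block on n1 (rack r1) and n3 (rack r2).
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2", "n5": "r3"}
replicas = {"n1", "n3"}

print(pick_node(replicas, rack_of, ["n1", "n2"]))  # -> ('n1', 'node-local')
print(pick_node(replicas, rack_of, ["n2"]))        # -> ('n2', 'rack-local')
print(pick_node(replicas, rack_of, ["n5"]))        # -> ('n5', 'off-rack')
```

With replication factor 3 there are usually replicas in more than one rack, so in practice rule 1 or rule 2 almost always succeeds.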
