0 votes
in Big Data | Hadoop by
What is the meaning of term Data Locality in Hadoop?

1 Answer

0 votes
by

In a Big Data system, the size of data is huge. So it does not make sense to move data across the network. In such a scenario, Hadoop tries to move computation closer to data.

So the Data remains local to the location wherever it was stored. But the computation tasks will be moved to data nodes that hold the data locally.

 

Hadoop follows following rules for Data Locality optimization:

1. Hadoop first tries to schedule the task on node that has an HDFS file on a local disk.

2. If it cannot be done, then Hadoop will try to schedule the task on a node on the same rack as the node that has data.

3. If this also cannot be done, Hadoop will schedule the task on the node with same data on a different rack.

The above method works well, when we work with the default replication factor

 

of 3 in Hadoop.

Related questions

+1 vote
asked Jan 29, 2022 in Big Data | Hadoop by sharadyadav1986
+1 vote
asked Feb 23, 2020 in Big Data | Hadoop by rahuljain1
...