What is Rack Awareness?
Rack Awareness reduces network traffic when reading and writing files: the NameNode chooses DataNodes on the same rack as the client, or on a nearby rack. The NameNode obtains rack information by maintaining the rack ID of each DataNode, and it uses that information when selecting DataNodes. In HDFS, the NameNode also makes sure that the replicas of a block are not all stored on a single rack. It follows the Rack Awareness Algorithm to reduce latency and improve fault tolerance.
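To make the rack ID idea concrete, here is a minimal sketch of a host-to-rack lookup. Hadoop can be pointed at an external topology script (through the net.topology.script.file.name property) that receives host addresses as arguments and prints one rack ID per host. The addresses, rack names, and function names below are hypothetical and only illustrate the shape of such a mapping.

```python
import sys

# Hypothetical host-to-rack table; a real cluster would build this
# from its own inventory rather than hard-coding it.
RACK_TABLE = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}

# Rack ID used for hosts that are not in the table.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host, table=RACK_TABLE):
    """Return the rack ID for one host, or the default rack if unknown."""
    return table.get(host, DEFAULT_RACK)


def resolve_all(hosts, table=RACK_TABLE):
    """Return rack IDs in the same order as the hosts passed in."""
    return [resolve_rack(h, table) for h in hosts]


if __name__ == "__main__":
    # Print one rack ID per command-line argument, separated by spaces.
    print(" ".join(resolve_all(sys.argv[1:])))
```

Two DataNodes that resolve to the same rack ID are treated as being on the same rack, which is the information the NameNode needs when it picks nodes for reads and writes.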
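The default replication factor is 3. Under the Rack Awareness Algorithm, the first replica of a block is stored on the local node when the client runs on a DataNode. The second replica is stored on a DataNode in a different rack. The third replica is stored on a different DataNode in the same rack as the second. This keeps the replicas on two racks, so the block survives the loss of a whole rack while only one copy has to cross racks during the write.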
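The following is a simplified sketch of the default HDFS placement for a replication factor of 3: the first replica on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The function name and the sample cluster are hypothetical. The sketch also assumes the writer is itself a DataNode and picks the first eligible node in sorted order, whereas the real NameNode chooses among eligible nodes randomly and also weighs factors such as free space and load.

```python
def choose_replicas(writer, node_racks):
    """Pick three DataNodes for one block.

    writer:     name of the DataNode the client writes from
                (assumed to be a key of node_racks)
    node_racks: dict mapping DataNode name -> rack ID
    """
    local_rack = node_racks[writer]

    # Replica 1: the writer's own node.
    first = writer

    # Replica 2: a node on any rack other than the writer's rack.
    remote_nodes = sorted(n for n, r in node_racks.items() if r != local_rack)
    if not remote_nodes:
        raise ValueError("placement needs at least two racks")
    second = remote_nodes[0]
    remote_rack = node_racks[second]

    # Replica 3: a different node on the same rack as replica 2.
    same_remote = sorted(
        n for n, r in node_racks.items() if r == remote_rack and n != second
    )
    if not same_remote:
        raise ValueError("remote rack needs at least two DataNodes")
    third = same_remote[0]

    return [first, second, third]


# Hypothetical four-node cluster spread over two racks.
cluster = {
    "dn1": "/rack1",
    "dn2": "/rack1",
    "dn3": "/rack2",
    "dn4": "/rack2",
}
placement = choose_replicas("dn1", cluster)
```

With this sample cluster and dn1 as the writer, the three replicas land on two distinct racks, so no single rack failure removes every copy of the block.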
In Hadoop, we need Rack Awareness because it improves:
High availability and reliability of data.
The performance of the cluster.