Rack Awareness algorithm in Hadoop ensures that all the block replicas are not stored on the same rack or a single rack. Considering the replication factor is 3, the Rack Awareness Algorithm says that the first replica of a block will be stored on a local rack and the next two replicas will be stored on a different (remote) rack but, on a different DataNode within that (remote) rack. There are two reasons for using Rack Awareness:
To improve the network performance: In general, you will find greater network bandwidth between machines in the same rack than the machines residing in different rack. So, the Rack Awareness helps to reduce write traffic in between different racks and thus provides a better write performance.
To prevent loss of data: I don’t have to worry about the data even if an entire rack fails because of the switch failure or power failure. And if one thinks about it, it will make sense, as it is said that never put all your eggs in the same basket.