Some of the important points the NameNode considers when selecting a DataNode are as follows:
1. The NameNode tries to keep at least one replica of a block on the same node that is writing the block.
2. It tries to spread the replicas of the same block across different racks, so that if one rack fails, another rack still has the data.
3. One replica is kept on a node in the same rack as the node that is writing it. This differs from point 1: in point 1 the block is written to the writing node itself, whereas here it is written to a different node on the same rack. This is important for minimizing network I/O, because traffic within a rack does not have to cross to another rack.
The NameNode also tries to spread blocks uniformly across all the DataNodes in a cluster.
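
The points above can be sketched as a small toy model. This is only an illustration under stated assumptions, not Hadoop's actual placement code: the `DataNode` class, the `choose_targets` function, and the use of a simple block count to approximate "uniform spreading" are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str
    block_count: int  # blocks already stored; a stand-in for node load


def choose_targets(writer, nodes, replication=3):
    """Pick replica targets following the points above (toy model only)."""
    if not nodes:
        return []

    def least_loaded(candidates):
        # Spread blocks uniformly: prefer the node holding the fewest blocks.
        # Ties are broken by name so the result is deterministic.
        return min(candidates, key=lambda n: (n.block_count, n.name), default=None)

    by_name = {n.name: n for n in nodes}

    # Point 1: keep one replica on the writing node, if it is a DataNode.
    # Otherwise fall back to the least-loaded node in the cluster.
    first = by_name.get(writer) or least_loaded(nodes)
    chosen = [first]

    preferences = [
        lambda n: n.rack != first.rack,  # Point 2: a replica on a different rack
        lambda n: n.rack == first.rack,  # Point 3: another node on the same rack
    ]
    for prefers in preferences:
        if len(chosen) >= replication:
            break
        pick = least_loaded([n for n in nodes if n not in chosen and prefers(n)])
        if pick is not None:
            chosen.append(pick)

    # If a preference could not be met, fill up with any remaining nodes.
    while len(chosen) < replication:
        pick = least_loaded([n for n in nodes if n not in chosen])
        if pick is None:
            break
        chosen.append(pick)

    return chosen[:replication]


cluster = [
    DataNode("a1", "rackA", 5),
    DataNode("a2", "rackA", 2),
    DataNode("a3", "rackA", 1),
    DataNode("b1", "rackB", 4),
    DataNode("b2", "rackB", 0),
]

print([n.name for n in choose_targets("a1", cluster)])  # ['a1', 'b2', 'a3']
```

Here the writer `a1` keeps the first replica, the least-loaded node on the other rack (`b2`) gets the second, and the least-loaded remaining node on the writer's rack (`a3`) gets the third.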