What is meant by over-replicated and under-replicated blocks in Hadoop?

1 Answer

The NameNode is responsible for ensuring that the number of copies of each block across the cluster equals the replication factor.

In some cases, for example when a node fails, the number of copies of a block falls below the replication factor. The block is then said to be under-replicated. DataNodes are required to send regular updates (heartbeats) to the NameNode to report their health. If the NameNode stops receiving updates from a particular node, it restores the replication factor by re-replicating the affected blocks from the nodes that still hold a copy onto another node.
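To make the idea concrete, here is a minimal Python sketch of how under-replicated blocks could be detected from missed heartbeats. It is only an illustration, not HDFS code: the function names, the node and block IDs, and the 30-second timeout are all assumptions made up for this example.

```python
# Hypothetical timeout for this sketch; not an HDFS default.
HEARTBEAT_TIMEOUT_S = 30.0


def live_nodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT_S):
    """Return the nodes whose most recent heartbeat is within the timeout."""
    return {node for node, ts in last_heartbeat.items() if now - ts <= timeout}


def under_replicated(block_locations, live, replication_factor):
    """Map each under-replicated block to the number of copies it is missing."""
    missing = {}
    for block, nodes in block_locations.items():
        live_copies = len(nodes & live)  # count only copies on live nodes
        if live_copies < replication_factor:
            missing[block] = replication_factor - live_copies
    return missing


# Made-up cluster state: dn3 last reported 100 seconds ago.
now = 1000.0
last_heartbeat = {"dn1": 995.0, "dn2": 990.0, "dn3": 900.0, "dn4": 998.0}
block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn2", "dn4"},
}

live = live_nodes(last_heartbeat, now)
result = under_replicated(block_locations, live, replication_factor=3)
print(result)  # {'blk_1': 1}
```

Here blk_1 has lost one copy because dn3 is considered dead, so it needs one new replica. On a real cluster, the `hdfs fsck /` report lists the number of under-replicated blocks.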

Blocks are said to be over-replicated when the number of copies of the data exceeds the replication factor. The NameNode fixes this by automatically deleting the extra copies of the blocks. Over-replication can occur when, after one node shuts down, the NameNode re-replicates its data onto other nodes, and the node that was unavailable is later restored with its old copies intact.
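The scenario above can be sketched in Python as follows. This is a simplified illustration with made-up names (`excess_replicas`, `prune`, the node and block IDs), not how HDFS is implemented. In particular, the sketch picks which extra copy to drop arbitrarily, whereas the real NameNode applies its own selection rules.

```python
def excess_replicas(block_locations, replication_factor):
    """Map each over-replicated block to its number of extra copies."""
    return {
        block: len(nodes) - replication_factor
        for block, nodes in block_locations.items()
        if len(nodes) > replication_factor
    }


def prune(block_locations, replication_factor):
    """Keep at most `replication_factor` copies of each block.

    The choice of which copy to drop is arbitrary here (sorted node name).
    """
    return {
        block: set(sorted(nodes)[:replication_factor])
        for block, nodes in block_locations.items()
    }


# blk_1 was stored on dn1, dn2 and dn3. dn3 went down, so a new copy
# was created on dn4 to get back to three replicas.
locations = {"blk_1": {"dn1", "dn2", "dn4"}}

# dn3 comes back online and still holds its old copy of blk_1.
locations["blk_1"].add("dn3")

extra = excess_replicas(locations, replication_factor=3)
print(extra)  # {'blk_1': 1}

after = prune(locations, replication_factor=3)
print(len(after["blk_1"]))  # 3
```

After pruning, blk_1 is back at exactly three copies.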
