The block scanner runs periodically on every DataNode and verifies the stored data blocks against their checksums. When the block scanner detects a corrupted data block, the following steps occur:
First, the DataNode reports the corrupted block to the NameNode.
Then, the NameNode starts creating a new replica from a correct replica of the block held on another DataNode.
The corrupted replica is not deleted until the number of correct replicas matches the replication factor (3 by default).
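The steps above can be illustrated with a small in-memory simulation. This is only a sketch of the sequence, not Hadoop's actual implementation: the `DataNode` and `NameNode` classes, the block id `blk_1`, and the choice of target node are all hypothetical, and `zlib.crc32` stands in for whatever checksum the cluster is configured to use.

```python
import zlib
from dataclasses import dataclass, field

REPLICATION_FACTOR = 3  # default replication factor mentioned in the text


@dataclass
class DataNode:
    """Hypothetical stand-in for a DataNode: stores blocks with a checksum."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> (data, checksum)

    def store(self, block_id, data):
        self.blocks[block_id] = (data, zlib.crc32(data))

    def scan(self):
        """Block scanner pass: return ids whose data no longer matches the checksum."""
        return [bid for bid, (data, crc) in self.blocks.items()
                if zlib.crc32(data) != crc]


class NameNode:
    """Hypothetical stand-in for the NameNode's handling of bad-block reports."""

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.corrupt = set()  # (datanode name, block_id) pairs reported as bad

    def healthy_holders(self, block_id):
        return [dn for dn in self.datanodes
                if block_id in dn.blocks and (dn.name, block_id) not in self.corrupt]

    def report_bad_block(self, node, block_id):
        # Step 1: the DataNode reports the corrupted replica.
        self.corrupt.add((node.name, block_id))
        healthy = self.healthy_holders(block_id)
        if not healthy:
            return  # no correct replica left to copy from
        source_data = healthy[0].blocks[block_id][0]
        # Step 2: copy a correct replica to nodes that do not hold the block yet.
        for dn in self.datanodes:
            if len(self.healthy_holders(block_id)) >= REPLICATION_FACTOR:
                break
            if block_id not in dn.blocks:
                dn.store(block_id, source_data)
        # Step 3: delete the corrupted replica only once enough correct ones exist.
        if len(self.healthy_holders(block_id)) >= REPLICATION_FACTOR:
            for name, bid in list(self.corrupt):
                if bid == block_id:
                    owner = next(dn for dn in self.datanodes if dn.name == name)
                    del owner.blocks[bid]
                    self.corrupt.discard((name, bid))


def demo():
    """Corrupt one of three replicas and return the nodes holding the block afterwards."""
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    for dn in nodes[:3]:
        dn.store("blk_1", b"hello hdfs")
    namenode = NameNode(nodes)

    # Simulate on-disk corruption on dn0: the data changes, the stored checksum does not.
    _, crc = nodes[0].blocks["blk_1"]
    nodes[0].blocks["blk_1"] = (b"hellX hdfs", crc)

    for bad_id in nodes[0].scan():
        namenode.report_bad_block(nodes[0], bad_id)
    return [dn.name for dn in nodes if "blk_1" in dn.blocks]
```

Calling `demo()` shows the corrupted replica on `dn0` being dropped only after a fourth node has received a correct copy, so the block never falls below two healthy replicas during the repair.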
This process allows HDFS to maintain data integrity when a client performs a read operation. The block scanner report can be viewed through the DataNode’s web interface at localhost:50075/blockScannerReport, as shown below:
Fig. – Block Scanner Report – Hadoop HDFS Interview Question
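The same report can also be fetched from a script instead of a browser. The snippet below is a minimal sketch that assumes a DataNode is reachable at the address given in the text; the helper names are hypothetical, and the host and port are parameters because the DataNode HTTP port may differ between Hadoop versions and cluster configurations.

```python
from urllib.request import urlopen


def block_scanner_report_url(host="localhost", port=50075):
    """Build the report URL; host and port default to the values used in the text."""
    return f"http://{host}:{port}/blockScannerReport"


def fetch_block_scanner_report(host="localhost", port=50075, timeout=5.0):
    """Download the report as text. Raises URLError if the DataNode is unreachable."""
    with urlopen(block_scanner_report_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `print(fetch_block_scanner_report())` prints the report for a DataNode running on the local machine.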