HDFS provides fault tolerance by replicating the data blocks and distributing it among different DataNodes across the cluster. By default, this replication factor is set to 3 which is configurable. So, if I store a file of 1 GB in HDFS where the replication factor is set to default i.e. 3, it will finally occupy a total space of 3 GB because of the replication. Now, even if a DataNode fails or a data block gets corrupted, I can retrieve the data from other replicas stored in different DataNodes.