0 votes
in HDFS by

Replication causes data redundancy and consume a lot of space, then why is it pursued in HDFS?

1 Answer

0 votes
by
Replication is pursued in HDFS to provide the fault tolerance. And, yes, it will lead to the consumption of a lot of space, but one can always add more nodes to the cluster if required. By the way, in practical clusters, it is very rare to have free space issues as the very first reason to deploy HDFS was to store huge data sets. Also, one can change the replication factor to save HDFS space or use different codec provided by Hadoop to compress the data.

Related questions

+1 vote
asked Jun 8, 2020 in HDFS by Robindeniel
0 votes
asked Dec 21, 2022 in HDFS by Robin
...