• Home
  • Recent Q&A
  • Java
  • Cloud
  • JavaScript
  • Python
  • SQL
  • PHP
  • HTML
  • C++
  • Data Science
  • DBMS
  • Devops
  • Hadoop
  • Machine Learning
in Hadoop by
Q:
Why HDFS performs replication, although it results in data redundancy?

1 Answer

0 votes
by

Why HDFS performs replication, although it results in data redundancy?

In HDFS, Replication provides the fault tolerance. Data replication is one of the most important and unique features of HDFS. Replication of data solves the problem of data loss in unfavorable conditions. Unfavorable conditions are crashing of the node, hardware failure and so on. HDFS by default creates 3 replicas of each block across the cluster in Hadoop. And we can change it as per the need. So, if any node goes down, we can recover data on that node from the other node.

In HDFS, Replication will lead to the consumption of a lot of space. But the user can always add more nodes to the cluster if required. It is very rare to have free space issues in the practical cluster. As the very first reason to deploy HDFS was to store huge data sets. Also, one can change the replication factor to save HDFS space. Or one can also use different codec provided by the Hadoop to compress the data.

Read: Namenode High Availability

Related questions

0 votes
asked Nov 24, 2020 in HDFS by rahuljain1
0 votes
asked Oct 28, 2020 in Hadoop by rahuljain1
...