Replication causes data redundancy then why is it still preferred in HDFS?

1 Answer


Hadoop runs on commodity hardware, so individual node failures are relatively common. To keep the system fault-tolerant, HDFS replicates data even though this creates multiple copies of the same blocks at different locations. By default, each block is stored with a replication factor of 3, with the replicas placed on different nodes (and, where possible, different racks). If one copy of the data is corrupted and a second copy is unavailable because of a node or network failure, the data can still be read from the third replica without any data loss.
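The replication factor is configurable. As a minimal sketch (assuming a standard Hadoop installation where `hdfs-site.xml` lives under the Hadoop configuration directory), the cluster-wide default is set with the `dfs.replication` property:

```xml
<!-- hdfs-site.xml: default number of replicas for each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

Replication can also be changed per file or directory after the fact with `hdfs dfs -setrep`, e.g. `hdfs dfs -setrep -w 3 /user/data/file.txt` (the `-w` flag waits until replication completes). Lowering the factor trades durability for storage space.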
