Feb 23 in Big Data | Hadoop

Why do we use HDFS for applications with large data sets, and not when there are lots of small files?

1 Answer

Feb 23

HDFS is better suited to a large amount of data stored in a few big files than to the same amount of data spread across many small files. The reason is that the NameNode keeps the metadata for every file, directory, and block in memory, and each of these namespace objects consumes heap space regardless of how much data it describes. Millions of small files therefore generate a huge volume of metadata and can exhaust the NameNode's memory, while the same data stored in a few large files needs only a small amount of metadata. Hence, for optimized performance, HDFS favors large data sets over many small files.
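To make the overhead concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file or block) and the default 128 MB block size; the exact figures vary by Hadoop version, so treat the numbers as illustrative only.

```python
# Rough estimate of NameNode heap consumed by file/block metadata.
# Assumption: ~150 bytes per namespace object (file or block), a
# common community rule of thumb -- actual sizes vary by version.

BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB

def namenode_metadata_bytes(num_files, avg_file_size):
    """Approximate NameNode memory for num_files of avg_file_size each."""
    blocks_per_file = max(1, -(-avg_file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)  # 1 file object + its blocks
    return objects * BYTES_PER_OBJECT

ONE_GIB = 1024 ** 3

# The same 1 TiB of data, stored two different ways:
big = namenode_metadata_bytes(1024, ONE_GIB)                    # 1024 files of 1 GiB
small = namenode_metadata_bytes(1024 * 1024, ONE_GIB // 1024)   # ~1M files of 1 MiB

print(f"1024 x 1 GiB files: {big / 1024:.0f} KiB of metadata")
print(f"1M   x 1 MiB files: {small / 1024:.0f} KiB of metadata")
```

Even though both layouts hold the same 1 TiB of data, the small-file layout forces the NameNode to track around a million extra file objects, so its metadata footprint is orders of magnitude larger. This is exactly why HDFS performance degrades with many small files.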

