Why is block size set to 128 MB in Hadoop HDFS?

1 Answer


A block is a contiguous location on the disk where data is stored. In general, a file system stores data as a collection of blocks. HDFS stores each file as a set of blocks and distributes them across the Hadoop cluster. In HDFS, the default block size is 128 MB, which we can configure as per our requirement. The block size is set to 128 MB for the following reasons:

To reduce disk seeks (I/O). The larger the block size, the fewer blocks a file has, so fewer disk seeks are needed, and each block can still be transferred within a reasonable time, with the blocks being processed in parallel.
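As a rough illustration of this trade-off, the sketch below estimates the block size at which seek time stays around 1% of transfer time. The figures used (a ~10 ms seek time and a ~100 MB/s transfer rate) are assumptions for illustration, not values from the answer above:

```java
// Back-of-the-envelope estimate: pick a block size large enough that the
// disk seek is a small fraction of the time spent actually transferring data.
public class BlockSizeEstimate {
    public static void main(String[] args) {
        double seekTimeMs = 10.0;          // assumed average disk seek time
        double transferRateMBs = 100.0;    // assumed sustained transfer rate
        double targetSeekOverhead = 0.01;  // keep seek time at ~1% of transfer time

        // Transfer time must be seekTime / overhead = 10 ms / 0.01 = 1 second.
        double transferTimeSec = (seekTimeMs / 1000.0) / targetSeekOverhead;
        double blockSizeMB = transferRateMBs * transferTimeSec;

        System.out.printf("Suggested block size: ~%.0f MB%n", blockSizeMB); // ~100 MB
    }
}
```

With these assumed numbers the estimate lands around 100 MB, which is why block sizes in the 64 MB to 128 MB range are a natural fit.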

HDFS holds huge data sets, i.e. terabytes and petabytes of data. If we used a 4 KB block size in HDFS, like the Linux file system does, we would end up with far too many blocks and therefore far too much metadata. Managing this huge number of blocks and their metadata would create enormous overhead and traffic, which is something we don't want. So the block size is set to 128 MB.
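A small sketch of the metadata argument, assuming a 1 TB file and the commonly cited rough figure of about 150 bytes of NameNode memory per block object (both are assumptions used only for illustration):

```java
// Compare how many blocks a 1 TB file needs at 4 KB vs. 128 MB block sizes,
// and roughly how much NameNode metadata those blocks would cost.
public class BlockCountEstimate {
    public static void main(String[] args) {
        long fileSizeBytes = 1L << 40;          // 1 TB file
        long smallBlock = 4L * 1024;            // 4 KB, like a local Linux file system
        long hdfsBlock = 128L * 1024 * 1024;    // 128 MB HDFS default
        long bytesPerBlockMeta = 150;           // assumed rough NameNode cost per block

        long blocksSmall = fileSizeBytes / smallBlock;   // 268,435,456 blocks
        long blocksHdfs = fileSizeBytes / hdfsBlock;     // 8,192 blocks

        System.out.println("4 KB blocks  : " + blocksSmall + " blocks, ~"
                + blocksSmall * bytesPerBlockMeta / (1024 * 1024 * 1024) + " GB of metadata");
        System.out.println("128 MB blocks: " + blocksHdfs + " blocks, ~"
                + blocksHdfs * bytesPerBlockMeta / 1024 + " KB of metadata");
    }
}
```

A single 1 TB file would need hundreds of millions of 4 KB blocks but only a few thousand 128 MB blocks, which is exactly the metadata explosion the answer describes.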

On the other hand, the block size can't be so large that the system ends up waiting a very long time for the last unit of data processing to finish its work.
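For completeness, here is a minimal sketch of how a Java client could override the block size at file-creation time via the standard dfs.blocksize property. The 256 MB value and the /tmp/example.txt path are just for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Override the default 128 MB block size; the value is in bytes.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);   // 256 MB

        FileSystem fs = FileSystem.get(conf);
        // Files created by this client will use the overridden block size.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
            out.writeUTF("data written with a 256 MB block size");
        }
    }
}
```

The same dfs.blocksize property can also be set cluster-wide in hdfs-site.xml, so the 128 MB default is a starting point rather than a hard limit.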
