
How to configure the split value?

1 Answer

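By default the block size is 64 MB (the Hadoop 1.x default; Hadoop 2.x and later use 128 MB). MapReduce does not process blocks directly, though. The job's InputFormat divides the input into logical input splits, and Hadoop uses these formulas to determine the split size: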

split size = min(max_split_size, max(block_size, min_split_size))

split size = max(min_split_size, min(block_size, max_split_size))
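A minimal Java sketch of the two formulas, with all sizes in bytes. The class and method names are hypothetical. The second formula is the same expression Hadoop's new-API `FileInputFormat.computeSplitSize` uses, and the two give identical results whenever min_split_size is at most max_split_size. To actually configure the split size in Hadoop 2.x, set the `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize` properties (or call `FileInputFormat.setMinInputSplitSize` / `setMaxInputSplitSize` on the job). The 1-byte minimum and `Long.MAX_VALUE` maximum below approximate what Hadoop 2.x uses when neither is set.

```java
// Sketch of the two split-size formulas; every size is in bytes.
// formulaOne/formulaTwo are hypothetical names, not Hadoop API.
class SplitSizeFormulas {

    // split size = min(max_split_size, max(block_size, min_split_size))
    static long formulaOne(long blockSize, long minSplitSize, long maxSplitSize) {
        return Math.min(maxSplitSize, Math.max(blockSize, minSplitSize));
    }

    // split size = max(min_split_size, min(block_size, max_split_size))
    // Same expression as FileInputFormat.computeSplitSize in Hadoop's new API.
    static long formulaTwo(long blockSize, long minSplitSize, long maxSplitSize) {
        return Math.max(minSplitSize, Math.min(blockSize, maxSplitSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb;
        // Near-default settings: the split size equals the block size.
        System.out.println(formulaTwo(block, 1L, Long.MAX_VALUE) / mb); // 64
        // A minimum above the block size makes each split larger than a block.
        System.out.println(formulaTwo(block, 256 * mb, Long.MAX_VALUE) / mb); // 256
        // A maximum below the block size makes each split smaller than a block.
        System.out.println(formulaTwo(block, 1L, 32 * mb) / mb); // 32
    }
}
```

In other words, raising the minimum gives fewer, larger splits, and lowering the maximum gives more, smaller splits. Leaving both alone keeps one split per block.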

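With the default minimum and maximum split sizes, split size = block size. Each input split is processed by one map task, so the number of splits always equals the number of mappers. Applying the formulas above: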
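Because each input split becomes one map task, a rough mapper count for a single file is the file size divided by the split size, rounded up. This simplified Java sketch uses hypothetical class and method names and assumes a non-empty file. It ignores Hadoop's `SPLIT_SLOP` (1.1) rule, under which `FileInputFormat` lets the last split run up to about 10% over the split size instead of creating a tiny final split, so this count can be one higher than Hadoop's. `FileInputFormat` splits each file separately, so for several input files you would add up the per-file counts.

```java
// Simplified estimate of map tasks for one file: one task per input split.
// Hypothetical helper names; ignores FileInputFormat's SPLIT_SLOP (1.1) rule.
class MapperCount {

    // Same clamp as the second formula: max(min, min(block, max)).
    static long splitSize(long blockSize, long minSplitSize, long maxSplitSize) {
        return Math.max(minSplitSize, Math.min(blockSize, maxSplitSize));
    }

    // Ceiling division without overflow; assumes fileSize > 0 and splitSize > 0.
    static long numSplits(long fileSize, long splitSize) {
        return fileSize / splitSize + (fileSize % splitSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long split = splitSize(64 * mb, 1L, Long.MAX_VALUE); // 64 MB
        System.out.println(numSplits(1024 * mb, split)); // 16 (1 GB file)
        System.out.println(numSplits(200 * mb, split));  // 4 (200 MB file)
    }
}
```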

Formula 1, taking min_split_size = 512 KB and max_split_size = 10 GB (max_split_size depends on your environment; it might be 1 GB or 10 GB):
split size = min(max_split_size, max(64 MB, 512 KB))
split size = min(10 GB, 64 MB)
split size = 64 MB

Formula 2, with the same values:
split size = max(min_split_size, min(block_size, max_split_size))
split size = max(512 KB, min(64 MB, 10 GB))
split size = max(512 KB, 64 MB)
split size = 64 MB
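The same worked arithmetic in Java, with every value converted to bytes. The conversion matters: with bare numbers, max(64, 512) would pick 512, so both sides must be in one unit before comparing. The 512 KB minimum and 10 GB maximum are this example's assumptions (not Hadoop defaults), and the class name is hypothetical.

```java
// Hypothetical class reproducing the worked example; every size is in bytes so
// that 64 MB and 512 KB are compared in the same unit.
class WorkedSplitExample {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long GB = 1024L * MB;

    static final long BLOCK_SIZE = 64 * MB;      // block size from the answer
    static final long MIN_SPLIT_SIZE = 512 * KB; // assumed minimum, not a Hadoop default
    static final long MAX_SPLIT_SIZE = 10 * GB;  // assumed maximum, not a Hadoop default

    // Formula 1: min(max_split_size, max(block_size, min_split_size))
    static long formulaOne() {
        long inner = Math.max(BLOCK_SIZE, MIN_SPLIT_SIZE); // max(64 MB, 512 KB) = 64 MB
        return Math.min(MAX_SPLIT_SIZE, inner);            // min(10 GB, 64 MB) = 64 MB
    }

    // Formula 2: max(min_split_size, min(block_size, max_split_size))
    static long formulaTwo() {
        long inner = Math.min(BLOCK_SIZE, MAX_SPLIT_SIZE); // min(64 MB, 10 GB) = 64 MB
        return Math.max(MIN_SPLIT_SIZE, inner);            // max(512 KB, 64 MB) = 64 MB
    }

    public static void main(String[] args) {
        System.out.println("Formula 1: " + formulaOne() / MB + " MB"); // Formula 1: 64 MB
        System.out.println("Formula 2: " + formulaTwo() / MB + " MB"); // Formula 2: 64 MB
    }
}
```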
...