How many Mappers run for a MapReduce job in Hadoop?

1 Answer


A Mapper task processes each input record (supplied by the RecordReader) and generates intermediate key-value pairs. The number of mappers depends on two factors:

The amount of data to process, together with the block size. More precisely, the number of mappers equals the number of InputSplits, since one mapper is launched per split. With a block size of 128 MB and 10 TB of input data, the job will have roughly 82,000 maps (10 TB / 128 MB = 81,920). Ultimately, the InputFormat determines the number of maps.

The configuration of the slave node, i.e. the number of cores and the amount of RAM available on it. The right number of maps per node is typically between 10 and 100. As a rule of thumb, the Hadoop framework should give each mapper 1 to 1.5 processor cores. Thus, on a 15-core node, about 10 mappers can run at once (15 / 1.5 = 10).
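The per-node rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name mappers_per_node and its parameters are made up for this example, and it looks at cores alone, ignoring RAM, which in practice can be the tighter limit.

```python
import math


def mappers_per_node(cores: int, cores_per_mapper: float = 1.5) -> int:
    """Estimate how many mappers one node can run at the same time.

    Hypothetical helper: it only divides the core count by the cores
    reserved for each mapper and rounds down. Memory is not considered.
    """
    if cores <= 0 or cores_per_mapper <= 0:
        raise ValueError("cores and cores_per_mapper must be positive")
    return math.floor(cores / cores_per_mapper)


if __name__ == "__main__":
    # The 15-core example from the answer, at 1.5 cores per mapper.
    print(mappers_per_node(15))
    # The same node when each mapper gets exactly one core.
    print(mappers_per_node(15, cores_per_mapper=1.0))
```

With the default of 1.5 cores per mapper, a 15-core node gives 10 concurrent mappers; at 1 core per mapper the same node gives 15.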

In a MapReduce job, one can control the number of mappers by changing the block size. For file-based input the split size follows the block size by default, so a larger block size yields fewer InputSplits and a smaller one yields more.
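To make the link between block size and split size concrete, here is a small Python sketch of the rule that, as far as I know, FileInputFormat applies in the newer mapreduce API (its computeSplitSize method): the split size is the block size clamped between a minimum and a maximum, which are set through mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. The Python function and the constants below are only an illustration, not Hadoop code.

```python
MB = 1024 * 1024
NO_LIMIT = 2**63 - 1  # stands in for "no maximum configured" (Java Long.MAX_VALUE)


def compute_split_size(block_size: int, min_size: int = 1, max_size: int = NO_LIMIT) -> int:
    """Clamp the block size between the configured min and max split size."""
    return max(min_size, min(max_size, block_size))


if __name__ == "__main__":
    # Defaults: the split size equals the block size.
    print(compute_split_size(128 * MB) // MB)
    # Raising the minimum above the block size gives larger splits, so fewer mappers.
    print(compute_split_size(128 * MB, min_size=256 * MB) // MB)
    # Lowering the maximum below the block size gives smaller splits, so more mappers.
    print(compute_split_size(128 * MB, max_size=64 * MB) // MB)
```

So besides changing the block size itself, the min and max split size settings give a per-job way to get fewer or more splits without rewriting the data in HDFS.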

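The JobConf method conf.setNumMapTasks(int num) can also be used to request more map tasks manually. Note that this value is only a hint to the framework; the InputFormat still decides how many splits, and therefore how many mappers, the job actually gets.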

Number of mappers = (total data size) / (input split size)

If data size = 1 TB and input split size = 100 MB (taking 1 TB = 1000 * 1000 MB):

Number of mappers = (1000 * 1000) / 100 = 10,000
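The same calculation can be written as a short Python sketch. The function name estimate_mappers is made up for this example, and the result is only an estimate: it treats the input as one large splittable file, whereas a real job is split file by file, so many small files will give more mappers than the formula suggests.

```python
def estimate_mappers(total_bytes: int, split_bytes: int) -> int:
    """Approximate mapper count: total size divided by split size, rounded up."""
    if total_bytes < 0 or split_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and split_bytes must be > 0")
    # Integer ceiling division, so a partial last split still gets a mapper.
    return (total_bytes + split_bytes - 1) // split_bytes


if __name__ == "__main__":
    # Decimal units, as in the formula example: 1 TB of data, 100 MB splits.
    print(estimate_mappers(10**12, 100 * 10**6))
    # Binary units, as in the block-size example: 10 TB of data, 128 MB blocks.
    print(estimate_mappers(10 * 1024**4, 128 * 1024**2))
```

The first call gives 10,000 mappers, and the second gives 81,920, which is the "roughly 82,000" figure quoted earlier in the answer.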
