You cannot change the no of Mappers via new Java API, because we are using Job class in MapReduceconfiguration core. In old API(deprecated), we can set no of mappers using setNumMapTasks(int n) methods via the JobConf object. Ideally, this is not the best way to set/change the no of mappers.
By default, no of mappers are 2 on each slave-node. We can set/change this value using mapreduce.tasktracker.map.tasks.maximum parameter. You need to set this parameter in mapred-site.xml file. We should not directly select random value to set the no of mappers.
Ideally for each logical InputSplit, a independent mapper or map dynamic container will get invoked. If we go with default case, on each particular slave node, Node-manager can run only two mappers or map dynamic containers parallely irrespective of logical input splits. Initially two input split are assigned to two map dynamic containers on slave-node1. then the remaining input split might be in a queue. In some cases, these input splits might got traveled to some otherslave-node(Let’s say SN2) which is having map dynamic container sitting idle. This mapper can process the traveled input-split on this slave-node (SN2).
Even though if you specified 2 value(No of mappers) in configuration file. Node-manager doesn’t invoke all mappers parallely. This decision is taken care by Resource Manager based on the input split(s) available on a particular slave-node. But that slave-node can run maximum 2 map dynamic container parallely.
Please go through below one, so that you can come to know how many no of maximum mapper we need to set in order to get optimize solution on a particular slave node.
When you are setting up the cluster, at that time you should decide how many maximum no of mappers that should be configured/run parallely on all slaves-nodes. Basically, no of mappers are decided based on the below two factors, that is,
1) No of cores
2) Ram memory
Lets say we have 10 cores on your system. we can have 10 mappers(One mapper = one core) if go with one core for each mapper. Each mapper/map dynamic container can run on one core. This case might not be true in all cases.