How many Reducers run for a MapReduce job in Hadoop?
The Reducer takes the set of intermediate key-value pairs produced by the mappers as its input, then runs a reduce function on each group of values to generate the output. The reducer's output is the final output of the job, which is stored in HDFS. Usually, the reducer performs aggregation or summation-style computation.
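The grouping-and-aggregation behavior described above can be sketched in plain Java, independent of the Hadoop runtime. This is a simplified stand-in for the shuffle and reduce phases (the class and method names are illustrative, not part of the Hadoop API): intermediate pairs are grouped by key, then each key's values are summed, as a word-count reducer would.

```java
import java.util.*;

public class ReduceSketch {

    // Group intermediate key-value pairs by key, then sum each key's values.
    public static Map<String, Integer> reduceSum(String[][] intermediate) {
        // The framework sorts and groups values by key before calling reduce.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] pair : intermediate) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(pair[1]));
        }
        // Reduce: aggregate (here, sum) the grouped values for each key.
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            output.put(e.getKey(),
                       e.getValue().stream().mapToInt(Integer::intValue).sum());
        }
        return output;
    }

    public static void main(String[] args) {
        // Intermediate pairs as emitted by mappers (illustrative data).
        String[][] mapOutput = {
            {"cat", "1"}, {"dog", "1"}, {"cat", "1"}, {"cat", "1"}, {"dog", "1"}
        };
        System.out.println(reduceSum(mapOutput)); // {cat=3, dog=2}
    }
}
```

In a real job, this logic lives in a class extending `org.apache.hadoop.mapreduce.Reducer`, and the framework, not user code, performs the grouping.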
With the help of Job.setNumReduceTasks(int), the user sets the number of reducers for the job. The right number of reducers is given by the formula:
0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>).
With 0.95, all the reducers can launch immediately and start transferring map outputs as the maps finish.
With 1.75, the faster nodes finish their first round of reduces and then launch a second wave of reduces.
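As a worked example of the formula above, here is a small sketch in plain Java. The cluster sizes (10 nodes, 4 reduce containers per node) are hypothetical, chosen only to show the two factors in action:

```java
public class ReducerCount {

    // Rule-of-thumb reducer count from the formula above:
    // factor * (nodes * maxContainersPerNode), rounded down to whole tasks.
    public static int reducerCount(double factor, int nodes,
                                   int maxContainersPerNode) {
        return (int) Math.floor(factor * nodes * maxContainersPerNode);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 10 nodes, 4 reduce containers per node.
        System.out.println(reducerCount(0.95, 10, 4)); // 38 -> single wave
        System.out.println(reducerCount(1.75, 10, 4)); // 70 -> two waves
    }
}
```

The resulting number would then be passed to Job.setNumReduceTasks(int) in the job driver.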
By increasing the number of reducers:
Framework overhead increases
Load balancing improves
The cost of failures decreases