In the MapReduce algorithm, reducers form the second phase of processing. They are used for the final summation and aggregation of the data.
To understand the reducer properly, let's take a simple example. Suppose we want to find the sum of employee salaries (the data is provided in a CSV file) grouped by job title (e.g. the total salary of all developers, of all managers, etc.).
Typically, in a MapReduce program, we first create key-value pairs from the available data. For our example, we take the job title as the key and the salary as the value. Many mappers run in parallel to map this data according to custom business logic. Before the mapped data is given to the reducer, it is shuffled and sorted; the framework does this internally. The data then reaches the reducer as (key, list of values) pairs, i.e. (job title, list of salaries), e.g. (Manager, (12000, 13000, 17000, 13000, …)). In this example, one reducer is usually enough, because all the reducer has to do is add up the values for each key to get the sum of salaries by job title.
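The flow described above can be sketched in a few lines of plain Python. This is only a single-process simulation of the three steps (map, shuffle and sort, reduce), not Hadoop code. The sample rows, the column names name, job_title and salary, and the function names are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the column names are assumptions for illustration.
CSV_DATA = """name,job_title,salary
Asha,Manager,12000
Ben,Developer,9000
Chen,Manager,13000
Dina,Developer,9500
Eli,Manager,17000
"""


def mapper(row):
    """Map phase: emit one (job title, salary) pair per employee record."""
    yield row["job_title"], int(row["salary"])


def shuffle_and_sort(pairs):
    """Group values by key and order the keys (done internally by the framework)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Reduce phase: add up all salaries for one job title."""
    return key, sum(values)


def run_job(csv_text):
    """Run map, shuffle and sort, then reduce over the CSV text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    mapped = [pair for row in rows for pair in mapper(row)]
    return [reducer(key, values) for key, values in shuffle_and_sort(mapped)]


result = run_job(CSV_DATA)
print(result)  # [('Developer', 18500), ('Manager', 42000)]
```

Note that the reducer never sees individual records. It only receives one key together with the full list of values collected for that key, which is why the reduce step itself is a single sum.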
We can configure the number of reducers in the driver class. In a few cases, such as data parsing, we might not need a reducer at all, so we can set the number of reducers to zero. In such cases, shuffling and sorting are not done either.
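To illustrate the effect of this setting, here is a small Python sketch of a toy job runner. It is a simulation under stated assumptions, not the Hadoop API: the num_reducers parameter stands in for the value set in the driver class (in Hadoop's Java API that is done with job.setNumReduceTasks), and the CRC32-based key assignment is an assumed stand-in for the framework's partitioner. The names run_job, parse and total are hypothetical.

```python
import zlib
from collections import defaultdict


def run_job(records, mapper, reducer, num_reducers):
    """Run a toy job; num_reducers plays the role of the driver setting."""
    mapped = [pair for record in records for pair in mapper(record)]
    if num_reducers == 0:
        # Map-only job: no shuffle, no sort; mapper output is the final output.
        return mapped
    # Assign every key to exactly one reducer using a deterministic hash
    # (an assumed stand-in for the framework's partitioner).
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped:
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[index][key].append(value)
    output = []
    for partition in partitions:
        # Each reducer receives its keys in sorted order.
        for key in sorted(partition):
            output.append(reducer(key, partition[key]))
    return output


def parse(record):
    """Hypothetical mapper: split a 'title,salary' line into a pair."""
    title, salary = record.split(",")
    yield title, int(salary)


def total(key, values):
    """Hypothetical reducer: sum the salaries for one job title."""
    return key, sum(values)


records = ["Manager,12000", "Developer,9000", "Manager,13000"]
map_only = run_job(records, parse, total, num_reducers=0)
summed = run_job(records, parse, total, num_reducers=1)
print(map_only)  # [('Manager', 12000), ('Developer', 9000), ('Manager', 13000)]
print(summed)    # [('Developer', 9000), ('Manager', 25000)]
```

With zero reducers the parsed records come out in their original order and ungrouped, which matches the data-parsing case. With one or more reducers, each key is summed exactly once, because all values for a key always go to the same reducer.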