What do you mean by shuffling and sorting in MapReduce?
Shuffling and sorting take place after a map task completes. In Hadoop, the shuffle and sort phases occur simultaneously: map outputs are merged in key order while they are still being fetched.
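Shuffling: the process of transferring data from the mappers to the reducers, i.e., the process by which the system sorts the key-value output of the map tasks and transfers it to the reducers. Each key is assigned to exactly one reducer (by default, based on a hash of the key), so all values for a given key end up at the same reducer.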
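As a rough illustration of how map output is routed to reducers, the sketch below simulates the partitioning step in plain Python. It mirrors the formula of Hadoop's default HashPartitioner, (hash & Integer.MAX_VALUE) % numReduceTasks, but assumes zlib.crc32 as a deterministic stand-in for Java's hashCode(); the names partition and shuffle are hypothetical, and a real shuffle copies data over the network rather than between in-memory lists.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer index for a key.

    Same shape as HashPartitioner's (hash & Integer.MAX_VALUE) % numReduceTasks,
    but zlib.crc32 is an assumed stand-in for Java's hashCode().
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair from every map task to one reducer's input list."""
    reducer_inputs = defaultdict(list)
    for task_output in map_outputs:
        for key, value in task_output:
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    return dict(reducer_inputs)


map_outputs = [
    [("apple", 1), ("banana", 1)],  # output of map task 0
    [("apple", 1), ("cherry", 1)],  # output of map task 1
]
inputs = shuffle(map_outputs, num_reducers=2)
# Both ("apple", 1) pairs land in the same reducer's list, whichever one it is.
```

Because the reducer index depends only on the key, every map task independently sends a given key to the same reducer, which is what makes per-key reduction possible.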
The shuffle phase is necessary for the reducers; otherwise, they would not have any input. Shuffling can start even before the map phase has finished, because a reducer can copy a map task's output as soon as that task completes. This overlap saves time and lets the job finish sooner.
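The overlap between the map phase and the shuffle can be sketched with a Python generator: each hypothetical map task sorts its own output by key and hands it over the moment it finishes, and the reducer copies it right away and later merges the already-sorted runs. This is a simplified in-memory model under stated assumptions (run_map_tasks, fetch_and_merge and the events log are illustrative names); Hadoop's real merge spills to disk and may take several passes, and the reduce function itself still waits until every map output has been copied.

```python
import heapq

events = []  # records how map completions and reducer fetches interleave


def run_map_tasks(splits):
    """Hypothetical map phase: each task emits (word, 1) pairs, sorts them by key,
    and yields its sorted run as soon as the task completes."""
    for task_id, split in enumerate(splits):
        run = sorted(((word, 1) for word in split.split()), key=lambda kv: kv[0])
        events.append(f"map {task_id} done")
        yield task_id, run


def fetch_and_merge(completed_runs):
    """Reducer side: copy each run as soon as its map task completes, then merge."""
    fetched = []
    for task_id, run in completed_runs:
        events.append(f"fetched {task_id}")
        fetched.append(run)
    # Every run is already sorted by key, so a k-way merge gives one key-ordered stream.
    return list(heapq.merge(*fetched, key=lambda kv: kv[0]))


merged = fetch_and_merge(run_map_tasks(["b a", "c a"]))
# merged -> [('a', 1), ('a', 1), ('b', 1), ('c', 1)]
# events -> ['map 0 done', 'fetched 0', 'map 1 done', 'fetched 1']
```

The events list shows the copy of task 0's output happening before task 1 has even run, which is the time saving described above; sorting each run on the map side is what lets the reducer merge instead of re-sorting everything.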
Sorting: mappers generate intermediate key-value pairs. Before the reducer starts, the MapReduce framework sorts these key-value pairs by key.
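Sorting helps the reducer easily recognize when a new key begins, and therefore when a new call to the reduce function should start: all values for a key are adjacent, so a change in key marks the start of the next group. This saves time for the reducer.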
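A minimal Python sketch of why sorted input makes grouping cheap: once pairs are ordered by key, the reducer only has to compare each key with the previous one to know that a group has ended. The helpers group_sorted, reduce_phase and sum_counts are hypothetical names for illustration, not part of the Hadoop API.

```python
from operator import itemgetter


def group_sorted(pairs):
    """Yield (key, values) groups from pairs that are already sorted by key.

    Only the previous key is remembered: a change of key is the signal
    that the current group is complete.
    """
    current_key, values = None, []
    for key, value in pairs:
        if values and key != current_key:
            yield current_key, values
            values = []
        current_key = key
        values.append(value)
    if values:
        yield current_key, values


def reduce_phase(pairs, reduce_fn):
    """Sort intermediate pairs by key, then call reduce_fn once per distinct key."""
    return [reduce_fn(key, values)
            for key, values in group_sorted(sorted(pairs, key=itemgetter(0)))]


def sum_counts(key, values):
    """Example reduce function: add up the counts for one key."""
    return key, sum(values)


print(reduce_phase([("b", 1), ("a", 1), ("b", 1)], sum_counts))  # [('a', 1), ('b', 2)]
```

Without the sort, the reducer would have to hold every key in memory (for example in a hash map) until all input had been read; with it, each group can be reduced and released as soon as the key changes.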
Shuffling and sorting are not performed at all if you specify zero reducers (setNumReduceTasks(0)).
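In a map-only job the map output is written directly to the job's output location, with no partitioning and no sorting. The toy driver below models both branches in Python; run_job and sum_counts are hypothetical names, and zlib.crc32 is assumed as a stand-in for the hash used by Hadoop's default partitioner.

```python
import zlib
from collections import defaultdict


def run_job(map_outputs, num_reducers, reduce_fn):
    """Toy job driver: map-only when num_reducers == 0, otherwise shuffle, sort, reduce."""
    if num_reducers == 0:
        # Map-only job: no shuffle, no sort; output keeps the order the mappers emitted.
        return [pair for task_output in map_outputs for pair in task_output]

    # Shuffle: route each pair to a reducer by a hash of its key (crc32 is a stand-in).
    partitions = defaultdict(list)
    for task_output in map_outputs:
        for key, value in task_output:
            partitions[(zlib.crc32(key.encode()) & 0x7FFFFFFF) % num_reducers].append((key, value))

    # Sort each reducer's input by key, group the values, and reduce each group.
    results = []
    for r in sorted(partitions):
        grouped = defaultdict(list)
        for key, value in sorted(partitions[r], key=lambda kv: kv[0]):
            grouped[key].append(value)
        results.extend(reduce_fn(k, vs) for k, vs in grouped.items())
    return results


def sum_counts(key, values):
    return key, sum(values)


map_outputs = [[("b", 1), ("a", 1)], [("b", 1)]]
print(run_job(map_outputs, 0, sum_counts))  # [('b', 1), ('a', 1), ('b', 1)]
print(run_job(map_outputs, 1, sum_counts))  # [('a', 1), ('b', 2)]
```

With zero reducers the duplicate "b" keys are neither brought together nor ordered, which is the trade-off of skipping the shuffle and sort: the job is cheaper, but no per-key aggregation happens.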