What do you mean by shuffling and sorting in MapReduce?

2 Answers

Shuffling and sorting take place after a map task completes. In Hadoop, the shuffle and sort phases occur simultaneously.

Shuffling: the process of transferring data from the mappers to the reducers, that is, the process by which the system sorts the key-value output of the map tasks and transfers it to the reducers as input.
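To make the transfer concrete, here is a minimal Python sketch (not Hadoop code) of how map output could be routed to reducers. It assumes a hash-based partitioner modeled on Hadoop's default HashPartitioner; `partition` and `shuffle` are hypothetical helper names, and Python's `hash()` stands in for Java's `hashCode()`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    # Stand-in for Hadoop's default HashPartitioner, which computes
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    # Assumption: Python's hash() replaces Java's hashCode() here.
    return hash(key) % num_reducers


def shuffle(map_outputs: Iterable[List[Pair]], num_reducers: int) -> Dict[int, List[Pair]]:
    """Route every (key, value) pair from every map task to exactly one reducer."""
    inbox: Dict[int, List[Pair]] = defaultdict(list)
    for task_output in map_outputs:  # one list per map task
        for key, value in task_output:
            inbox[partition(key, num_reducers)].append((key, value))
    return dict(inbox)


# Usage: two map tasks, both of which emitted the key "apple".
map_outputs = [
    [("apple", 1), ("banana", 1)],  # output of map task 1
    [("apple", 1), ("cherry", 1)],  # output of map task 2
]
inbox = shuffle(map_outputs, num_reducers=2)
```

Whichever reducer "apple" hashes to, both of its pairs arrive in the same inbox even though they came from different map tasks, which is what lets a single reduce call see all values for that key.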

The shuffle phase is therefore necessary for the reducers; without it they would not have any input. Shuffling can also start before the map phase has finished, because a reducer begins copying a map task's output as soon as that task completes. This overlap saves time, so the job finishes sooner.

Sorting: The mappers generate intermediate key-value pairs. Before the reducer starts, the MapReduce framework sorts these key-value pairs by key.

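Sorting helps the reducer easily recognize when a new key starts, and therefore when a new call to reduce() should begin: because equal keys are adjacent, a change in key marks the boundary. This saves time for the reducer.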
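A minimal Python sketch of the reducer side, assuming one reducer's shuffled input is already available as a list of (key, value) pairs. `run_reducer` and `sum_counts` are hypothetical names, not Hadoop's Reducer API, and a single `sorted()` call stands in for Hadoop's merge of already-sorted map outputs.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, int]


def run_reducer(pairs: Iterable[Pair],
                reduce_fn: Callable[[str, List[int]], Pair]) -> List[Pair]:
    # Sort phase (simplified): order this reducer's input by key only.
    ordered = sorted(pairs, key=itemgetter(0))
    results = []
    # Equal keys are now adjacent, so a new group starts exactly where the
    # key differs from the previous one; groupby detects that boundary.
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append(reduce_fn(key, values))  # one reduce call per key
    return results


def sum_counts(key: str, values: List[int]) -> Pair:
    # Hypothetical word-count style reduce function: add up a key's values.
    return key, sum(values)


print(run_reducer([("banana", 1), ("apple", 1), ("banana", 1)], sum_counts))
# [('apple', 1), ('banana', 2)]
```

Without the sort, the two "banana" pairs could be separated by other keys, and the reducer would need to buffer every key it has seen before it could safely call reduce.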

Shuffling and sorting are not performed at all if you specify zero reducers (setNumReduceTasks(0)).
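The difference between a map-only job and a job with reducers can be simulated in a few lines of Python. This is a sketch under stated assumptions: `word_count_map` and `run_job` are hypothetical, and any nonzero reducer count is modeled as a single reducer. In a real map-only Hadoop job, the map output is written directly to the job's output path.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, int]


def word_count_map(line: str) -> List[Pair]:
    # Hypothetical mapper: emit (word, 1) for each word in the line.
    return [(word, 1) for word in line.split()]


def run_job(lines: Iterable[str], num_reducers: int) -> List[Pair]:
    map_output = [pair for line in lines for pair in word_count_map(line)]
    if num_reducers == 0:
        # Map-only job: no shuffle and no sort; the map output is the job
        # output, in the order the mapper produced it.
        return map_output
    # Assumption: any nonzero reducer count is modeled as ONE reducer, and
    # shuffle + sort + reduce are collapsed into a count and a key sort.
    totals = Counter()
    for word, count in map_output:
        totals[word] += count
    return sorted(totals.items())


lines = ["b a", "a"]
print(run_job(lines, num_reducers=0))  # [('b', 1), ('a', 1), ('a', 1)]
print(run_job(lines, num_reducers=1))  # [('a', 2), ('b', 1)]
```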

Shuffling: Shuffling refers to the process of transferring data from the mappers to the reducers. It is a mandatory step for the reducers to proceed with their jobs, because the shuffled data serves as the input for the reduce tasks.

Sorting: The intermediate key-value pairs that exist between the map and reduce phases (that is, the mapper output) are automatically sorted by key before they are passed to the reducer. This is helpful in programs that need sorted data at some stage, and it saves the programmer time.