What do you mean by shuffling and sorting in MapReduce?

Question

What do you mean by shuffling and sorting in MapReduce?

2 Answers

sharadyadav1986 · Answer 1 · 2020-11-08T04:20:41+0000

Shuffling and sorting take place once map tasks have produced output. In Hadoop, the shuffle and sort phases occur simultaneously: map outputs are merged while they are still being fetched.

Shuffling: the process of transferring data from the mappers to the reducers, i.e., the process by which the system sorts the key-value output of the map tasks and transfers it to the reducers.

The shuffle phase is necessary for the reducers; otherwise, they would not have any input. Shuffling can start even before the map phase has finished, because a reducer can fetch the output of each map task as soon as that task completes. This saves time and lets the job finish sooner.
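To make the routing concrete, here is a minimal plain-Java sketch of how map output could be assigned to reducers. It is a simulation, not Hadoop API code: the class ShuffleSketch and its methods are hypothetical names, String keys stand in for Hadoop's Writable key types, and the network copy is skipped entirely. The partition formula mirrors the one used by Hadoop's default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the shuffle step; not Hadoop API code.
final class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner: mask the sign bit,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Routes every map-output pair to the bucket of the reducer that owns its key.
    static List<List<Map.Entry<String, Integer>>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int numReduceTasks) {
        List<List<Map.Entry<String, Integer>>> reducerInputs = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reducerInputs.add(new ArrayList<>());
        }
        for (List<Map.Entry<String, Integer>> oneMapper : mapOutputs) {
            for (Map.Entry<String, Integer> pair : oneMapper) {
                reducerInputs.get(partitionFor(pair.getKey(), numReduceTasks)).add(pair);
            }
        }
        return reducerInputs;
    }

    public static void main(String[] args) {
        // Output of two hypothetical map tasks in a word count job.
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("apple", 1), Map.entry("pear", 1)),
                List.of(Map.entry("apple", 1), Map.entry("fig", 1)));
        List<List<Map.Entry<String, Integer>>> reducerInputs = shuffle(mapOutputs, 2);
        for (int r = 0; r < reducerInputs.size(); r++) {
            System.out.println("reducer " + r + " <- " + reducerInputs.get(r));
        }
    }
}
```

Whatever the hash values turn out to be, both "apple" pairs land in the same bucket, which is the property the reducers rely on: all values for one key arrive at one reducer.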

Sorting: the mappers generate the intermediate key-value pairs. Before the reducer starts, the MapReduce framework sorts these key-value pairs by key.

Sorting helps the reducer easily detect when a new group of values should start: when the next key in the sorted input differs from the previous one, a new reduce() call begins. This saves time for the reducer.
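The key-change check described above can be sketched in plain Java. This is a simulation under stated assumptions, not Hadoop API code: SortGroupSketch and sortAndReduce are hypothetical names, and summing the values stands in for whatever a real reduce() would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of sort-then-group on the reduce side; not Hadoop API code.
final class SortGroupSketch {

    // Sorts pairs by key, then sums values per key, starting a new group
    // whenever the key differs from the previous one.
    static List<String> sortAndReduce(List<Map.Entry<String, Integer>> reducerInput) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(reducerInput);
        sorted.sort(Map.Entry.comparingByKey());

        List<String> results = new ArrayList<>();
        String currentKey = null;
        int sum = 0;
        for (Map.Entry<String, Integer> pair : sorted) {
            if (currentKey != null && !pair.getKey().equals(currentKey)) {
                results.add(currentKey + "=" + sum); // key changed: close the previous group
                sum = 0;
            }
            currentKey = pair.getKey();
            sum += pair.getValue();
        }
        if (currentKey != null) {
            results.add(currentKey + "=" + sum); // close the last group
        }
        return results;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("pear", 1), Map.entry("apple", 1),
                Map.entry("fig", 1), Map.entry("apple", 1));
        System.out.println(sortAndReduce(input)); // prints [apple=2, fig=1, pear=1]
    }
}
```

Because the input is sorted, a single pass with one comparison per pair is enough to find every group boundary; no lookup table of keys is needed.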

Shuffling and sorting are not performed at all if you specify zero reducers (setNumReduceTasks(0)).
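The difference a zero-reducer setting makes can be illustrated with a small plain-Java simulation. Nothing here is Hadoop API code: MapOnlySketch, map, and runJob are hypothetical names, the mapper is a simple word count, and for any non-zero reducer count the sketch merges everything into one sorted result rather than modelling separate partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a word count job; not Hadoop API code.
final class MapOnlySketch {

    // Hypothetical mapper: emits (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(Map.entry(word, 1));
                }
            }
        }
        return out;
    }

    static List<String> runJob(List<String> lines, int numReduceTasks) {
        List<Map.Entry<String, Integer>> mapOutput = map(lines);
        List<String> result = new ArrayList<>();
        if (numReduceTasks == 0) {
            // Map-only job: no shuffle and no sort, so pairs stay in emission order.
            for (Map.Entry<String, Integer> pair : mapOutput) {
                result.add(pair.getKey() + "=" + pair.getValue());
            }
            return result;
        }
        // With reducers: a TreeMap stands in for sort, group, and sum.
        TreeMap<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        for (Map.Entry<String, Integer> entry : sums.entrySet()) {
            result.add(entry.getKey() + "=" + entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("b a", "b");
        System.out.println(runJob(lines, 0)); // prints [b=1, a=1, b=1]
        System.out.println(runJob(lines, 1)); // prints [a=1, b=2]
    }
}
```

With zero reducers the map output is the final output, unsorted and ungrouped; with at least one reducer the keys come out sorted and each key appears once.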

sharadyadav1986 · Answer 2 · 2023-02-17T13:33:56+0000

Shuffling: Shuffling refers to the process of transferring data from the mappers to the reducers. It is mandatory for the reducers, because the shuffled data serves as the input to the reduce tasks.

Sorting: The key-value pairs produced between the map and reduce phases (after the mapper) are automatically sorted by key before they move to the reducer. This is helpful in programs that require sorting at some stage, and it saves the programmer time.
