How does partitioning work in Hadoop?

1 Answer

SakshiSharma · Answer 1 · 2020-01-11T06:03:33+0000

Partitioning is the step between the Map phase and the Reduce phase of a Hadoop MapReduce job. Each partition is consumed by exactly one Reducer, so the number of partitions is the same as the number of Reducers configured for the job.

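The Partitioner splits the output of the Map phase into distinct partitions, deciding for each key-value pair which Reducer receives it. By default Hadoop decides this from a hash of the key; a custom Partitioner can replace that with a user-defined condition.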

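With the default HashPartitioner, partitions behave like hash-based buckets: the key's hash code modulo the number of Reducers selects the partition, so all records with the same key go to the same Reducer.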
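The bucket idea can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop code: the formula mirrors what HashPartitioner computes, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, but the hash function here is a stand-in that imitates Java's `String.hashCode()` for ASCII keys (Hadoop's `Text` keys are hashed over their bytes, so real partition numbers may differ). The function names and sample keys are made up for the example.

```python
def java_string_hash(s):
    """Imitate Java's String.hashCode() and return a signed 32-bit value."""
    h = 0
    for ch in s:
        # Same recurrence Java uses, truncated to 32 bits at every step.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit result as a signed int, like Java.
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key, num_reducers):
    """Mirror the default formula: (hash & Integer.MAX_VALUE) % numReduceTasks."""
    # Masking with 0x7FFFFFFF clears the sign bit so the result is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


# Hypothetical keys; each one always lands in the same bucket.
for key in ["Math", "Physics", "Chemistry"]:
    print(key, "->", hash_partition(key, 3))
```

Because the partition depends only on the key and the number of Reducers, every record with the same key is sent to the same Reducer, no matter which Mapper produced it.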

E.g., suppose we have to find the student with the maximum marks for each gender in each subject. The Map function first emits each record keyed by gender. Once mapping is done, the output is passed to the Partitioner, which assigns each record to a partition on the basis of its subject. There is a different Reducer for each subject. Each Reducer takes the input from its own partition and finds the student with the highest marks for each gender.
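The example above can be simulated in plain Python to see how the records flow. This is a rough sketch, not a Hadoop job: the records, the `SUBJECTS` list, and all function names are invented for illustration. Only the shape of `get_partition(key, value, num_partitions)` follows Hadoop's `Partitioner.getPartition(key, value, numPartitions)` method, which a custom Partitioner would override in Java.

```python
from collections import defaultdict

# Hypothetical sample records: (name, gender, subject, marks)
RECORDS = [
    ("Asha", "F", "Math", 91),
    ("Ravi", "M", "Math", 85),
    ("Meera", "F", "Math", 78),
    ("Kabir", "M", "Physics", 88),
    ("Nina", "F", "Physics", 94),
    ("Arjun", "M", "Physics", 73),
]

SUBJECTS = ["Math", "Physics"]  # assumed: one partition (and Reducer) per subject


def map_record(record):
    """Map phase: emit gender as the key and the rest of the row as the value."""
    name, gender, subject, marks = record
    return gender, (subject, name, marks)


def get_partition(key, value, num_partitions):
    """Custom partitioner: pick the partition from the subject in the value."""
    subject = value[0]
    return SUBJECTS.index(subject) % num_partitions


def reduce_partition(pairs):
    """Reduce phase: for each gender key, keep the student with the highest marks."""
    best = {}
    for gender, (_subject, name, marks) in pairs:
        if gender not in best or marks > best[gender][1]:
            best[gender] = (name, marks)
    return best


def run_job(records, num_reducers):
    """Route every mapped pair to its partition, then reduce each partition."""
    partitions = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        partitions[get_partition(key, value, num_reducers)].append((key, value))
    return {p: reduce_partition(pairs) for p, pairs in sorted(partitions.items())}


results = run_job(RECORDS, num_reducers=len(SUBJECTS))
for partition, best in results.items():
    print(SUBJECTS[partition], best)
```

Here the number of Reducers is set equal to the number of subjects, so each Reducer sees exactly one subject and only has to compare marks within each gender.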
