How does partitioning work in Hadoop?

1 Answer


Partitioning takes place between the Map phase and the Reduce phase of a Hadoop MapReduce job. Because each partition is consumed by exactly one Reducer, the number of partitions is the same as the number of Reducers.

The Partitioner splits the output of the Map phase into distinct partitions by applying a condition to each key-value pair. This condition can be user-defined.

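Partitions can behave like hash-based buckets: Hadoop's default HashPartitioner hashes the key, so all records with the same key land in the same partition.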
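
To make the hash-bucket idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The arithmetic mirrors what Hadoop's default HashPartitioner does (clear the sign bit of the key's hash code, then take the remainder by the number of reduce tasks). The class name, the sample keys and the choice of 3 reduce tasks are assumptions made only for illustration.

```java
final class HashPartitionSketch {
    // Same arithmetic as Hadoop's default HashPartitioner: clear the sign bit
    // so the value is non-negative, then take the remainder by the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // hypothetical reducer count
        String[] keys = {"math", "physics", "chemistry", "math"}; // hypothetical keys
        for (String key : keys) {
            // Equal keys always print the same partition number.
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the result depends only on the key and the reducer count, every record with a given key reaches the same Reducer, whichever Mapper produced it.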

 

For example, suppose we have to find the student with the maximum marks for each gender in each subject. The Map function first emits each record keyed by gender. Once mapping is done, the output is passed to the Partitioner, which assigns each record to a partition on the basis of its subject. Each subject therefore gets its own Reducer. Each Reducer reads its partition and finds the student with the highest marks for each gender.
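
The example above can be simulated in plain Java without a cluster. This is only a sketch under stated assumptions: the Mark record layout, the fixed subject list, the student names and the class name are all hypothetical. In a real job the routing logic would go in a subclass of org.apache.hadoop.mapreduce.Partitioner that overrides getPartition, registered through job.setPartitionerClass, with job.setNumReduceTasks set to the number of subjects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SubjectPartitionSketch {
    // Hypothetical record layout; the key emitted by the mapper is the gender.
    static final class Mark {
        final String gender, subject, student;
        final int marks;
        Mark(String gender, String subject, String student, int marks) {
            this.gender = gender; this.subject = subject;
            this.student = student; this.marks = marks;
        }
    }

    // Hypothetical fixed subject list: one partition, and so one reducer, per subject.
    static final List<String> SUBJECTS = Arrays.asList("math", "physics", "chemistry");

    // Plays the role of getPartition: the partition is chosen from the subject.
    static int getPartition(Mark m, int numPartitions) {
        int index = SUBJECTS.indexOf(m.subject);
        if (index < 0) throw new IllegalArgumentException("unknown subject: " + m.subject);
        return index % numPartitions;
    }

    // Plays the role of one reducer: top scorer per gender within one partition.
    static Map<String, Mark> reduce(List<Mark> partition) {
        Map<String, Mark> best = new HashMap<>();
        for (Mark m : partition) {
            best.merge(m.gender, m, (a, b) -> b.marks > a.marks ? b : a);
        }
        return best;
    }

    // Route every record to its partition, then reduce each partition independently.
    static List<Map<String, Mark>> run(List<Mark> records) {
        int numPartitions = SUBJECTS.size();
        List<List<Mark>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (Mark m : records) partitions.get(getPartition(m, numPartitions)).add(m);
        List<Map<String, Mark>> results = new ArrayList<>();
        for (List<Mark> p : partitions) results.add(reduce(p));
        return results;
    }

    public static void main(String[] args) {
        List<Mark> records = Arrays.asList(
            new Mark("F", "math", "Asha", 91),
            new Mark("M", "math", "Carl", 85),
            new Mark("M", "physics", "Finn", 90));
        List<Map<String, Mark>> results = run(records);
        for (int i = 0; i < results.size(); i++) {
            for (Mark m : results.get(i).values()) {
                System.out.println(SUBJECTS.get(i) + " / " + m.gender + ": " + m.student);
            }
        }
    }
}
```

Partitioning by subject rather than by the gender key is what keeps each Reducer's work confined to a single subject; within that partition the records are still grouped by gender.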
