Jan 11, 2020 in Big Data | Hadoop
Q: How does partitioning work in Hadoop?

1 Answer

Jan 11, 2020

Partitioning is the phase between the Map phase and the Reduce phase in a Hadoop MapReduce job. Because each partition is consumed by exactly one Reducer, the number of partitions is the same as the number of Reducers.

The Partitioner splits the output of the Map phase into distinct partitions according to a user-defined condition, which is supplied as a custom Partitioner class.

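Partitions often behave like hash-based buckets: Hadoop's default HashPartitioner picks the partition from the hash of the key, so all records with the same key land in the same partition.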
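To make the bucket idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It mirrors the arithmetic used by Hadoop's default HashPartitioner (clear the sign bit of the hash, then take the remainder by the number of reducers). The class name, the sample keys, and the reducer count of 3 are assumptions made for illustration only.

```java
// Plain-Java sketch of hash-based partitioning (no Hadoop dependency).
final class HashPartitionSketch {

    // Mirrors the logic of Hadoop's default HashPartitioner:
    // clear the sign bit so the result is never negative, then take
    // the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // assumed reducer count, for illustration
        String[] keys = {"maths", "physics", "chemistry", "maths"};
        for (String key : keys) {
            // Equal keys always print the same partition number.
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Note that two different keys can share a bucket; hashing only guarantees that equal keys are never split across Reducers.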

 

E.g. suppose we have to find the student with the maximum marks for each gender in each subject. We can first use the Map function to emit each record with the gender as its key. Once mapping is done, the result is passed to the Partitioner. The Partitioner assigns each gender-keyed row to a partition on the basis of its subject. For each subject there will be a different Reducer. Each Reducer takes its input from its own partition and finds the student with the highest marks for each gender.
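The example above can be sketched roughly as follows. This is a plain-Java simulation, not actual Hadoop job code: the Student class, the fixed SUBJECTS list, and the sample rows are all hypothetical. In a real job the routing logic would go in a subclass of org.apache.hadoop.mapreduce.Partitioner, overriding getPartition(key, value, numPartitions), and be registered on the job. The sketch assumes the number of partitions equals the number of subjects, so that each subject gets its own Reducer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a subject-based partitioner (no Hadoop dependency).
final class SubjectPartitionSketch {

    // Hypothetical record type and subject list, assumed for illustration.
    static final class Student {
        final String name, gender, subject;
        final int marks;
        Student(String name, String gender, String subject, int marks) {
            this.name = name; this.gender = gender; this.subject = subject; this.marks = marks;
        }
    }
    static final List<String> SUBJECTS = List.of("maths", "physics", "chemistry");

    // Same shape as Hadoop's Partitioner.getPartition(key, value, numPartitions):
    // the key is the gender, but the partition is chosen from the subject.
    static int getPartition(String genderKey, Student value, int numPartitions) {
        int index = SUBJECTS.indexOf(value.subject);
        if (index < 0) throw new IllegalArgumentException("unknown subject: " + value.subject);
        return index % numPartitions;
    }

    // Shuffle step: route every map output record to its partition.
    static List<List<Student>> partition(List<Student> mapOutput, int numPartitions) {
        List<List<Student>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (Student s : mapOutput) partitions.get(getPartition(s.gender, s, numPartitions)).add(s);
        return partitions;
    }

    // Reduce step for one partition: keep the top scorer for each gender.
    static Map<String, Student> topPerGender(List<Student> partition) {
        Map<String, Student> best = new HashMap<>();
        for (Student s : partition) best.merge(s.gender, s, (a, b) -> a.marks >= b.marks ? a : b);
        return best;
    }

    public static void main(String[] args) {
        List<Student> mapOutput = List.of(
            new Student("Asha", "F", "maths", 91),
            new Student("Ravi", "M", "maths", 78),
            new Student("Meera", "F", "maths", 84),
            new Student("John", "M", "physics", 88));
        List<List<Student>> partitions = partition(mapOutput, SUBJECTS.size());
        for (int i = 0; i < partitions.size(); i++) {
            for (Map.Entry<String, Student> e : topPerGender(partitions.get(i)).entrySet()) {
                System.out.println(SUBJECTS.get(i) + " / " + e.getKey() + " -> " + e.getValue().name);
            }
        }
    }
}
```

An explicit subject-to-index lookup is used here instead of hashing the subject, because a hash could send two subjects to the same Reducer and mix their results.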
