Jan 11 in Big Data | Hadoop
Q: How does partitioning work in Hadoop?

1 Answer

Jan 11

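Partitioning happens between the Map phase and the Reduce phase of a Hadoop MapReduce job. The Partitioner decides which Reducer receives each key-value pair emitted by the Mappers, so the number of partitions is the same as the number of Reducers (reduce tasks) configured for the job.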

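The Partitioner splits the output of the Map phase into distinct partitions. By default Hadoop hashes the key (HashPartitioner); a user-defined Partitioner can apply a custom condition instead.

With the default Partitioner, the partitions behave like hash-based buckets: every record with the same key lands in the same partition and is therefore processed by the same Reducer.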
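To make the hash-bucket idea concrete, here is a minimal sketch in plain Python (no Hadoop needed). It mimics what the default HashPartitioner does: take the key's hash code, clear the sign bit, and take the remainder by the number of Reducers. The helper names (`java_string_hash`, `hash_partition`) and the sample records are made up for illustration.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() for ASCII keys (signed 32-bit result)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the unsigned 32-bit value as a signed int, as Java does.
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Clear the sign bit, then take the remainder by the reducer count."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


# Hypothetical Map output: (subject, marks) pairs.
map_output = [("math", 91), ("physics", 78), ("math", 85), ("history", 66)]
num_reducers = 3

# One bucket per Reducer; a key always maps to the same bucket.
partitions = {p: [] for p in range(num_reducers)}
for key, value in map_output:
    partitions[hash_partition(key, num_reducers)].append((key, value))

for p, records in partitions.items():
    print(p, records)
```

Both "math" records end up in the same bucket, which is what lets a single Reducer see all values for that key.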


E.g. suppose we have to find the student with the maximum marks for each gender in each subject. The Map function emits each student record keyed by gender. The Map output is then passed to the Partitioner, which assigns each record to a partition on the basis of its subject, so each subject is handled by a different Reducer. Each Reducer takes the records from its own partition, grouped by gender, and finds the student with the highest marks for each gender in that subject.
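The example above can be sketched as a small simulation in plain Python. This is only an illustration of the data flow, not Hadoop code: the subject list, the student records, and the function names (`map_record`, `subject_partition`, `reduce_max`) are all hypothetical, and one Reducer per subject is assumed.

```python
from collections import defaultdict

# Hypothetical list of subjects; one Reducer (partition) per subject is assumed.
SUBJECTS = ["math", "physics", "history"]


def map_record(record):
    """Map step: emit gender as the key, the rest of the record as the value."""
    student, gender, subject, marks = record
    return gender, (student, subject, marks)


def subject_partition(key, value, num_reducers):
    """Custom partitioner: choose the partition from the subject in the value."""
    _, subject, _ = value
    return SUBJECTS.index(subject) % num_reducers


def reduce_max(gender, values):
    """Reduce step: keep the record with the highest marks for this gender."""
    return gender, max(values, key=lambda v: v[2])


# Made-up input records: (student, gender, subject, marks).
records = [
    ("Asha", "F", "math", 91),
    ("Ben", "M", "math", 85),
    ("Chloe", "F", "math", 88),
    ("Dev", "M", "physics", 79),
    ("Eva", "F", "physics", 94),
    ("Finn", "M", "physics", 72),
]

num_reducers = len(SUBJECTS)

# partition index -> gender -> list of values (the grouping a Reducer sees)
partitions = defaultdict(lambda: defaultdict(list))
for record in records:
    key, value = map_record(record)
    partitions[subject_partition(key, value, num_reducers)][key].append(value)

# Each partition is reduced independently, as a separate Reducer would do.
results = {}
for p, groups in partitions.items():
    for gender, values in groups.items():
        _, (student, _, marks) = reduce_max(gender, values)
        results[(SUBJECTS[p], gender)] = (student, marks)

for (subject, gender), (student, marks) in sorted(results.items()):
    print(subject, gender, student, marks)
```

In a real job the same routing decision would live in a custom Partitioner class, which receives both the key and the value and returns a partition number.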
