Apr 1, 2020 in Big Data | Hadoop
Cluster By

hive> select id, name from person cluster by id;

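When this query is executed, Hive distributes the rows across multiple reducers by id. All rows with the same id go to the same reducer, and each reducer sorts the rows it receives by id. Cluster by is shorthand for using distribute by and sort by on the same set of columns, so if a query does both on the same columns, you can replace them with a single cluster by. Note that the output is sorted within each reducer only; it is not globally ordered across reducers.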
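For comparison, here is a sketch of the longer form that the cluster by query above stands in for, assuming the same person table with id and name columns:

```sql
-- Send rows with the same id to the same reducer,
-- then sort the rows inside each reducer by id.
select id, name from person distribute by id sort by id;
```

The two forms differ only in flexibility: distribute by with sort by lets you sort on different columns than you distribute on, while cluster by always uses the same columns for both.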
