Distribute By SQL command used in Hive

Question

asked Apr 1, 2020 in Big Data | Hadoop by AdilsonLima

Distribute By

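When we have a large set of data, it is preferable to use SORT BY rather than ORDER BY, because SORT BY can use more than one reducer, whereas ORDER BY sends everything through a single reducer to produce one total ordering.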
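
As a rough illustration of the difference between a global sort and a per-reducer sort, here is a small sketch. It assumes the `person` table with `id` and `name` columns from the query further down; the reducer count of 2 is an arbitrary value chosen only for the example.

```sql
-- ORDER BY produces one total ordering, so all rows pass through a single reducer.
SELECT id, name FROM person ORDER BY id;

-- Request two reducers (arbitrary value, for illustration only).
SET mapreduce.job.reduces=2;

-- SORT BY orders rows within each reducer only, so several reducers can work in parallel.
-- Each reducer's output is sorted by id, but the outputs are not globally ordered.
SELECT id, name FROM person SORT BY id;
```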

With SORT BY alone, records of a particular category can appear in every output file. This is not duplicated data: the rows are spread across the reducers and then sorted within each reducer, which is not ideal. So, when you want all the records of the same category to end up in one file, use DISTRIBUTE BY, and combine it with SORT BY if they should also be sorted there.

All rows that have the same value in the DISTRIBUTE BY columns are sent to the same reducer. DISTRIBUTE BY on its own does not sort the rows.

hive> select id, name from person distribute by id;
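
To also get the rows sorted inside each reducer, DISTRIBUTE BY is usually paired with SORT BY. The following is a minimal sketch, assuming the same `person` table; the choice of sort columns is illustrative, not something stated in the question.

```sql
-- Rows with the same id go to the same reducer,
-- and each reducer sorts its rows by id and then by name.
SELECT id, name
FROM person
DISTRIBUTE BY id
SORT BY id, name;

-- CLUSTER BY id is shorthand for DISTRIBUTE BY id SORT BY id
-- (same column for both, ascending order only).
SELECT id, name
FROM person
CLUSTER BY id;
```

Note that one reducer may still receive several different `id` values; the guarantee is only that a given `id` never spans more than one reducer.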


0 Answers