Distribute By SQL command used in Hive

Distribute By

When we have a large set of data, it is preferable to use SORT BY rather than ORDER BY, because SORT BY can use more than one reducer. Each reducer sorts only the rows it receives.

With SORT BY alone, however, records of a particular category can appear in all the output files. This is not duplicated data: the rows are spread across the reducers and then sorted within each reducer, which is not ideal. So, when you want all the records of the same category to be sorted in one file, use DISTRIBUTE BY.

All rows that have the same values in the DISTRIBUTE BY columns will be sent to the same reducer.

hive> select id, name from person distribute by id;
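DISTRIBUTE BY only decides which reducer a row goes to; it does not order the rows inside that reducer. To get the rows for each key sorted within one file, it is usually paired with SORT BY. A minimal sketch, assuming the same person table as above (the reducer count of 2 is an arbitrary value chosen for illustration, not something from the original question):

```sql
-- Illustrative setting only: ask for two reducers so the effect is visible.
SET mapreduce.job.reduces=2;

-- Rows with the same id go to the same reducer (DISTRIBUTE BY),
-- and each reducer sorts the rows it received by id, then name (SORT BY).
SELECT id, name
FROM person
DISTRIBUTE BY id
SORT BY id, name;

-- When the distribute and sort columns are the same, CLUSTER BY is shorthand
-- for DISTRIBUTE BY id SORT BY id.
SELECT id, name
FROM person
CLUSTER BY id;
```

Each reducer's output is sorted on its own, so the result is not globally ordered across files. Use ORDER BY when a total order is required.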
