What is the difference between the SORT BY and ORDER BY clauses in Hive? Which is faster?

1 Answer


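ORDER BY – sorts the entire result set in a single reducer, so the output is globally ordered. On large tables SORT BY is usually much faster than ORDER BY, because that single reducer becomes a bottleneck.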

SORT BY – sorts the data within each reducer. You can use any number of reducers, but the result is ordered only within each reducer's output, not across the whole result set.

In the first case (ORDER BY), the mappers send every row to a single reducer, which sorts them all.

In the second case (SORT BY), the mappers split the rows across many reducers, and each reducer sorts only its own share, in parallel with the others. That is why it can sort quickly.
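The difference between the two cases can be sketched with a small Python simulation. This is only an illustration, not how Hive is implemented: the function names order_by and sort_by are made up for this sketch, and the round-robin split is an assumption standing in for however Hive actually spreads rows across reducers when no DISTRIBUTE BY is given.

```python
from typing import List


def order_by(rows: List[int]) -> List[int]:
    # ORDER BY: every row goes to one reducer, which sorts the whole set.
    return sorted(rows)


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    # SORT BY: rows are spread over several reducers and each reducer
    # sorts only the rows it received.
    # Assumption: round-robin assignment stands in for Hive's real
    # distribution of rows to reducers.
    partitions: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        partitions[i % num_reducers].append(row)
    return [sorted(p) for p in partitions]


rows = [5, 3, 8, 1, 9, 2]

total = order_by(rows)
per_reducer = sort_by(rows, 2)
# Reading the reducer outputs one after another is not a global sort.
concatenated = [r for part in per_reducer for r in part]

print(total)         # [1, 2, 3, 5, 8, 9]
print(per_reducer)   # [[5, 8, 9], [1, 2, 3]]
print(concatenated)  # [5, 8, 9, 1, 2, 3]
```

Each reducer's list is sorted, but the combined output is not, which is the trade-off SORT BY makes in exchange for parallelism.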

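Example (the first query returns a globally ordered result; the second sends all rows with the same id to the same reducer and sorts them by name within each reducer):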

SELECT name, id, cell FROM user_table ORDER BY id, name;

SELECT name, id, cell FROM user_table DISTRIBUTE BY id SORT BY name;
