Map Side Join in Hive

asked Apr 24, 2020 in Big Data | Hadoop by Hodge

Map-Side Join

Hive can load a table into memory and perform the join inside the mappers, with no reduce phase.

If one of the tables is small enough to fit in memory, you can use a map-side join.

The join is performed by loading the small table into memory, which avoids the shuffle and reduce steps and therefore speeds up query execution.

hive> select /*+ MAPJOIN(product) */ sales.*,product.*
    > from sales JOIN product ON (sales.id=product.id);

Output:
John      5    5    Shoes
Cena      2    2    Coat
Angle     3    3    Pencil
Raffle    4    4    Shirt
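
On newer Hive versions the optimizer can often choose a map-side join without the hint. Here is a minimal sketch of the session settings involved, assuming a Hive release that supports automatic join conversion. The threshold shown is the documented default, and the right value depends on how much memory your mappers have.

```sql
-- Let Hive convert a common join to a map join when one side is small enough.
set hive.auto.convert.join=true;

-- Size threshold (in bytes) below which a table counts as "small".
-- 25000000 is the documented default; any other value is a tuning assumption.
set hive.mapjoin.smalltable.filesize=25000000;

-- Same query as above, written without the MAPJOIN hint.
select sales.*, product.*
from sales JOIN product ON (sales.id = product.id);
```

Running EXPLAIN on the query should show a map join operator in the plan if the conversion took effect.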

Map joins can also be used with bucketed tables. To enable this, set the following property:

set hive.optimize.bucketmapjoin=true;
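
A bucket map join generally requires both tables to be bucketed on the join column, with one table's bucket count a multiple of the other's. The following is a sketch under those assumptions. The table names, column names, and bucket counts are hypothetical and are not taken from the example above.

```sql
-- Hypothetical bucketed tables, both clustered on the join column "id".
-- 8 is a multiple of 4, so each mapper only needs the matching product buckets.
CREATE TABLE sales_bucketed (name STRING, id INT)
CLUSTERED BY (id) INTO 8 BUCKETS;

CREATE TABLE product_bucketed (id INT, item STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

set hive.optimize.bucketmapjoin=true;

-- On releases where hive.ignore.mapjoin.hint defaults to true,
-- the hint below is skipped unless that setting is turned off.
SELECT /*+ MAPJOIN(p) */ s.*, p.*
FROM sales_bucketed s JOIN product_bucketed p ON (s.id = p.id);
```

The tables should be populated with INSERT ... SELECT rather than by copying files in, so that rows actually land in the bucket files that match their hash.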

...