What do you understand by "joins" in PySpark DataFrame? What are the different types of joins available in PySpark?

1 Answer

In PySpark, a join combines the rows of two DataFrames based on matching values, typically in one or more key columns. More than two DataFrames can be combined by chaining join() calls.

PySpark supports the usual SQL join types: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, LEFT SEMI, LEFT ANTI, and CROSS joins. A SELF join is not a separate type; it is any of these joins applied to a DataFrame and itself. The syntax of join() is as follows.

Syntax:

join(self, other, on=None, how=None)  

Parameter Explanation:

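The join() method accepts the following parameters and returns a new DataFrame:

  • "other": The DataFrame on the right side of the join.
  • "on": The join condition. It can be a column name, a list of column names, a join expression (a Column), or a list of Columns. When a name is given, the column must exist in both DataFrames.
  • "how": The join type, given as a string. Options are inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti, and left_anti. The default is inner.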

Types of Join in PySpark DataFrame

Join String                            Equivalent SQL Join
inner                                  INNER JOIN
outer, full, fullouter, full_outer     FULL OUTER JOIN
left, leftouter, left_outer            LEFT JOIN
right, rightouter, right_outer         RIGHT JOIN
cross                                  CROSS JOIN
anti, leftanti, left_anti              LEFT ANTI JOIN
semi, leftsemi, left_semi              LEFT SEMI JOIN

...