In PySpark, a join combines two DataFrames by matching rows on one or more keys. Joins can be chained to combine more than two DataFrames.
PySpark supports the usual SQL join types: INNER Join, LEFT OUTER Join, RIGHT OUTER Join, LEFT ANTI Join, LEFT SEMI Join, CROSS Join, and SELF Join. The syntax of the PySpark join is as follows.
Syntax:
join(self, other, on=None, how=None)
Parameter Explanation:
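The join() method accepts the following parameters and returns a new DataFrame:
- "other": The DataFrame on the right side of the join.
- "on": The join key. It can be a column name, a list of column names, or a join expression (Column). Columns given by name must exist in both DataFrames.
- "how": The join type, given as a string. Options are inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti, and left_anti. The default is inner.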
Types of Join in PySpark DataFrame
Join String                        | Equivalent SQL Join
inner                              | INNER JOIN
outer, full, fullouter, full_outer | FULL OUTER JOIN
left, leftouter, left_outer        | LEFT JOIN
right, rightouter, right_outer     | RIGHT JOIN
cross                              | CROSS JOIN
anti, leftanti, left_anti          | LEFT ANTI JOIN
semi, leftsemi, left_semi          | LEFT SEMI JOIN
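The alias groups in the table can be summarized as a small lookup. This is a plain-Python sketch, not part of the PySpark API: the helper name sql_join_name is hypothetical, and it assumes that aliases differ only by underscores and letter case, which is the pattern the table suggests. The SQL names used for cross, anti, and semi are the Spark SQL spellings.

```python
# Hypothetical helper, not part of PySpark: maps a join string to its SQL name.
_SQL_JOIN_NAMES = {
    "inner": "INNER JOIN",
    "outer": "FULL OUTER JOIN",
    "full": "FULL OUTER JOIN",
    "fullouter": "FULL OUTER JOIN",
    "left": "LEFT JOIN",
    "leftouter": "LEFT JOIN",
    "right": "RIGHT JOIN",
    "rightouter": "RIGHT JOIN",
    "cross": "CROSS JOIN",
    "anti": "LEFT ANTI JOIN",
    "leftanti": "LEFT ANTI JOIN",
    "semi": "LEFT SEMI JOIN",
    "leftsemi": "LEFT SEMI JOIN",
}


def sql_join_name(how: str) -> str:
    """Return the SQL join name for a PySpark join string."""
    # Assumption: aliases differ only by underscores and letter case.
    key = how.lower().replace("_", "")
    if key not in _SQL_JOIN_NAMES:
        raise ValueError(f"Unsupported join type: {how!r}")
    return _SQL_JOIN_NAMES[key]


print(sql_join_name("left_outer"))
print(sql_join_name("fullouter"))
```

Under this assumption a string containing a space, such as "full outer", is rejected, so the underscore or run-together spellings are the safer ones to use.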