What is the difference between repartition and coalesce?

Question

What is the difference between repartition and coalesce?

1 Answer

rajeshsharma · Answer 1 · 2024-09-14T23:26:41+0000

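Repartition: repartition(n) can increase or decrease the number of partitions in an RDD, DataFrame, or Dataset. It always performs a full shuffle, redistributing records across the cluster into roughly equal-sized partitions. That shuffle is expensive because data moves over the network, but it is the way to raise parallelism or even out skewed partitions.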

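Coalesce: coalesce(n) is meant for decreasing the number of partitions in an RDD, DataFrame, or Dataset. By default it avoids a full shuffle by merging existing partitions, so it is usually cheaper than repartition when reducing the partition count. The trade-off is that the merged partitions can be uneven in size, and without a shuffle it cannot increase the count: asking for more partitions than currently exist leaves the number unchanged.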
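To make the "merge existing partitions" versus "full shuffle" distinction concrete, here is a small pure-Python toy model. It is only a sketch and not Spark's actual implementation: the names toy_coalesce and toy_repartition are made up for illustration, partitions are modelled as plain lists, and the round-robin redistribution is an assumption chosen for simplicity. In Spark itself the corresponding calls are df.repartition(n) and df.coalesce(n).

```python
from typing import List

Partition = List[int]


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Hypothetical model of a no-shuffle coalesce: whole neighbouring
    partitions are merged, and individual records are never redistributed."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n >= len(partitions):
        # Without a shuffle the partition count cannot grow.
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map the i-th input partition to one of n output groups.
        merged[i * n // len(partitions)].extend(part)
    return merged


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Hypothetical model of a full shuffle: every record is reassigned
    (round-robin here) to one of n new partitions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    out: List[Partition] = [[] for _ in range(n)]
    records = [rec for part in partitions for rec in part]
    for i, rec in enumerate(records):
        out[i % n].append(rec)
    return out


# Four partitions, one of them much larger than the others.
skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]

coalesced = toy_coalesce(skewed, 2)        # sizes 7 and 2: skew is kept
repartitioned = toy_repartition(skewed, 2) # sizes 5 and 4: evened out

grown_coalesce = toy_coalesce(skewed, 8)        # still 4 partitions
grown_repartition = toy_repartition(skewed, 8)  # 8 partitions
```

In this model, coalesce leaves the big partition intact and simply glues a neighbour onto it, while repartition touches every record. That mirrors why coalesce is the cheaper choice for shrinking the partition count (for example before writing a small number of output files), and why repartition is the one to reach for when you need more partitions or a more even distribution.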
