0 votes
in Apache Spark by
What is the use of coalesce in Spark?

1 Answer

0 votes
by

Spark uses a coalesce method to reduce the number of partitions in a DataFrame.

Suppose you want to read data from a CSV file into an RDD having four partitions.

partition

This is how a filter operation is performed to remove all the multiple of 10 from the data.

The RDD has some empty partitions. It makes sense to reduce the number of partitions, which can be achieved by using coalesce.

This is how the resultant RDD would look like after applying to coalesce.

...