What is shuffling in Apache Spark? When does it occur?

1 Answer

sharadyadav1986 · Answer 1 · 2022-03-29T02:19:44+0000

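In Apache Spark, shuffling is the process of redistributing data across partitions so that related records (typically those with the same key) end up in the same partition. Because the target partitions usually live on other executors, a shuffle generally means writing intermediate data to disk and transferring it over the network, which makes it one of the more expensive operations in a Spark job. Spark's shuffle implementation differs from Hadoop MapReduce's.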
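To make the definition concrete, here is a minimal pure-Python sketch of what a hash-partitioned shuffle does conceptually. It is a simulation, not Spark's code: the names `partition_for` and `shuffle` are hypothetical, and Python's built-in `hash` stands in for the JVM `hashCode` that Spark's `HashPartitioner` uses. Each "map" partition splits its records into one bucket per output partition, and each "reduce" partition then collects its bucket from every map output.

```python
def partition_for(key, num_partitions):
    # Map a key to a reduce partition by hashing it; Spark's HashPartitioner
    # does the same with the key's hashCode (non-negative modulo).
    return hash(key) % num_partitions


def shuffle(map_partitions, num_partitions):
    # Map side: each map task splits its records into one bucket per
    # reduce partition (in Spark these buckets are written to shuffle files).
    map_outputs = []
    for records in map_partitions:
        buckets = [[] for _ in range(num_partitions)]
        for key, value in records:
            buckets[partition_for(key, num_partitions)].append((key, value))
        map_outputs.append(buckets)
    # Reduce side: partition r fetches bucket r from every map output; across
    # executors this fetch is the network transfer that makes shuffles costly.
    return [
        [record for buckets in map_outputs for record in buckets[r]]
        for r in range(num_partitions)
    ]


map_partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]
reduced = shuffle(map_partitions, num_partitions=2)
for r, part in enumerate(reduced):
    print(r, part)
```

Which partition each key lands in can change between runs because Python randomizes string hashes per process, but within one run every record with a given key always ends up in the same output partition, which is the property a shuffle exists to guarantee.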

Two configuration properties control whether shuffle data is compressed:

spark.shuffle.compress: whether to compress map output files (the shuffle outputs). Default: true.
spark.shuffle.spill.compress: whether to compress data spilled to disk during shuffles (intermediate spill files). Default: true.
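The trade-off these flags control (spending CPU on compression to write and transfer fewer bytes) can be illustrated with a small pure-Python sketch. This is an assumption-laden model, not Spark's API: `write_shuffle_block` and `read_shuffle_block` are hypothetical helpers, `pickle` stands in for Spark's serializer, and `zlib` stands in for the codec Spark actually uses, which is set by `spark.io.compression.codec` (lz4 by default).

```python
import pickle
import zlib


def write_shuffle_block(records, compress):
    # Serialize one bucket of (key, value) records, as a map task would
    # before writing it to a shuffle file.
    data = pickle.dumps(records)
    # Conceptually what spark.shuffle.compress toggles: store the block
    # compressed or raw. zlib is only a stand-in for Spark's codec.
    return zlib.compress(data) if compress else data


def read_shuffle_block(block, compressed):
    # The reader must know whether the block was compressed to decode it.
    data = zlib.decompress(block) if compressed else block
    return pickle.loads(data)


# Repetitive keys, as is common in shuffle data, compress well.
records = [("user_%d" % (i % 10), i % 3) for i in range(10_000)]
raw = write_shuffle_block(records, compress=False)
packed = write_shuffle_block(records, compress=True)
print(len(raw), len(packed))
```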
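
Shuffling occurs when an operation needs all records with the same key in the same partition: for example, when you join two tables (unless one side is small enough to be broadcast) or perform *ByKey operations such as groupByKey or reduceByKey.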
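Both groupByKey and reduceByKey shuffle, but not the same amount of data: reduceByKey merges values for each key within each partition before the shuffle (a map-side combine), while groupByKey sends every record across the network. The pure-Python sketch below, with hypothetical helper names and made-up word-count data, counts how many records each approach would shuffle. It models the idea only and is not Spark's API.

```python
from operator import add


def shuffled_by_group_by_key(map_partitions):
    # groupByKey-style: no map-side combine, so every record is shuffled.
    return sum(len(records) for records in map_partitions)


def combine_locally(records, func):
    # reduceByKey-style map-side combine: merge values per key inside one partition.
    combined = {}
    for key, value in records:
        combined[key] = func(combined[key], value) if key in combined else value
    return list(combined.items())


def shuffled_by_reduce_by_key(map_partitions, func):
    # Only the locally combined (key, partial result) pairs cross the network.
    return sum(len(combine_locally(records, func)) for records in map_partitions)


map_partitions = [
    [("spark", 1), ("shuffle", 1), ("spark", 1)],
    [("spark", 1), ("spark", 1), ("join", 1)],
]
print(shuffled_by_group_by_key(map_partitions))        # 6
print(shuffled_by_reduce_by_key(map_partitions, add))  # 4
```

The saving grows with the number of repeated keys per partition, which is why reduceByKey is usually preferred over groupByKey for aggregations.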
