What are the key advantages of PySpark RDD?

Question

1 Answer

rajeshsharma · Answer 1 · 2022-03-13T12:32:03+0000

The key advantages of PySpark RDDs are:

Immutability: PySpark RDDs are immutable. Once created, an RDD cannot be modified. Every transformation returns a new RDD and leaves the original unchanged.
Fault Tolerance: PySpark RDDs are fault tolerant. Each RDD records the lineage of transformations used to build it, so when a partition is lost (for example, because a task or node fails), Spark recomputes that partition from its source data instead of failing the whole job. This lets PySpark applications keep running through such failures.
Partitioning: When an RDD is created, its elements are split into partitions that can be processed in parallel. By default, the number of partitions is typically derived from the number of cores available.
Lazy Evaluation: PySpark RDDs are evaluated lazily. Transformations are not executed as soon as they are encountered. Instead, Spark records them in a DAG (directed acyclic graph) and runs them only when the first action on the RDD is called.
In-Memory Processing: PySpark RDDs can load data from disk and process it in memory. You can persist an RDD in memory so that later computations reuse it instead of recomputing it.