What are the key advantages of PySpark RDD?
asked Mar 13, 2022 in PySpark by rajeshsharma
pyspark-rdd
1 Answer
answered Mar 13, 2022 by rajeshsharma
Following is the list of key advantages of PySpark RDD (a short code sketch after the list illustrates them):
Immutability: PySpark RDDs are immutable. Once an RDD is created, it cannot be modified; applying a transformation always produces a new RDD.
Fault Tolerance: PySpark RDDs are fault tolerant. Each RDD records its lineage (the chain of transformations that produced it), so if a partition is lost it can be recomputed automatically and the application keeps running without interruption.
Partitioning: When an RDD is created from any data, its elements are partitioned across the available cores by default.
Lazy Evaluation: PySpark RDDs follow lazy evaluation. Transformations are not executed as soon as they are encountered; they are recorded in the DAG and evaluated only when the first RDD action is called.
In-Memory Processing: PySpark RDDs load data from disk into memory, and an RDD can be persisted in memory so that its computed results are reused across actions.
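Below is a minimal sketch that touches each of the points above, assuming a local Spark installation with the standard pyspark package; the app name "rdd-demo" and the sample data are arbitrary choices for illustration.

from pyspark import SparkContext, StorageLevel

# Local SparkContext; "local[*]" uses all available cores.
sc = SparkContext("local[*]", "rdd-demo")

# Partitioning: elements are split across the available cores by default.
rdd = sc.parallelize(range(1, 11))
print(rdd.getNumPartitions())

# Immutability + lazy evaluation: map() returns a *new* RDD and records the
# transformation in the DAG; nothing is computed until an action is called.
squares = rdd.map(lambda x: x * x)

# In-memory processing: persist the RDD so later actions reuse the cached
# partitions instead of recomputing them from the lineage.
squares.persist(StorageLevel.MEMORY_ONLY)

# Actions trigger evaluation of the recorded transformations.
print(squares.collect())   # [1, 4, 9, ..., 100]
print(squares.sum())       # 385, served from the cached data

sc.stop()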
...