in Apache Spark by
What is an RDD?

1 Answer

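RDD stands for Resilient Distributed Dataset. It is Spark's fundamental data structure: an immutable, distributed collection of objects, split into partitions that can be processed in parallel across the nodes of a cluster. An RDD can hold any type of Python, Java, or Scala object, including user-defined classes. RDDs are fault-tolerant because each one tracks its lineage, the sequence of transformations used to build it, so a lost partition can be recomputed from the original data rather than restored from a replica. RDDs support two types of operations: transformations (such as map and filter), which are evaluated lazily and return a new RDD, and actions (such as count and collect), which trigger the computation and return a value to the driver program.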
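
To make the transformation/action split and the idea of lineage more concrete, here is a minimal sketch in plain Python. It is a toy, single-process model and not Spark's implementation or API: the class name ToyRDD and its internals are hypothetical, chosen only to mirror the method names map, filter, collect, and count. It assumes nothing beyond the Python standard library.

```python
class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # immutable base data
        self._lineage = tuple(lineage)  # recorded transformations, in order

    # Transformations: return a NEW ToyRDD and compute nothing yet.
    def map(self, fn):
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def _compute(self):
        # Replay the lineage over the base data. Because the result is
        # always derived this way, it can be rebuilt at any time.
        data = iter(self._source)
        for kind, fn in self._lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: trigger the computation and return a plain value.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


numbers = ToyRDD(range(1, 6))
squares = numbers.map(lambda x: x * x)                # nothing runs here
even_squares = squares.filter(lambda x: x % 2 == 0)   # still nothing runs
result = even_squares.collect()                       # the action runs the chain
print(result)
```

Note that numbers is never modified: each transformation produces a new object that remembers how it was derived. In real PySpark the corresponding calls would be sc.parallelize(...) on a SparkContext, followed by map, filter, and collect on the resulting RDD, with the work distributed across partitions rather than run in one process.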
...