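A Resilient Distributed Dataset (RDD) is a fault-tolerant collection of elements, divided into partitions that can be operated on in parallel. An RDD is immutable and distributed: once created it cannot be changed (each transformation returns a new RDD), and its partitions are spread across the nodes of a cluster.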
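The toy model below is a sketch of these three properties in plain Python. It is not Spark's API, and every name in it (`ToyRDD`, `compute`, `lose_partition`) is hypothetical. The method names `parallelize`, `map`, `filter` and `collect` only echo their Spark counterparts. Data is split into immutable partitions, and each transformation returns a new object that records its parent and function (its lineage) instead of changing the original. A "lost" partition is rebuilt by replaying that lineage, which approximates how RDDs recover from failures. A thread pool stands in for the cluster's executors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Sequence


class ToyRDD:
    """Hypothetical, simplified model of an RDD (not the Spark API)."""

    def __init__(self, source: Optional[Sequence[tuple]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[tuple], tuple]] = None) -> None:
        # A source RDD holds raw partitions; a derived RDD holds only lineage
        # (its parent plus the function applied to each parent partition).
        self._source = tuple(source) if source is not None else None
        self._parent = parent
        self._fn = fn
        self._cache: Dict[int, tuple] = {}  # materialized derived partitions

    @classmethod
    def parallelize(cls, data: Sequence, num_partitions: int) -> "ToyRDD":
        # Split the input into at most num_partitions immutable tuples.
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        parts = [tuple(data[i:i + size]) for i in range(0, len(data), size)]
        return cls(source=parts)

    def num_partitions(self) -> int:
        if self._source is not None:
            return len(self._source)
        return self._parent.num_partitions()

    def compute(self, i: int) -> tuple:
        if self._source is not None:
            return self._source[i]
        if i not in self._cache:
            # Rebuild partition i from the parent's partition i via lineage.
            self._cache[i] = self._fn(self._parent.compute(i))
        return self._cache[i]

    def map(self, f: Callable) -> "ToyRDD":
        # Returns a new ToyRDD; self is left untouched (immutability).
        return ToyRDD(parent=self, fn=lambda part: tuple(f(x) for x in part))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self,
                      fn=lambda part: tuple(x for x in part if pred(x)))

    def collect(self) -> list:
        # Compute all partitions in parallel, then concatenate them in order.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self.compute, range(self.num_partitions())))
        return [x for part in parts for x in part]

    def lose_partition(self, i: int) -> None:
        # Simulate a failure that discards a materialized partition.
        self._cache.pop(i, None)


nums = ToyRDD.parallelize(list(range(10)), num_partitions=3)
evens = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(evens.collect())  # [0, 4, 16, 36, 64]
evens.lose_partition(1)
print(evens.collect())  # same result, partition 1 recomputed from lineage
print(nums.collect())   # the original data is unchanged
```

Storing lineage rather than replicated copies of each derived partition is the design choice that lets the model recover lost data cheaply. Real RDDs additionally track dependencies across shuffles, which this sketch omits.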