Difference between RDD and DataFrame in Spark?

1 Answer

RDD:

Optimization: no built-in optimization engine is available for RDDs

Serialization: uses Java serialization by default

Type safety: compile-time type safety (in Scala and Java)

Data: efficiently processes structured as well as unstructured data

Schema: not inferred; the user has to define it manually

Performance: the RDD API is slower for simple grouping and aggregation operations

DataFrame:

Optimization: queries are optimized by the Catalyst optimizer in four phases: analysis of the logical plan, logical optimization, physical planning, and code generation that compiles parts of the query to Java bytecode

Serialization: can serialize data into off-heap (in-memory) storage in a binary format

Type safety: types are validated at run time

Data: efficiently processes structured as well as semi-structured data

Schema: defined automatically

Performance: the DataFrame API is faster for simple grouping and aggregation operations
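
To make the contrast concrete, here is a minimal plain-Python analogy. It is not Spark code: `reduce_by_key`, `group_sum`, and the sample `sales` data are hypothetical stand-ins invented for illustration. The first half mimics the RDD style, where the aggregation is an opaque function over tuples. The second half mimics the DataFrame style, where rows follow a schema and the query names its columns, so it can be checked (at run time) before any data is processed.

```python
from collections import defaultdict

# RDD-style: records are plain tuples and the logic lives in an opaque
# function, so an engine has nothing to inspect or optimize.
sales = [("books", 12.0), ("games", 30.0), ("books", 8.0)]

def reduce_by_key(pairs, combine):
    """Hypothetical stand-in for an RDD-style reduce-by-key."""
    totals = {}
    for key, value in pairs:
        totals[key] = combine(totals[key], value) if key in totals else value
    return totals

rdd_style = reduce_by_key(sales, lambda a, b: a + b)

# DataFrame-style: rows carry a schema, and the query is a description
# (grouping column + aggregated column) that can be validated up front.
schema = ("category", "amount")
rows = [dict(zip(schema, record)) for record in sales]

def group_sum(rows, schema, by, column):
    """Hypothetical stand-in for a declarative group-by/sum."""
    for name in (by, column):
        if name not in schema:
            # Analogous to run-time validation: a misspelled column is
            # caught when the query is analyzed, not at compile time.
            raise KeyError(f"unknown column: {name}")
    totals = defaultdict(float)
    for row in rows:
        totals[row[by]] += row[column]
    return dict(totals)

df_style = group_sum(rows, schema, by="category", column="amount")
```

Both styles produce the same totals per category. The difference is that the second form states what to compute in terms of named columns, which is the kind of information an optimizer such as Catalyst relies on, while the first only hands over a function to call.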
