RDD:-
Optimization – No built-in optimization engine is available for RDDs, so operations run as the programmer writes them
Serialization – RDDs use Java serialization by default to encode data objects
Compile-time type safety
Can efficiently process both structured and unstructured data
No schema is attached to the data; its structure must be defined manually
The RDD API is slower for simple grouping and aggregation operations
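The RDD points above can be illustrated with a toy model in plain Python (it does not use Spark). The record type and parsing function play the role of the manually defined "schema", and `reduce_by_key` is a hypothetical stand-in for `RDD.reduceByKey`: the aggregation runs exactly as written, with no optimizer rewriting it. `Sale`, `parse`, and `reduce_by_key` are illustrative names, not part of any Spark API.

```python
from typing import NamedTuple


# Hypothetical record type: with RDDs the "schema" lives in user code,
# much like a Scala case class or a Python class you define yourself.
class Sale(NamedTuple):
    region: str
    amount: int


def parse(line: str) -> Sale:
    # Manual parsing: the programmer decides field order and types.
    region, amount = line.split(",")
    return Sale(region.strip(), int(amount))


def reduce_by_key(pairs, func):
    # Hypothetical stand-in for RDD.reduceByKey: combine values that share a key.
    # Nothing rewrites or optimizes this loop; it runs as written.
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


lines = ["north,10", "south,5", "north,7"]
sales = [parse(line) for line in lines]              # like rdd.map(parse)
pairs = [(s.region, s.amount) for s in sales]        # like rdd.map(lambda s: (s.region, s.amount))
totals = reduce_by_key(pairs, lambda a, b: a + b)    # like rdd.reduceByKey(lambda a, b: a + b)
print(totals)  # {'north': 17, 'south': 5}
```

Because the structure is only known to user code, any change to the input format means updating `parse` and the record type by hand.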
DataFrame:-
Optimization – Queries are optimized by the Catalyst optimizer in four phases: analyzing a logical plan, logical plan optimization, physical planning, and code generation to compile parts of the query to Java bytecode
Serialization – Data is stored in memory in a binary format (off-heap), so Java serialization is not needed
Run-time type validation
Can efficiently process both structured and semi-structured data
The schema is defined automatically
The DataFrame API is faster and simpler for grouping and aggregation operations
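The Catalyst phases and the run-time validation point can be sketched with a toy model in plain Python (not Spark's API; `Col`, `Lit`, `Add`, `analyze`, and `constant_fold` are hypothetical names). Analysis resolves column references against a schema and fails at run time for an unknown column; the optimization step applies one rewrite rule, constant folding, to the expression tree. Real Catalyst applies many such rules to whole logical plans before physical planning and code generation.

```python
from dataclasses import dataclass
from typing import Union


# Toy expression tree (hypothetical); Catalyst also represents queries as trees.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, Add]


def analyze(expr: Expr, schema: dict) -> Expr:
    # Analysis: resolve column references against the schema.
    # An unknown column is only detected here, at run time.
    if isinstance(expr, Col):
        if expr.name not in schema:
            raise ValueError(f"cannot resolve column '{expr.name}'")
        return expr
    if isinstance(expr, Add):
        return Add(analyze(expr.left, schema), analyze(expr.right, schema))
    return expr


def constant_fold(expr: Expr) -> Expr:
    # One rewrite rule applied bottom-up: Lit + Lit becomes a single Lit.
    if isinstance(expr, Add):
        left, right = constant_fold(expr.left), constant_fold(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return expr


schema = {"age": int}
plan = Add(Col("age"), Add(Lit(1), Lit(2)))
optimized = constant_fold(analyze(plan, schema))
print(optimized)  # Add(left=Col(name='age'), right=Lit(value=3))
```

On a real DataFrame, `df.explain(True)` prints the parsed, analyzed, and optimized logical plans and the physical plan, so these phases can be inspected directly.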