Q:
Difference between RDD and DataFrame in Spark?

1 Answer

RDD:

Optimization: no built-in optimization engine is available for RDDs

Serialization: uses Java serialization by default

Compile-time type safety

Efficiently process data, which is structured as well as unstructured

Schema must be defined manually

RDD API is slower to perform simple grouping and aggregation operations

DataFrame:

Optimization: queries are optimized by the Catalyst optimizer in four phases: analysis of the logical plan, logical plan optimization, physical planning, and code generation that compiles parts of the query to Java bytecode

Serialization: data can be stored off-heap (in memory) in a binary format

Run-time type validation

Efficiently process data, which is structured as well as semi-structured

Schema is inferred automatically

DataFrame API is faster for simple grouping and aggregation operations
