
Jan 13 in Big Data | Hadoop

Q: What is a Parquet file in Spark?

1 Answer

Jan 13
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the data processing framework, data model, or programming language.

It provides efficient compression and encoding schemes and is widely used across Hadoop ecosystem projects.
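
To make "columnar" concrete, here is a toy sketch in plain Python. It is not Parquet's actual on-disk layout, and the names `rows`, `to_columns`, and the sample values are made up for illustration. It only shows the idea that storing values column by column lets a query touch a single column without scanning whole rows.

```python
from statistics import mean

# Row-oriented view: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Pune", "amount": 10.0},
    {"id": 2, "city": "Pune", "amount": 20.0},
    {"id": 3, "city": "Delhi", "amount": 30.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-oriented view: all values of one field sit next to each other.
columns = to_columns(rows)

# An aggregate over one column only needs that column's list.
average_amount = mean(columns["amount"])
```

Keeping similar values together (for example the repeated city names) is also what makes per-column compression and encoding effective in a real columnar format.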

Spark SQL supports both reading and writing Parquet files. Parquet files also automatically preserve the schema of the original data.

During write operations, all columns in a Parquet file are converted to nullable by default.
