What is a Parquet file in Spark?

1 Answer

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Any data processing framework, data model, or programming language can use it.

It is a compressed, efficiently encoded format common to Hadoop ecosystem projects.

Spark SQL supports both reading and writing of Parquet files, and it automatically preserves the schema of the original data.

During write operations, all columns in a Parquet file are by default converted to nullable columns for compatibility reasons.
