What is a Parquet file in Spark?

Question

What is a Parquet file in Spark?

1 Answer

SakshiSharma · Answer 1 · 2020-01-13T09:43:27+0000

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the data processing framework, data model, or programming language in use.

It is a compressed, efficiently encoded format that is common across Hadoop ecosystem projects.

Spark SQL supports both reading and writing Parquet files, and Parquet files automatically preserve the schema of the original data.

During write operations, all columns in a Parquet file are by default converted to nullable columns.
