Can we create PySpark DataFrame from external data sources?

1 Answer


Yes, we can create a PySpark DataFrame from external data sources. Real-time applications typically read data from external systems such as the local file system, HDFS, HBase, MySQL tables, Amazon S3, Azure storage, etc. The following example shows how to create a DataFrame by reading a CSV file from the local file system:

df = spark.read.csv("/path/to/file.csv")  # read a local CSV file into a DataFrame
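A slightly fuller sketch of the same idea (the app name, path, and reader options here are illustrative placeholders, not requirements):

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession, the entry point to the DataFrame API
spark = SparkSession.builder.appName("csv-example").getOrCreate()

# Read a CSV that has a header row, letting Spark infer column types
df = spark.read.csv("/path/to/file.csv", header=True, inferSchema=True)

df.printSchema()  # inspect the inferred schema
df.show(5)        # preview the first few rows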

PySpark supports CSV, text, Avro, Parquet, TSV, and many other file formats.
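Other formats and external systems follow the same reader pattern. The sketch below assumes hypothetical paths and MySQL connection details, and reading over JDBC requires the MySQL JDBC driver to be available on the Spark classpath:

# Parquet and JSON readers work the same way as the CSV reader
parquet_df = spark.read.parquet("/path/to/data.parquet")
json_df = spark.read.json("/path/to/data.json")

# Reading a MySQL table over JDBC (connection details are placeholders)
mysql_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/mydb")
    .option("dbtable", "employees")
    .option("user", "db_user")
    .option("password", "db_password")
    .load()
)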

...