Can we create a PySpark DataFrame from external data sources?

1 Answer

Yes, a PySpark DataFrame can be created from external data sources. Real-world applications typically read from external systems such as the local file system, HDFS, HBase, MySQL tables, Amazon S3, and Azure storage. The following example creates a DataFrame by reading a CSV file from the local file system:

df = spark.read.csv("/path/to/file.csv")  

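PySpark has built-in readers for CSV, text, JSON, Parquet, and ORC, among other formats. TSV files are read with the CSV reader by setting a tab separator, and Avro requires the external spark-avro module.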
