Can we create PySpark DataFrame from external data sources?

1 Answer


Yes, we can create a PySpark DataFrame from external data sources. Real-world applications read data from external systems such as the local file system, HDFS, HBase, MySQL tables, Amazon S3, Azure storage, etc. The following example creates a DataFrame by reading a CSV file from the local file system:

# header=True uses the first row as column names; inferSchema=True guesses column types
df = spark.read.csv("/path/to/file.csv", header=True, inferSchema=True)

PySpark supports CSV, JSON, text, Parquet, ORC, Avro and many other file formats (TSV files can be read with the CSV reader by setting a tab separator).
