Yes, we can create a PySpark DataFrame from external data sources. Real-world applications typically read data from external storage systems such as the local file system, HDFS, HBase, MySQL tables, Amazon S3, Azure storage, etc. The following example shows how to create a DataFrame by reading data from a CSV file on the local file system:
df = spark.read.csv("/path/to/file.csv")
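The one-liner above assumes a SparkSession already exists (as it does in the pyspark shell, where it is bound to the name spark). A minimal self-contained sketch, assuming a hypothetical CSV file at /path/to/file.csv that has a header row, might look like this:

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession; the pyspark shell provides one automatically.
spark = SparkSession.builder.appName("CsvExample").getOrCreate()

# Read the CSV, treating the first row as column names and inferring column types.
df = spark.read.csv("/path/to/file.csv", header=True, inferSchema=True)
df.show()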
PySpark supports CSV, JSON, text, Parquet, ORC, Avro (via the external spark-avro package), and many other file formats; TSV files are read with the CSV reader by specifying a tab delimiter.
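The other formats follow the same DataFrameReader pattern. As a brief sketch, with hypothetical file paths:

# Parquet and JSON have dedicated reader methods.
parquet_df = spark.read.parquet("/path/to/file.parquet")
json_df = spark.read.json("/path/to/file.json")

# TSV is handled by the CSV reader with a tab separator.
tsv_df = spark.read.csv("/path/to/file.tsv", sep="\t", header=True)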