The common workflow of a Spark program can be described in the following steps:
In the first step, we create the input RDDs from external data, which can be obtained from different data sources.
After creating the PySpark RDDs, we apply transformation operations such as filter() or map() to derive new RDDs according to the business logic.
If any intermediate RDDs need to be reused later, we can persist them.
Finally, when an action operation such as first() or count() is called, Spark launches the parallel computation, since transformations alone are evaluated lazily. The sketch after these steps ties the whole workflow together.
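The following is a minimal sketch of these steps, assuming a local Spark installation; the input path, application name, and word-length threshold are placeholders chosen for illustration, not values from the original text.

```python
from pyspark import SparkContext, StorageLevel

# Start a local SparkContext (the app name is arbitrary).
sc = SparkContext("local[*]", "WorkflowSketch")

# Step 1: create an input RDD from external data (here, a hypothetical text file).
lines = sc.textFile("data/input.txt")

# Step 2: apply transformations such as flatMap() and filter() to derive new RDDs.
words = lines.flatMap(lambda line: line.split())
long_words = words.filter(lambda w: len(w) > 5)

# Step 3: persist an intermediate RDD that will be reused by more than one action.
long_words.persist(StorageLevel.MEMORY_ONLY)

# Step 4: actions such as count() and first() trigger the parallel computation;
# nothing is executed until one of them is called.
print(long_words.count())
print(long_words.first())

sc.stop()
```

Because long_words is persisted before the two actions, the second action reuses the cached partitions instead of recomputing the transformations from the source file.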