Explain the common workflow of a Spark program.

1 Answer


The common workflow of a Spark program can be described in the following steps:

First, we create the input RDDs from external data. The data can be loaded from a variety of sources, such as local collections, text files, or HDFS.
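For example, a minimal sketch of this step (the file path and the sample data are placeholders, not part of the original answer):

from pyspark import SparkContext

sc = SparkContext("local[*]", "workflow-example")

# Input RDD from an in-memory collection (placeholder data)
numbers = sc.parallelize([1, 2, 3, 4, 5])

# Input RDD from an external text file (hypothetical path)
lines = sc.textFile("data/input.txt")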

After creating the PySpark RDDs, we apply transformation operations such as filter() or map() to derive new RDDs according to the business logic. Transformations are lazy, so no computation is triggered at this point.

If any intermediate RDDs need to be reused later, we persist them so they are not recomputed on every action.

Finally, when an action such as first() or count() is called, Spark launches the parallel computation and returns the result.
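Putting the four steps together, a minimal end-to-end sketch might look like this (the log file path, the "ERROR" filter, and the tab-separated message format are illustrative assumptions):

from pyspark import SparkContext, StorageLevel

sc = SparkContext("local[*]", "workflow-example")

# Step 1: create the input RDD from an external data source (hypothetical path)
lines = sc.textFile("data/access.log")

# Step 2: transformations build new RDDs lazily; nothing runs yet
errors = lines.filter(lambda line: "ERROR" in line)
messages = errors.map(lambda line: line.split("\t")[-1])

# Step 3: persist an intermediate RDD we intend to reuse
messages.persist(StorageLevel.MEMORY_ONLY)

# Step 4: actions trigger the actual parallel computation
print(messages.count())   # number of error messages
print(messages.first())   # first error message

sc.stop()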
