In PySpark, serialization is used for performance tuning. Any data that is sent or received over the network, or written to disk or memory, must first be serialized, which is why PySpark provides serializers. PySpark supports two types of serializers:
- PickleSerializer: Serializes objects using Python's pickle module and is available as the class pyspark.PickleSerializer. It supports almost every Python object.
- MarshalSerializer: Serializes objects using Python's marshal module and is available as the class pyspark.MarshalSerializer. It is faster than the PickleSerializer but supports only a limited set of built-in types (see the sketch after this list).
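
A minimal sketch of switching serializers, assuming a local Spark installation; the serializer is passed to the SparkContext constructor, and the application name "serialization-demo" is just an illustrative choice:

```python
from pyspark import SparkContext
from pyspark.serializers import MarshalSerializer

# Create a SparkContext that uses MarshalSerializer instead of the
# default pickle-based serializer.
sc = SparkContext("local", "serialization-demo", serializer=MarshalSerializer())

# MarshalSerializer handles simple built-in types (ints, lists, strings, ...),
# which is all this job needs.
doubled = sc.parallelize(range(1000)).map(lambda x: 2 * x).take(10)
print(doubled)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

sc.stop()
```

If the job ships custom classes or other complex objects between workers, the pickle-based serializer is the safer choice, since marshal cannot handle them.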