A DStream (discretized stream) is the basic abstraction provided by Spark Streaming and represents a continuous stream of data. It can be created either from an input data stream received from a source or by applying transformations to another DStream. Internally, a DStream is represented as a continuous series of Resilient Distributed Datasets (RDDs), and DStream operations are translated into operations on those underlying RDDs.
Users can create DStreams from various sources such as HDFS, Apache Kafka, and Apache Flume. A DStream supports two kinds of operations:
Output operations, which write the data to external systems.
Transformations, which produce a new DStream from an existing one.
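The two kinds of operations can be illustrated with a small toy model in plain Python. This is not the actual Spark API: `ToyDStream`, `foreach_batch`, and the `sink` list are hypothetical names used only to mimic how a DStream is a series of micro-batches (each standing in for one RDD), how a transformation yields a new DStream, and how an output operation pushes data to an external system.

```python
# Toy model of a DStream (plain Python, not Spark):
# each micro-batch stands in for one RDD in the continuous series.

class ToyDStream:
    def __init__(self, batches):
        self.batches = batches  # list of lists, one per micro-batch

    def map(self, fn):
        # Transformation: produces a NEW DStream by applying fn
        # to every record in every underlying batch ("RDD").
        return ToyDStream([[fn(x) for x in batch] for batch in self.batches])

    def foreach_batch(self, writer):
        # Output operation: hands each batch to an external system,
        # here modeled by any callable that accepts a batch.
        for batch in self.batches:
            writer(batch)

# Usage: double every value, then "write" each batch to a sink list.
stream = ToyDStream([[1, 2], [3, 4]])
doubled = stream.map(lambda x: x * 2)  # transformation -> new DStream
sink = []
doubled.foreach_batch(sink.extend)     # output operation
print(sink)  # -> [2, 4, 6, 8]
```

Note that `map` leaves the original stream untouched and returns a fresh DStream, mirroring how Spark transformations are lazy and immutable, while `foreach_batch` is the terminal step that actually moves data out of the stream.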