What was the need for Apache Spark?

1 Answer


There are many general-purpose cluster computing tools on the market, such as Hadoop MapReduce, Apache Storm, Apache Impala, Apache Giraph, and many more, but each one has limitations in its functionality.

We can summarize those limitations as follows:

  • Hadoop MapReduce allows only batch processing.
  • If we need stream processing, only Apache Storm / S4 can perform it.
  • If we need interactive processing, only Apache Impala / Apache Tez can perform it.
  • If we need graph processing, only Neo4j / Apache Giraph can do it.

Here, we can see that no single engine can perform all of these tasks together. So there was a need for a powerful engine that could process data in both real-time (streaming) and batch mode, respond in sub-seconds, and perform in-memory processing.

This is how Apache Spark came into existence. It is a powerful open-source engine that offers interactive processing, real-time stream processing, graph processing, in-memory processing, and batch processing. It provides high speed, ease of use, and a standard interface, all at the same time.
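To illustrate this unified engine, here is a minimal sketch in Scala (assuming Spark SQL and Structured Streaming are on the classpath) that uses the same DataFrame API for an in-memory batch aggregation and for a streaming aggregation over Spark's built-in "rate" test source; the dataset, column names, and timeouts are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object UnifiedProcessingSketch {
  def main(args: Array[String]): Unit = {
    // One SparkSession drives batch, streaming, and in-memory processing alike.
    val spark = SparkSession.builder()
      .appName("UnifiedProcessingSketch")
      .master("local[*]")              // run locally just for illustration
      .getOrCreate()
    import spark.implicits._

    // Batch processing: aggregate a small static dataset held in memory.
    val sales = Seq(("books", 12.0), ("games", 30.0), ("books", 8.5))
      .toDF("category", "amount")
    sales.cache()                       // keep the data in memory for repeated queries
    sales.groupBy("category").sum("amount").show()

    // Stream processing: the same DataFrame API over an unbounded source.
    // The built-in "rate" source emits rows continuously for testing.
    val stream = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
    val counts = stream.groupBy(window($"timestamp", "10 seconds")).count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination(15000)       // run the streaming query briefly, then stop
    spark.stop()
  }
}
```

The point of the sketch is that batch and streaming queries share one session, one API, and one in-memory execution engine, which is exactly the gap the earlier single-purpose tools left open.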

...