What are some key features of Apache Spark?
0 votes
asked Mar 29, 2022 in Apache Spark by sharadyadav1986
apache-spark-features
1 Answer
0 votes
answered Mar 29, 2022 by sharadyadav1986
The following are some key features of Apache Spark:
Polyglot: Spark provides high-level APIs in Java, Scala, Python, and R, so Spark code can be written in any of these four languages. It also provides interactive shells for Scala and Python: the Scala shell is launched with ./bin/spark-shell and the Python shell with ./bin/pyspark from the installation directory.
Speed: Apache Spark can run large-scale data processing workloads up to 100 times faster than Hadoop MapReduce. Spark achieves this speed largely through controlled partitioning of data.
Multiple Formats: Apache Spark supports multiple data sources such as Parquet, JSON, Hive, and Cassandra. These data sources can be more than simple pipes that convert data and pull it into Spark: Spark SQL provides a pluggable mechanism for accessing structured data across all of them.
Lazy Evaluation: Apache Spark delays evaluation until it is strictly necessary, which is one of the main reasons for its speed. Transformations are added to a DAG (directed acyclic graph) of computation, and that DAG is executed only when the driver program requests some data through an action.
Real-Time Computation: Computation in Apache Spark is real-time and low-latency because it happens in memory. Spark is designed for massive scalability: the Spark team has documented users running production clusters with thousands of nodes, and Spark supports several computational models.
Hadoop Integration: Apache Spark is smoothly compatible with Hadoop, which is valuable for Big Data engineers who already work with Hadoop. Spark is a potential replacement for Hadoop's MapReduce functions, and it can run on top of an existing Hadoop cluster using YARN for resource scheduling.
Machine Learning: MLlib is Spark's machine-learning component, which is very useful for big data processing. With it, you don't need separate tools for data processing and machine learning. This makes Apache Spark attractive to data engineers and data scientists alike: a powerful, unified engine that is both fast and easy to use.
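The lazy-evaluation idea above can be sketched in plain Python. This is a toy illustration of the concept only, not Spark's actual API: transformations merely record steps in a plan (a linear stand-in for Spark's DAG), and nothing executes until an action like collect() or count() is called.

```python
# Toy illustration of Spark-style lazy evaluation (NOT the real PySpark API).
# Transformations build up a plan; work happens only when an action runs.

class LazyDataset:
    def __init__(self, data, plan=None):
        self._data = data          # source data
        self._plan = plan or []    # recorded transformations, not yet executed

    # --- transformations: return a new LazyDataset, do no work yet ---
    def map(self, fn):
        return LazyDataset(self._data, self._plan + [("map", fn)])

    def filter(self, pred):
        return LazyDataset(self._data, self._plan + [("filter", pred)])

    # --- actions: execute the recorded plan ---
    def collect(self):
        rows = list(self._data)
        for kind, fn in self._plan:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:  # "filter"
                rows = [r for r in rows if fn(r)]
        return rows

    def count(self):
        return len(self.collect())


# Building the pipeline computes nothing; only the plan grows.
ds = LazyDataset(range(10)).map(lambda x: x * 2).filter(lambda x: x > 10)
result = ds.collect()  # the action triggers execution → [12, 14, 16, 18]
```

In real Spark the same shape appears as, e.g., rdd.map(...).filter(...).collect(): the map and filter calls return immediately, and the cluster does work only at collect().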