Why Hadoop MapReduce?

1 Answer



When we store a huge amount of data in HDFS, the first question that arises is: how do we process this data?

Transferring all this data to a central node for processing is not going to work; we would wait forever for the data to move over the network. Google faced this same problem with its distributed Google File System (GFS), and solved it with the MapReduce data processing model.

Challenges before MapReduce:

1) Costly – Keeping all the data (terabytes) on one server, or as a database cluster, is very expensive and also hard to manage.

2) Time-consuming – A single machine cannot analyze terabytes of data in any reasonable amount of time.

How MapReduce overcomes these challenges:

1) Cost-efficient – It distributes the data, and the processing, across multiple low-configuration commodity machines.

2) Time-efficient – To analyze the data, we write the analysis code in a Map function and the aggregation code in a Reduce function, then execute the job. The MapReduce code is shipped to every machine that holds a part of our data and runs on that specific part. So instead of moving terabytes of data across the network, we move only kilobytes of code, which is far faster. A minimal example follows below.
