Why does a Mapper run in a heavyweight process and not in a thread in MapReduce?

1 Answer

Each task is launched as a separate process rather than as a thread because:

Mappers run across the Hadoop cluster in a distributed manner: in a distributed processing environment, the job is split into tasks that run in parallel on different nodes.

Threads are units of execution within a single process. They share the same memory space and data, and they are confined to a single machine. Each mapper, by contrast, processes a different portion of the data (since the processing is distributed).
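To make the shared-memory point concrete, here is a minimal sketch in plain Java. It is not Hadoop code: `SharedHeapDemo` and `incrementFromThreads` are hypothetical names chosen for illustration. It only shows that every thread in one JVM sees and mutates the same object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: SharedHeapDemo is a hypothetical class, not part of Hadoop.
class SharedHeapDemo {

    // Starts threadCount threads that all increment the SAME counter object.
    static int incrementFromThreads(int threadCount, int incrementsPerThread) {
        // One object on the single heap that every thread in this JVM shares.
        AtomicInteger shared = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    shared.incrementAndGet();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait for each worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(incrementFromThreads(4, 1000)); // prints 4000
    }
}
```

Because all four threads update one counter, the total is the sum of their work. Separate processes would each have their own copy of that memory, on the same machine or on different ones.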

Each Mapper task in Hadoop runs as a separate JVM process. MapReduce jobs are long-running, and because they run on commodity hardware, individual tasks can fail or be killed. If mappers were implemented as threads, one error in a single mapper could bring down the entire process, and you would have to rerun every task in it (since, as stated above, threads are sub-tasks of the same process).
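The isolation argument can be sketched with the standard `ProcessBuilder` API. This is not how Hadoop launches its tasks internally. `IsolatedTaskRunner`, `javaBinary` and `runAttempt` are hypothetical names, and the "failing task" is simply a child JVM started with a deliberately invalid option so that it exits with an error.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Illustrative only: IsolatedTaskRunner is a hypothetical class, not part of Hadoop.
class IsolatedTaskRunner {

    // Path of the java launcher belonging to the JVM that runs this program.
    static String javaBinary() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    // Runs one task attempt in its own OS process and returns its exit code.
    static int runAttempt(List<String> command) {
        try {
            Process child = new ProcessBuilder(command)
                    .inheritIO() // let the child write to our console
                    .start();
            return child.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a healthy task: a child JVM that prints its version and exits.
        List<String> healthy = Arrays.asList(javaBinary(), "-version");
        // Stand-in for a crashing task: the invalid option makes the child JVM exit with an error.
        List<String> failing = Arrays.asList(javaBinary(), "--no-such-option-for-demo");

        System.out.println("healthy task exit code: " + runAttempt(healthy));
        System.out.println("failing task exit code: " + runAttempt(failing));
        // The parent only observed an exit code; it was never at risk itself.
        System.out.println("parent JVM is still running and could retry the failed task");
    }
}
```

The parent sees the child's failure as a non-zero exit code and can reschedule only that task, while the other tasks and the parent itself keep running.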

Managing threads is also more complex. A hung thread cannot be safely killed from outside, whereas a hung process can simply be terminated by the operating system and the task rerun.
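As a rough sketch of the hung-task case, the snippet below waits a bounded time for a child process and then kills it. It uses `Process.waitFor(long, TimeUnit)` and `Process.destroyForcibly()` from the standard library. `HungTaskKiller` and `finishesWithin` are hypothetical names, and using the `sleep` command as the "hung task" assumes a Unix-like system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Illustrative only: HungTaskKiller is a hypothetical class, not part of Hadoop.
class HungTaskKiller {

    // Returns true if the task exits within the timeout; otherwise kills it and returns false.
    static boolean finishesWithin(List<String> command, long timeoutMillis) {
        try {
            Process child = new ProcessBuilder(command).inheritIO().start();
            if (child.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true; // the task ended on its own
            }
            child.destroyForcibly(); // ask the OS to terminate the hung process
            child.waitFor();         // wait until it is really gone
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: a Unix-like system where the "sleep" command exists.
        List<String> hungTask = Arrays.asList("sleep", "30");
        boolean finished = finishesWithin(hungTask, 500);
        System.out.println("task finished in time: " + finished); // prints false
    }
}
```

Java offers no equally safe way to stop a thread that refuses to cooperate (`Thread.stop()` is deprecated), which is one reason a process per task is easier to manage.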