What is Speculative Execution in Hadoop MapReduce?
MapReduce breaks a job into tasks that run in parallel rather than sequentially, which reduces overall execution time. However, this model is sensitive to slow tasks, because a single slow task delays the completion of the whole job. Tasks can slow down for various reasons, such as hardware degradation. The cause can be hard to detect, since these tasks still complete successfully; they simply take longer than expected.
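Because a job is not finished until its last task finishes, the job's completion time is set by its slowest task. The minimal sketch below uses made-up task durations (hypothetical numbers, not measurements) and assumes every task gets its own slot and starts at the same time:

```python
def job_completion_time(task_durations):
    """Parallel tasks: the job ends when the slowest task ends.
    Assumes all tasks start together, each in its own slot (a simplification)."""
    return max(task_durations)


healthy = [30, 32, 29, 31]          # seconds; hypothetical durations
with_straggler = [30, 32, 29, 120]  # one task on a (hypothetical) degraded node

print(job_completion_time(healthy))         # 32
print(job_completion_time(with_straggler))  # 120
```

Three of the four tasks are equally fast in both cases, yet the job takes almost four times as long once one task straggles.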
Apache Hadoop does not try to diagnose and fix slow-running tasks. Instead, it tries to detect them and runs backup tasks for them. This is called speculative execution in Hadoop, and the backup tasks are called speculative tasks. First, the Hadoop framework launches all the tasks for a MapReduce job. It then launches speculative tasks for those tasks that have been running for some time (one minute) and have not made much progress, on average, compared with the other tasks of the job. If the original task completes before its speculative task, Hadoop kills the speculative task. Conversely, if the speculative task finishes first, Hadoop kills the original task.
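The two rules described above (which tasks get a speculative copy, and which attempt survives) can be sketched as a toy model. This is not Hadoop's scheduler code: the `Task` class, both function names, and the `PROGRESS_GAP` threshold used to decide what counts as "not much progress" are hypothetical choices made for illustration, and the real heuristics vary between Hadoop versions.

```python
from dataclasses import dataclass

MIN_RUNTIME_S = 60.0  # "running for some time (one minute)"
PROGRESS_GAP = 0.2    # hypothetical threshold for "not much progress"


@dataclass
class Task:
    task_id: str
    runtime_s: float  # seconds the task has been running
    progress: float   # fraction complete, 0.0 to 1.0


def pick_speculative_candidates(tasks):
    """Return IDs of tasks that have run for at least MIN_RUNTIME_S and whose
    progress trails the job's average progress by more than PROGRESS_GAP."""
    avg = sum(t.progress for t in tasks) / len(tasks)
    return [t.task_id for t in tasks
            if t.runtime_s >= MIN_RUNTIME_S and t.progress < avg - PROGRESS_GAP]


def resolve_race(original_finish_s, speculative_finish_s):
    """The first attempt to finish wins and the other is killed.
    Returns (winner, killed). Ties go to the original (an assumption)."""
    if original_finish_s <= speculative_finish_s:
        return ("original", "speculative")
    return ("speculative", "original")


tasks = [
    Task("m1", 90, 0.90),
    Task("m2", 90, 0.85),
    Task("m3", 90, 0.30),  # slow, and has run for over a minute
    Task("m4", 30, 0.10),  # slow, but too young to speculate on
]
print(pick_speculative_candidates(tasks))  # ['m3']
print(resolve_race(100, 140))  # ('original', 'speculative')
print(resolve_race(200, 140))  # ('speculative', 'original')
```

Task `m4` shows why the one-minute rule matters: a task that has just started naturally reports low progress, so speculating on it would waste cluster resources.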