Speculative Execution

In MapReduce a job is broken into several tasks which will execute in parallel. This model of execution is sensitive to slow tasks (even if they are very few in number) as they will slowdown the overall execution of a job. Therefore, Hadoop detects such slow tasks and runs (duplicate) backup tasks for such tasks. This is called speculative execution. Speculating more tasks can help jobs finish faster - but can also waste CPU cycles. Conversely - speculating fewer tasks can save CPU cycles - but cause jobs to finish slower. The options documented here allow the users to control the aggressiveness of the speculation algorithms and choose the right balance between efficiency and latency.

All the options below apply on and are settable on a per-job level. Indicative values are usually defaults.

Speculation Quota

There is a hard limit of 10% of slots used for speculation across all hadoop jobs. This is not configurable right now. However there is a per-job option to cap the ratio of speculated tasks to total tasks:

mapreduce.job.speculative.speculativecap=0.1

The value of above option can be any floating number between 0.01 and 1. It cannot be set below 0.01. By default the value is 0.1 i.e., 10% of tasks can be speculated at any time. Note that the ratio is applied to map and reduce tasks separately.

Controls on Speculation of Individual Tasks

Following are the controls on speculative execution of individual tasks and are applied in the order below:

  1. A task is not speculated for the first 60 seconds by default. It is overridable on per-job level via the following options:
mapred.speculative.map.lag=60000
mapred.speculative.reduce.lag=60000
  1. A task that is projected to finish soon is not speculated. By default this feature is disabled. To enable, set the following options:
mapred.speculative.map.duration=300
mapred.speculative.reduce.duration=300

In this example setting, if a task is projected to take 300 seconds or less then it is not speculated.

  1. A task whose progress rate is low compared to other tasks is speculated.

The progress rate of each task is compared to the mean of all the other tasks in ths job. If the difference is more than the standard deviation, then it is speculated. Speculation can be made more or less aggressive by comparing the difference to a multiple of the standard deviation that is controlled by the setting below:

mapreduce.job.speculative.slowtaskthreshold=1.0

This value is 1.0 by default. In some cases - the standard deviation is often very large (and this results in no tasks being speculated). As a result the standard deviation is capped to a certain multiple of the mean that is controlled by the setting below (with the default value of 0.8 in Qubole):

mapreduce.job.speculative.stddevmeanratio.max=0.8
  1. If a task belongs to the last few mappers or reducers - then it is speculated (regardless of its progress rate). This has been found to be extremely useful in taking care of lone straggler reducers and mappers.
  • This feature is enabled by default in Qubole and is controlled by:

    mapred.reduce.tasks.speculation.unfinished.threshold=0.001
    
    mapred.map.tasks.speculation.unfinished.threshold=0.001
    
  • With the above values the last 0.1% of tasks or 1 task (minimum) are always speculated. Most often you will see the last task being speculated.

  • To disable this feature - set the above options to value of 0.