Sparrow: Distributed, Low Latency Scheduling
02 Oct 2015

Takeaways from Sparrow: Distributed, Low Latency Scheduling, Ousterhout et al., SOSP 2013.
Centralized schedulers are not suitable for low latency, high throughput workloads (~100 ms tasks arriving at rates of 1 million tasks per second). They offer high-quality placement at the expense of scheduling latency and throughput, and they present a single point of failure.
Hence Sparrow, a distributed scheduler that offers millisecond latency, high throughput, and fault tolerance without compromising on the quality of task placement.
Sparrow's model consists of several schedulers and many workers. Each job arrives at one of the schedulers; a job consists of many tasks, and the scheduler finds a worker for each task.
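As a rough mental model, the moving parts look like this (a minimal sketch; all names and types are mine, not from the paper):

```python
# Minimal data model for Sparrow's setting (illustrative names, not the
# paper's code): jobs arrive at schedulers, and each task needs a worker.
from dataclasses import dataclass, field

@dataclass
class Task:
    job_id: int
    index: int  # position of the task within its job

@dataclass
class Job:
    job_id: int
    tasks: list  # a job is a bag of tasks to be placed

@dataclass
class Worker:
    worker_id: int
    queue: list = field(default_factory=list)  # pending tasks/probes

# Each of several schedulers independently places the tasks of the jobs
# that happen to arrive at it; there is no central coordination.
```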
Sparrow considers several task-to-worker assignment strategies to motivate and arrive at its final design (a toy comparison of the first three follows the list).
- Purely Random. Each task is placed on a worker chosen uniformly at random. This performs increasingly poorly as cluster load rises.
- Per-task Sampling. For each task, two workers are sampled at random and the task is allocated to the one with the shorter queue of pending tasks. However, a task might get unlucky and sample two workers that are both very busy, and this effect worsens with increasing cluster load.
- Batch Sampling. All the tasks of a job are allocated as a batch. For “m” tasks the scheduler samples “d*m” workers (d is typically 2) and places the tasks on the “m” workers with the shortest pending-task queues. However, queue length is a poor predictor of a task's wait time, since task durations vary; moreover, multiple schedulers might concurrently sample and place tasks on the same workers, leading to a race condition. Batch Sampling performs better than the previous schemes but still degrades with increasing cluster load (1.92x worse than ideal at 80% load).
- Late Binding. This is the strategy Sparrow adopts. For “m” tasks, the scheduler places probe requests in the queues of “d*m” randomly sampled workers. When a worker dequeues a probe, it asks the scheduler for a task; the scheduler thus assigns tasks to the first “m” workers that reply to the probes, and proactively cancels the probes still queued at the remaining workers.
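To make the contrast concrete, here is a toy Python sketch (my own, not the paper's code) that applies the first three strategies to a snapshot of worker queue lengths; the queue-length distribution is an arbitrary assumption. Late binding is omitted because it needs an event-driven simulation of probe and reply timing.

```python
import random

def random_placement(queues, m):
    """Place m tasks on workers chosen uniformly at random."""
    return [random.randrange(len(queues)) for _ in range(m)]

def per_task_sampling(queues, m, d=2):
    """For each task, sample d workers and pick the shortest queue."""
    placed = []
    for _ in range(m):
        candidates = random.sample(range(len(queues)), d)
        placed.append(min(candidates, key=lambda w: queues[w]))
    return placed

def batch_sampling(queues, m, d=2):
    """Sample d*m workers once; place tasks on the m shortest queues."""
    candidates = random.sample(range(len(queues)), d * m)
    return sorted(candidates, key=lambda w: queues[w])[:m]

# Example: 1000 workers with synthetic queue lengths (mean ~4 pending tasks).
queues = [int(random.expovariate(1 / 4)) for _ in range(1000)]
for strategy in (random_placement, per_task_sampling, batch_sampling):
    placement = strategy(queues, 10)
    mean_q = sum(queues[w] for w in placement) / len(placement)
    print(f"{strategy.__name__}: mean queue length ahead of a task = {mean_q:.2f}")
```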
Sparrow also handles placement constraints to some extent (sketched after the list).
- Per-Job Constraints. Batch Sampling is done only from the workers that satisfy the constraint.
- Per-Task Constraints. Batch Sampling degenerates to Per-Task Sampling, since each task may have a different set of eligible workers; Late Binding still applies. As a small optimization, probe results are shared across the tasks of a job.
- Sparrow cannot handle inter-job constraints.
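A hedged sketch of how this constraint handling could sit on top of the sampling primitives above; the predicate-based interface is my assumption, not Sparrow's API, and the probe-result sharing optimization is omitted for brevity.

```python
import random

def batch_sampling_with_job_constraint(queues, m, satisfies, d=2):
    """Per-job constraint: sample only among workers satisfying it."""
    eligible = [w for w in range(len(queues)) if satisfies(w)]
    candidates = random.sample(eligible, min(d * m, len(eligible)))
    return sorted(candidates, key=lambda w: queues[w])[:m]

def per_task_constrained_sampling(queues, constraints, d=2):
    """Per-task constraints: fall back to per-task sampling, each task
    sampling within its own eligible set (late binding still applies)."""
    placed = []
    for satisfies in constraints:  # one predicate per task
        eligible = [w for w in range(len(queues)) if satisfies(w)]
        sampled = random.sample(eligible, min(d, len(eligible)))
        placed.append(min(sampled, key=lambda w: queues[w]))
    return placed
```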
Sparrow also approximately supports strict priorities and weighted fair sharing.
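As I understand it, enforcement happens on the worker side: each worker keeps one queue per user and decides which queue to serve next. A minimal sketch, assuming a lottery-style weighted pick for the fair-sharing case, which makes the approximation explicit:

```python
import random
from collections import deque

class PolicyQueues:
    """Per-user queues on a single worker (illustrative, not Sparrow's code)."""

    def __init__(self, priorities, weights):
        self.queues = {u: deque() for u in priorities}
        self.priorities = priorities  # user -> strict priority (higher wins)
        self.weights = weights        # user -> fair-share weight

    def enqueue(self, user, task):
        self.queues[user].append(task)

    def next_strict_priority(self):
        """Strict priorities: serve the highest-priority non-empty queue."""
        busy = [u for u, q in self.queues.items() if q]
        if not busy:
            return None
        best = max(busy, key=lambda u: self.priorities[u])
        return self.queues[best].popleft()

    def next_weighted(self):
        """Weighted fair sharing (approximate): pick a non-empty queue
        with probability proportional to the user's weight."""
        busy = [u for u, q in self.queues.items() if q]
        if not busy:
            return None
        user = random.choices(busy, weights=[self.weights[u] for u in busy])[0]
        return self.queues[user].popleft()
```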
The authors also discuss the limitations and the scope for future work.
- Sparrow offers only approximate policy enforcement. Pre-emption could be used to address this.
- Inter-job constraints are not supported.
- Gang scheduling is not supported.
- Query-level policies are not supported. Sparrow only approximately supports job-level policies.
- Worker failure tolerance is not supported. Occasional heartbeats between workers and a centralized state store could address this.
- Dynamic probe ratio is not part of the design. Varying the value of “d” with cluster load could address this (see the sketch below).
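A minimal sketch of what that last fix might look like; the linear mapping and its bounds are arbitrary illustrations, not something the paper specifies:

```python
def dynamic_probe_ratio(estimated_load, d_min=1.0, d_max=3.0):
    """Map an estimated cluster load in [0, 1] to a probe ratio d.

    Sampling buys little at low load (any worker is likely idle) and
    matters more as queues build up, so d grows with load.
    """
    load = min(max(estimated_load, 0.0), 1.0)
    return d_min + (d_max - d_min) * load

# e.g. probe 1.4x workers per task at 20% load, 2.6x at 80% load.
print(dynamic_probe_ratio(0.2), dynamic_probe_ratio(0.8))
```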