Lotaru: Locally Estimating Runtimes of Scientific Workflow Tasks in Heterogeneous Clusters

Abstract

Many scientific workflow scheduling algorithms need to be informed about task runtimes a-priori to conduct efficient scheduling. In heterogeneous cluster infrastructures, this problem becomes aggravated because these runtimes are required for each task-node pair. Using historical data is often not feasible as logs are typically not retained indefinitely and workloads as well as infrastructure changes. In contrast, online methods, which predict task runtimes on specific nodes while the workflow is running, have to cope with the lack of example runs, especially during the start-up.

Publication
arXiv
Fabian Lehmann
Fabian Lehmann
Ph.D. candidate

My research interests include distributed systems, scientific workflows, and their scheduling.