I am Fabian Lehmann, a PhD student in computer science at the chair of Knowledge Management in Bioinformatics at Humboldt-Universität zu Berlin. My work is funded through FONDA, a Collaborative Research Center of the German Research Foundation (DFG).
During my bachelor's studies, I discovered my fascination for complex, distributed systems. I enjoy probing the limits of such systems and pushing beyond them. In my PhD, I focus on optimizing workflow systems for the analysis of huge amounts of data, concentrating in particular on the aspect of scheduling. To this end, I work closely with the Earth Observation Lab at Humboldt-Universität zu Berlin to understand the requirements of real-world practice.
Master in Business Informatics, 2020
Thesis: Design and Implementation of a Processing Pipeline for High Resolution Blood Pressure Sensor Data
Technische Universität Berlin
Bachelor in Business Informatics, 2019
Thesis: Performance Benchmarking in Continuous Integration Processes
Technische Universität Berlin
(A small selection)
Scientific workflows typically comprise a multitude of different processing steps which often are executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the computational infrastructure at hand. This assignment is complicated by the facts that (a) tasks typically have highly heterogeneous resource requirements and (b) in many infrastructures, compute nodes offer highly heterogeneous resources. In consequence, predictions of the runtime of a given task on a given node, as required by many scheduling algorithms, are often rather imprecise, which can lead to sub-optimal scheduling decisions. We propose Reshi, a method for recommending task-node assignments during workflow execution that can cope with heterogeneous tasks and heterogeneous nodes. Reshi approaches the problem as a regression task, where task-node pairs are modeled as feature vectors over the results of dedicated micro benchmarks and past task executions. Based on these features, Reshi trains a regression tree model to rank and recommend nodes for each ready-to-run task, which can be used as input to a scheduler. For our evaluation, we benchmarked 27 AWS machine types using three representative workflows. We compare Reshi’s recommendations with three state-of-the-art schedulers. Our evaluation shows that Reshi outperforms HEFT by a mean makespan reduction of 7.18% and 18.01% assuming a mean task runtime prediction error of 15%.
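The core idea above — modeling task-node pairs as feature vectors and ranking nodes by predicted runtime — can be illustrated with a minimal sketch. This is not Reshi's actual implementation: the names (`TaskNodeFeatures`, `rank_nodes`) are hypothetical, and a simple benchmark-scaling heuristic stands in for the trained regression tree model.

```python
from dataclasses import dataclass

# Hypothetical feature vector for one task-node pair, combining the
# node's micro-benchmark results with data from past task executions.
@dataclass
class TaskNodeFeatures:
    cpu_bench: float       # node CPU micro-benchmark score (higher = faster)
    io_bench: float        # node I/O micro-benchmark score (higher = faster)
    past_runtime_s: float  # mean runtime of similar past executions of the task

def rank_nodes(candidates: dict[str, TaskNodeFeatures]) -> list[str]:
    """Rank candidate nodes for one ready-to-run task, fastest first.

    Stand-in for the learned regression tree: each node is scored by
    scaling the historical runtime with its benchmark results.
    """
    def predicted_runtime(f: TaskNodeFeatures) -> float:
        return f.past_runtime_s / (0.5 * f.cpu_bench + 0.5 * f.io_bench)

    return sorted(candidates, key=lambda n: predicted_runtime(candidates[n]))
```

A scheduler could consume such a ranking directly, e.g. `rank_nodes({"node-a": TaskNodeFeatures(2.0, 2.0, 100.0), "node-b": TaskNodeFeatures(1.0, 1.0, 100.0)})` prefers the faster `node-a`.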
In recent years, a growing number of industries have started to use microservices. Microservices offer advantages such as scalability and independent service realization, but also pitfalls. We noticed inconsistencies between the different existing definitions of the term microservice and practical implementations of microservice-based systems. Therefore, we evaluate existing microservice definitions and analyze how the identified pitfalls relate to these definitions. We observed that many pitfalls are tied to imprecise definitions. With a new, distinct, and explicit definition of microservices as a slice service style, we can avoid most pitfalls; the definition is given as an architectural style with the demand to tailor the software process model. Furthermore, we discuss pitfalls of microservices that the definition cannot avoid.
Many scientific workflow scheduling algorithms need to be informed about task runtimes a priori to conduct efficient scheduling. In heterogeneous cluster infrastructures, this problem is aggravated because these runtimes are required for each task-node pair. Using historical data is often not feasible as logs are typically not retained indefinitely and workloads as well as infrastructures change. In contrast, online methods, which predict task runtimes on specific nodes while the workflow is running, have to cope with the lack of example runs, especially during start-up. In this paper, we present Lotaru, a novel online method for locally estimating task runtimes in scientific workflows on heterogeneous clusters. Lotaru first profiles all nodes of a cluster with a set of short-running and uniform microbenchmarks. Next, it runs the workflow to be scheduled on the user’s local machine with drastically reduced data to determine important task characteristics. Based on these measurements, Lotaru learns a Bayesian linear regression model to predict a task’s runtime given the input size and finally adjusts the predicted runtime specifically for each task-node pair in the cluster based on the micro-benchmark results. Due to its Bayesian approach, Lotaru can also compute robust uncertainty estimates and provides them as an input for advanced scheduling methods. Our evaluation with five real-world scientific workflows and different datasets shows that Lotaru significantly outperforms the baselines in terms of prediction errors for homogeneous and heterogeneous clusters.
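The two-step idea — fit a runtime model from small local runs, then rescale the prediction per node using micro-benchmark results — can be sketched as follows. This is a simplification, not Lotaru itself: it uses plain least squares instead of Bayesian linear regression (so no uncertainty estimates), and the function names (`fit_linear`, `predict_on_node`) are hypothetical.

```python
def fit_linear(sizes: list[float], runtimes: list[float]) -> tuple[float, float]:
    """Ordinary least squares for runtime = a * size + b,
    fitted on down-scaled local runs of a task."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    var = sum((x - mean_x) ** 2 for x in sizes)
    a = cov / var
    return a, mean_y - a * mean_x

def predict_on_node(size: float, model: tuple[float, float],
                    local_bench: float, node_bench: float) -> float:
    """Predict a task's runtime on a cluster node: extrapolate the local
    model to the full input size, then adjust by the node's speed
    relative to the local machine (micro-benchmark ratio)."""
    a, b = model
    local_prediction = a * size + b
    return local_prediction * (local_bench / node_bench)
```

For example, a task whose local runtime grows as `2 * size` is predicted to finish in half that time on a node whose benchmark score is twice the local machine's.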
Creating, maintaining, and operating software artifacts is a long-standing challenge. Various management strategies have been developed and are frequently used. Nevertheless, a unified way of describing these management strategies so that they can be compared remains an open question. We present ßMACH as an answer. ßMACH allows systematic descriptions and checks independently of the management strategy. In this paper, we test parts of ßMACH using performance requirements as an example, applying it to the V-Model and Scrum.