TU Berlin

Database Systems and Information Management Group

Fast failure recovery for iterative algorithms in distributed dataflow systems

Basic Information

Student: Markus Holzemer

Advisor: Chen Xu

Degree: Master

Abstract

When clusters are scaled out to compute complex insights in long-running iterative jobs, failures become frequent.
Therefore, the goal of this thesis was to find a recovery mechanism for distributed dataflow systems that minimizes the recovery time of iterative jobs while keeping the runtime overhead during normal execution as low as possible.
To achieve this, we propose a non-blocking way of taking checkpoints and analyse three recovery methods (simple checkpointing, confined recovery, and replication-based recovery), both theoretically and through extensive experiments.
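To make the idea of non-blocking checkpointing combined with simple checkpoint-based recovery concrete, here is a minimal single-process sketch. It is an illustration under stated assumptions, not the thesis's implementation: the in-memory `CheckpointStore` (standing in for stable storage), the `step` function (a hypothetical iteration body), and the `run` driver are all hypothetical names. The state is snapshotted and written on a background thread so the next iteration starts immediately. On a simulated failure, the job rolls back to the latest checkpoint and recomputes the lost iterations.

```python
import copy
import threading


class CheckpointStore:
    """Hypothetical stand-in for stable storage such as a distributed file system."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = None  # (iteration, state) of the newest completed checkpoint

    def write(self, iteration, state):
        with self._lock:
            if self._latest is None or iteration > self._latest[0]:
                self._latest = (iteration, state)

    def latest(self):
        with self._lock:
            return self._latest


def step(state):
    """Hypothetical iteration body: every vertex value is halved."""
    return {k: v / 2 for k, v in state.items()}


def run(initial, iterations, interval, store, fail_at=None):
    """Run `iterations` steps, checkpointing every `interval` steps without blocking.

    If `fail_at` is set, the in-memory state is discarded once at that iteration
    and the job rolls back to the latest checkpoint (simple checkpointing).
    """
    state = dict(initial)
    store.write(0, copy.deepcopy(state))  # initial checkpoint so rollback is always possible
    writers = []
    i = 0
    failed = False
    while i < iterations:
        if i == fail_at and not failed:
            failed = True
            for t in writers:  # let in-flight checkpoint writes finish before reading
                t.join()
            i, snapshot = store.latest()
            state = copy.deepcopy(snapshot)  # restore the lost state, redo later iterations
            continue
        state = step(state)
        i += 1
        if i % interval == 0:
            # Snapshot the state, then persist it on a background thread so the
            # next iteration can start immediately (the non-blocking part).
            snapshot = copy.deepcopy(state)
            t = threading.Thread(target=store.write, args=(i, snapshot))
            t.start()
            writers.append(t)
    for t in writers:
        t.join()
    return state
```

In this sketch, a failure at iteration 3 with a checkpoint interval of 2 forces iteration 3 to be recomputed from the checkpoint taken at iteration 2. The final result matches a failure-free run. The sketch only covers simple checkpointing. Confined recovery (recomputing only the failed partitions) and replication-based recovery are not modelled here.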