TU Berlin

Basic Information

Student: Markus Holzemer

Advisor: Chen Xu

Degree: Master

Abstract

When clusters are scaled out to compute complex insights in long-running iterative jobs, failures become quite frequent.
Therefore, the goal of this thesis was to find a recovery mechanism for distributed dataflow systems that minimizes the recovery time of iterative jobs while keeping the runtime overhead during normal execution as low as possible.
To achieve this, we propose a non-blocking way of taking checkpoints and analyse three recovery methods (simple checkpointing, confined recovery, and replication-based recovery), both theoretically and through extensive experiments.
