TU Berlin

Fault-Tolerance for Distributed Iterative Dataflows in Action
Citation key XuLSM18
Authors Chen Xu, Rudi Poepsel Lemaitre, Juan Soto, Volker Markl
Pages 1990-1993
Year 2018
DOI https://doi.org/10.14778/3229863.3236242
Journal PVLDB 11(12)
Abstract Distributed dataflow systems (DDS) are widely used for graph processing and machine learning (ML), domains in which many algorithms are iterative. DDS typically achieve fault tolerance through checkpointing, or they exploit algorithmic properties to recover from failures without checkpoints. For graph processing, we recently proposed unblocking checkpointing, which runs the execution pipeline in parallel with checkpoint writing, and confined recovery, which enables fast recovery from partial node failures. For ML algorithms implemented with broadcast variables, we proposed replica recovery, which leverages the replicas of broadcast variables to recover from failures without any checkpointing. In this demonstration, we showcase these fault-tolerance techniques in Apache Flink. Attendees will be able to: (i) run representative iterative algorithms, including PageRank, Connected Components, and K-Means; (ii) explore the internal behavior of a DDS under unblocking checkpointing; and (iii) trigger failures to observe the effects of confined recovery and replica recovery.
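The idea behind replica recovery can be illustrated with K-Means, where the current centroids are the broadcast variable that every worker holds a copy of. The following is a minimal single-process simulation, not the authors' Flink implementation and not a Flink API. It assumes that the centroids are the only iteration state and that each worker's input partition can be re-read from durable storage. The function names `kmeans` and `partial_sums` and the parameters `fail_at` and `failed_worker` are hypothetical. When a worker "fails", its partition is reloaded from the input and its centroids are copied from a surviving worker's replica, so no checkpoint is needed.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def closest(p: Point, centroids: List[Point]) -> int:
    # Index of the nearest centroid by squared Euclidean distance.
    return min(
        range(len(centroids)),
        key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
    )


def partial_sums(partition: List[Point], centroids: List[Point]) -> List[list]:
    # Per-worker map step: coordinate sums and point count for each cluster.
    sums = [[0.0, 0.0, 0] for _ in centroids]
    for p in partition:
        c = closest(p, centroids)
        sums[c][0] += p[0]
        sums[c][1] += p[1]
        sums[c][2] += 1
    return sums


def kmeans(input_partitions: List[List[Point]], init: List[Point], iterations: int,
           fail_at: Optional[int] = None, failed_worker: Optional[int] = None) -> List[Point]:
    # Hypothetical simulation: each worker holds its data partition plus its own
    # replica of the broadcast variable (the current centroids).
    workers = [{"data": list(part), "centroids": list(init)} for part in input_partitions]
    for it in range(iterations):
        if it == fail_at:
            # Simulated failure: the worker loses all of its in-memory state.
            workers[failed_worker] = {"data": None, "centroids": None}
            # Replica recovery (sketch): reload the partition from the assumed
            # durable input and copy the centroids from any surviving replica.
            donor = next(w for i, w in enumerate(workers) if i != failed_worker)
            workers[failed_worker] = {
                "data": list(input_partitions[failed_worker]),
                "centroids": list(donor["centroids"]),
            }
        # Reduce step: combine the partial sums of all workers.
        total = [[0.0, 0.0, 0] for _ in init]
        for w in workers:
            for c, (sx, sy, n) in enumerate(partial_sums(w["data"], w["centroids"])):
                total[c][0] += sx
                total[c][1] += sy
                total[c][2] += n
        # New centroids; an empty cluster keeps its previous centroid.
        old = workers[0]["centroids"]
        new = [(sx / n, sy / n) if n else old[c] for c, (sx, sy, n) in enumerate(total)]
        # Re-broadcast: every worker receives a fresh replica of the centroids.
        for w in workers:
            w["centroids"] = list(new)
    return workers[0]["centroids"]


if __name__ == "__main__":
    parts = [[(0.0, 0.0), (0.5, 0.2)], [(9.0, 9.0), (8.5, 9.5)], [(0.2, 0.4), (9.2, 8.8)]]
    init = [(1.0, 1.0), (8.0, 8.0)]
    clean = kmeans(parts, init, iterations=5)
    recovered = kmeans(parts, init, iterations=5, fail_at=2, failed_worker=1)
    print(clean == recovered)  # True
```

Because the recovered worker ends up with exactly the same partition and centroids it had before the failure, the run with a failure produces the same centroids as the failure-free run. This only holds under the stated assumption that the broadcast variable captures all iteration state; algorithms with per-partition state (such as the vertex values in PageRank or Connected Components) instead rely on checkpoints, which is where unblocking checkpointing and confined recovery apply.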