Research paper on the new control-flow system "Mitos" accepted for publication at ICDE 2021

The research paper "Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance", written by Gábor E. Gévay, Tilmann Rabl, Sebastian Breß, Loránd Madai-Tahy, Jorge-Arnulfo Quiané-Ruiz, and Volker Markl, has been accepted for publication at the 37th IEEE International Conference on Data Engineering (ICDE 2021) [1]. The authors, from TU Berlin and DFKI, will present Mitos, a control-flow system for data analysis that combines high performance with ease of use.

Abstract: Modern data analysis tasks often involve control flow statements, such as iterations. Common examples are PageRank and K-means. To achieve scalability, developers usually implement data analysis tasks in distributed dataflow systems, such as Spark and Flink. However, for tasks with control flow statements, these systems still either suffer from poor performance or are hard to use. For example, while Flink supports iterations and Spark provides ease-of-use, Flink is hard to use and Spark has poor performance for iterative tasks. As a result, developers typically have to implement different workarounds to run their jobs with control flow statements in an easy and efficient way. We propose Mitos, a system that achieves the best of both worlds: it achieves both high performance and ease-of-use. Mitos uses an intermediate representation that abstracts away specific control flow statements and is able to represent any imperative control flow. This facilitates building the dataflow graph and coordinating the distributed execution of control flow in a way that is not tied to specific control flow constructs. Our experimental evaluation shows that the performance of Mitos is more than one order of magnitude better than systems that launch new dataflow jobs for every iteration step. Remarkably, it is also up to 10.5x faster than Flink, which has native iteration support, while matching the ease-of-use of Spark.
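
To illustrate the kind of iterative task the abstract refers to, here is a minimal single-machine sketch of PageRank in plain Python. It is not Mitos, Spark, or Flink code; the function name `pagerank`, its parameters, and the toy graph are assumptions made for this example, and it assumes every node has at least one outgoing edge. The point is the loop: its exit condition depends on a value computed inside the loop body, which is the imperative control flow a dataflow system has to coordinate across iteration steps.

```python
def pagerank(edges, damping=0.85, tol=1e-6, max_iters=100):
    """Plain-Python PageRank with an imperative, data-dependent loop.

    Illustrative only: assumes every node has at least one outgoing edge.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {n: 0 for n in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    ranks = {n: 1.0 / len(nodes) for n in nodes}
    iterations = 0
    delta = float("inf")

    # Control flow: whether another step runs depends on `delta`,
    # which is only known after the previous step has finished.
    while delta > tol and iterations < max_iters:
        new_ranks = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, dst in edges:
            new_ranks[dst] += damping * ranks[src] / out_degree[src]
        delta = sum(abs(new_ranks[n] - ranks[n]) for n in nodes)
        ranks = new_ranks
        iterations += 1

    return ranks, iterations


# Hypothetical toy graph: a directed cycle a -> b -> c -> a.
edges = [("a", "b"), ("b", "c"), ("c", "a")]
ranks, iterations = pagerank(edges)
```

In a distributed setting, each pass through the loop body would correspond to one iteration step. The abstract contrasts systems that launch a new dataflow job for every such step with systems that support iterations natively.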

> Preprint “Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance” [PDF]
