direkt zum Inhalt springen

direkt zum Hauptnavigationsmenü

Sie sind hier

TU Berlin

Inhalt des Dokuments

Publikationen

Emma in Action: Declarative Dataflows for Scalable Data Analysis
Zitatschlüssel AlexandrovSKKM2016
Autor Alexander Alexandrov Andreas Salzmann Georgi Krastev Asterios Katsifodimos Volker Mark
Seiten 2073-2076
Jahr 2016
DOI 10.1145/2882903.2899396
Journal Proceedings of the 2016 International Conference on Management of Data (SIGMOD 2016)
Jahrgang 2016
Zusammenfassung Parallel dataflow APIs based on second-order functions were originally seen as a flexible alternative to SQL. Over time, however, their complexity increased due to the number of physical aspects that had to be exposed by the underlying engines in order to facilitate efficient execution. To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions. This demonstration highlights the benefits of an alternative design based on deep language embedding. We showcase Emma - a programming language embedded in Scala. Emma promotes parallel collection processing through native constructs like Scala's for-comprehensions - a declarative syntax akin to SQL. In addition, Emma also advocates quasi-quoting the entire data analysis algorithm rather than its individual dataflow expressions. This allows for decomposing the quoted code into (sequential) control flow and (parallel) dataflow fragments, optimizing the dataflows in context, and transparently offloading them to an engine like Spark or Flink. The proposed design promises increased programmer productivity due to avoiding an impedance mismatch, thereby reducing the lag times and cost of data analysis.
Link zur Publikation Link zur Originalpublikation Download Bibtex Eintrag

Zusatzinformationen / Extras

Direktzugang:

Schnellnavigation zur Seite über Nummerneingabe