direkt zum Inhalt springen

direkt zum Hauptnavigationsmenü

Sie sind hier

TU Berlin

Inhalt des Dokuments


Implicit Parallelism through Deep Language Embedding
Zitatschlüssel DBLP:journals/sigmod/AlexandrovKKM16
Autor Alexander Alexandrov and Asterios Katsifodimos and Georgi Krastev and Volker Markl
Seiten 51–58
Jahr 2016
DOI 10.1145/2949741.2949754
Journal SIGMOD Record
Jahrgang 45
Nummer 1
Zusammenfassung Parallel collection processing based on second-order functions such as map and reduce has been widely adopted for scalable data analysis. Initially popularized by Google, over the past decade this programming paradigm has found its way in the core APIs of parallel dataflow engines such as Hadoop's MapReduce, Spark's RDDs, and Flink's DataSets. We review programming patterns typical of these APIs and discuss how they relate to the underlying parallel execution model. We argue that fixing the abstraction leaks exposed by these patterns will reduce the cost of data analysis due to improved programmer productivity. To achieve that, we first revisit the algebraic foundations of parallel collection processing. Based on that, we propose a simplified API that (i) provides proper support for nested collection processing and (ii) alleviates the need of certain second-order primitives through comprehensions – a declarative syntax akin to SQL. Finally, we present a metaprogramming pipeline that performs algebraic rewrites and physical optimizations which allow us to target parallel dataflow engines like Spark and Flink with competitive performance.
Link zur Publikation Link zur Originalpublikation Download Bibtex Eintrag

Zusatzinformationen / Extras


Schnellnavigation zur Seite über Nummerneingabe