
DIMA Researchers @ SIGMOD 2016

From left to right: Asterios Katsifodimos, Andreas Kunft, Tilmann Rabl, Alexander Alexandrov, Gábor Gévay

Team members of the Database Systems and Information Management (DIMA) group presented their research results at SIGMOD 2016 (International Conference on Management of Data), which took place from June 26 to July 1 in San Francisco, USA. They presented one full paper, one demo paper, and one workshop paper. Among the participants were three DIMA postdoctoral researchers, Dr. Tilmann Rabl, Dr. Asterios Katsifodimos, and Dr. Sebastian Breß, as well as three DIMA doctoral students, Alexander Alexandrov, Andreas Kunft, and Gábor Gévay. The highlights were:

I. Full Paper

Prof. Dr. Jens Teubner of TU Dortmund presented joint work with Dr. Sebastian Breß, a postdoctoral researcher at IAM/DIMA: “Robust Query Processing in Co-Processor-Accelerated Databases” [1].

II. Demo Paper

Alexander Alexandrov presented the demo paper “Emma in Action: Declarative Dataflows for Scalable Data Analysis” [2]. The demonstration showcased Emma, a declarative domain-specific language (DSL) for distributed data processing developed at TU Berlin.

III. Workshop Paper

Andreas Kunft presented the workshop paper “Bridging the Gap: Towards Optimization Across Linear and Relational Algebra” [3] to an interested audience at the co-located workshop “Algorithms and Systems for MapReduce and Beyond” (BeyondMR 2016).

References

[1] Robust Query Processing in Co-Processor-Accelerated Databases, Sebastian Breß, Henning Funke, Jens Teubner, Proceedings of the 2016 International Conference on Management of Data, pages 1891–1906, New York, USA, 2016.

Abstract: Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve application-speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-processor environments is still a significant challenge.

In this paper, we identify two effects that limit performance in case co-processor resources become scarce. Cache thrashing occurs when the working set of queries does not fit into the co-processor's data cache, resulting in performance degradations up to a factor of 24. Heap contention occurs when multiple operators run in parallel on a co-processor and when their accumulated memory footprint exceeds the main memory capacity of the co-processor, slowing down query execution by up to a factor of six.

We propose solutions for both effects. Data-driven operator placement avoids data movements when they might be harmful; query chopping limits co-processor memory usage and thus avoids contention. The combined approach, data-driven query chopping, achieves robust and scalable performance on co-processors. We validate our proposal with our open-source GPU-accelerated database engine CoGaDB and the popular star schema and TPC-H benchmarks.
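The query chopping idea can be pictured as admitting only as many operators onto the co-processor as its memory budget allows. The following Scala sketch is a purely illustrative rendering of that batching logic, with a made-up Op type and assumed memory footprints; it does not reproduce CoGaDB's actual implementation.

```scala
// Hypothetical sketch of the "query chopping" idea: greedily group
// pipeline operators into batches whose combined memory footprint
// stays within the co-processor's memory budget.
case class Op(name: String, footprintMB: Long)

def chopQuery(ops: List[Op], budgetMB: Long): List[List[Op]] =
  ops.foldLeft(List(List.empty[Op])) { (batches, op) =>
    val current = batches.head
    val used    = current.map(_.footprintMB).sum
    if (current.isEmpty || used + op.footprintMB <= budgetMB)
      (op :: current) :: batches.tail  // still fits: add to the current batch
    else
      List(op) :: batches              // would overflow: start a new batch
  }.map(_.reverse).reverse

// Example: chop a four-operator pipeline against a 2 GB device budget.
val plan = List(Op("scan", 800), Op("filter", 300), Op("join", 1500), Op("agg", 400))
chopQuery(plan, budgetMB = 2048).foreach(batch => println(batch.map(_.name)))
// prints List(scan, filter) and List(join, agg)
```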

 

[2] Emma in Action: Declarative Dataflows for Scalable Data Analysis, Alexander Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, Volker Markl, Proceedings of the 2016 International Conference on Management of Data, pages 2073–2076, New York, USA, 2016.

Abstract: Parallel dataflow APIs based on second-order functions were originally seen as a flexible alternative to SQL. Over time, however, their complexity increased due to the number of physical aspects that had to be exposed by the underlying engines in order to facilitate efficient execution. To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions.
This demonstration highlights the benefits of an alternative design based on deep language embedding. We showcase Emma, a programming language embedded in Scala. Emma promotes parallel collection processing through Scala's for-comprehensions, a declarative syntax akin to SQL. In addition, Emma also advocates quasi-quoting the entire data analysis algorithm rather than its individual dataflow expressions. This allows for decomposing the quoted code into (sequential) control flow and (parallel) dataflow fragments, optimizing the dataflows in context, and transparently offloading them to an engine like Spark or Flink. The proposed design promises increased programmer productivity by avoiding an impedance mismatch, thereby reducing the lag times and cost of data analysis.
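As a flavor of the declarative, SQL-like style the abstract describes, the snippet below writes a join plus filter as a plain Scala for-comprehension over local collections; in Emma the same syntax is written against its distributed collection abstraction and offloaded to Spark or Flink, so the types and data here are illustrative assumptions rather than Emma's API.

```scala
// Plain-Scala illustration of the for-comprehension style that Emma
// promotes for parallel collection processing (here on local Seqs).
case class User(id: Int, name: String)
case class Order(userId: Int, amount: Double)

val users  = Seq(User(1, "Ada"), User(2, "Grace"))
val orders = Seq(Order(1, 10.0), Order(1, 25.0), Order(2, 5.0))

// A join followed by a filter, written declaratively, akin to SQL:
//   SELECT u.name, o.amount FROM users u, orders o
//   WHERE u.id = o.userId AND o.amount > 8
val bigSpenders =
  for {
    u <- users
    o <- orders
    if u.id == o.userId && o.amount > 8
  } yield (u.name, o.amount)

bigSpenders.foreach(println)  // (Ada,10.0) and (Ada,25.0)
```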

 

[3] Bridging the Gap: Towards Optimization Across Linear and Relational Algebra, Andreas Kunft, Alexander Alexandrov, Asterios Katsifodimos, Volker Markl, Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR@SIGMOD 2016), Article 1, 2016.

Abstract: Advanced data analysis typically requires some form of preprocessing in order to extract and transform data before processing it with machine learning and statistical analysis techniques. Preprocessing pipelines are naturally expressed in dataflow APIs (e.g., MapReduce, Flink, etc.), while machine learning is expressed in linear algebra with iterations. Programmers therefore perform end-to-end data analysis utilizing multiple programming paradigms and systems. This impedance mismatch not only hinders productivity but also prevents optimization opportunities, such as sharing of physical data layouts (e.g., partitioning) and data structures among different parts of a data analysis program.

The goal of this work is twofold. First, it aims to alleviate the impedance mismatch by allowing programmers to author complete end-to-end programs in one engine-independent language that is automatically parallelized. Second, it aims to enable joint optimizations over both relational and linear algebra. To achieve this goal, we present the design of Lara, a language deeply embedded in Scala which enables authoring scalable programs using two abstract data types (DataBag and Matrix) and control flow constructs. Programs written in Lara are compiled to an intermediate representation (IR) which enables optimizations across linear and relational algebra. The IR is finally used to compile code for different execution engines.
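To make the impedance mismatch concrete, the sketch below strings together a collection-style preprocessing step and a small linear-algebra step in plain Scala; Lara's actual DataBag and Matrix types and its cross-paradigm optimizations are not reproduced here, so the pipeline shape and values are illustrative assumptions only.

```scala
// Illustration of the two paradigms Lara aims to unify:
// (1) relational/collection-style preprocessing, (2) linear algebra.
case class Record(label: Double, features: Array[Double])

// (1) Preprocessing as a collection pipeline (parse and filter).
val raw = Seq("1.0,0.5,0.25", "0.0,0.125,1.0", "1.0,0.25,0.5")
val records =
  for {
    line <- raw
    cols  = line.split(",").map(_.toDouble)
    if cols.length == 3
  } yield Record(cols(0), cols.drop(1))

// (2) Linear algebra on the extracted features: compute X^T * y,
// the kind of step usually handed off to a separate ML system.
val X = records.map(_.features)                  // n x d feature matrix
val y = records.map(_.label)                     // n-element label vector
val xty = Array.tabulate(X.head.length) { j =>
  X.indices.map(i => X(i)(j) * y(i)).sum
}
println(xty.mkString(", "))                      // 0.75, 0.75
```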

Photos from SIGMOD 2016

A. Kunft explains the BeyondMR workshop paper

Prof. Dr. Teubner presents the full paper co-authored with Dr. Breß.

From left to right: Gábor Gévay, Andreas Kunft, Tilmann Rabl, Asterios Katsifodimos, Alexander Alexandrov

A. Alexandrov presents the demo paper "Emma in Action"
