TU Berlin

Database Systems and Information Management Group

DIMA@SIGMOD 2016


DIMA Researchers at SIGMOD 2016

From left to right: Asterios Katsifodimos, Andreas Kunft, Tilmann Rabl, Alexander Alexandrov, Gábor Gévay

Members of TU Berlin’s Database Systems and Information Management Group (DIMA) recently presented their research at the 2016 International Conference on Management of Data (SIGMOD 2016), which was held from June 26 to July 1 in San Francisco, USA. DIMA's contributions included a full paper, a demo paper, and a workshop paper. Among the attendees were three DIMA postdocs (Dr. Tilmann Rabl, Dr. Asterios Katsifodimos, and Dr. Sebastian Breß) and three DIMA PhD students (Alexander Alexandrov, Andreas Kunft, and Gábor Gévay). Key highlights include:

I. Full-Paper

Prof. Dr. Jens Teubner (TU Dortmund) presented the paper “Robust Query Processing in Co-Processor-Accelerated Databases”, joint work with Dr. Sebastian Breß, senior researcher at IAM/DIMA [1].

II. Demo-Paper

Alexander Alexandrov presented the demo paper “Emma in Action: Declarative Dataflows for Scalable Data Analysis” [2]. The demonstration showcased Emma – a declarative domain-specific language (DSL) for distributed collection processing developed at TU Berlin.

III. Workshop-Paper

Andreas Kunft presented the workshop paper “Bridging the Gap: Towards Optimization Across Linear and Relational Algebra” [3] at the “Algorithms and Systems for MapReduce and Beyond” workshop (BeyondMR 2016).




[1] Robust Query Processing in Co-Processor-Accelerated Databases, Sebastian Breß, Henning Funke, Jens Teubner, Proceedings of the 2016 International Conference on Management of Data, Pages 1891-1906, New York, USA, 2016.

Abstract: Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve application-speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-processor environments is still a significant challenge.

In this paper, we identify two effects that limit performance in case co-processor resources become scarce. Cache thrashing occurs when the working set of queries does not fit into the co-processor's data cache, resulting in performance degradations up to a factor of 24. Heap contention occurs when multiple operators run in parallel on a co-processor and when their accumulated memory footprint exceeds the main memory capacity of the co-processor, slowing down query execution by up to a factor of six.

We propose solutions for both effects. Data-driven operator placement avoids data movements when they might be harmful; query chopping limits co-processor memory usage and thus avoids contention. The combined approach, data-driven query chopping, achieves robust and scalable performance on co-processors. We validate our proposal with our open-source GPU-accelerated database engine CoGaDB and the popular star schema and TPC-H benchmarks.


[2] Emma in Action: Declarative Dataflows for Scalable Data Analysis, Alexander Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, Volker Markl, Proceedings of the 2016 International Conference on Management of Data, Pages 2073-2076, New York, USA, 2016.

Abstract: Parallel dataflow APIs based on second-order functions were originally seen as a flexible alternative to SQL. Over time, however, their complexity increased due to the number of physical aspects that had to be exposed by the underlying engines in order to facilitate efficient execution. To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions.
This demonstration highlights the benefits of an alternative design based on deep language embedding. We showcase Emma - a programming language embedded in Scala. Emma promotes parallel collection processing through Scala's for-comprehensions - a declarative syntax akin to SQL. In addition, Emma also advocates quasi-quoting the entire data analysis algorithm rather than its individual dataflow expressions. This allows for decomposing the quoted code into (sequential) control flow and (parallel) dataflow fragments, optimizing the dataflows in context, and transparently offloading them to an engine like Spark or Flink. The proposed design promises increased programmer productivity due to avoiding an impedance mismatch, thereby reducing the lag times and cost of data analysis.


[3] Bridging the Gap: Towards Optimization Across Linear and Relational Algebra, Andreas Kunft, Alexander Alexandrov, Asterios Katsifodimos, Volker Markl, BeyondMR@SIGMOD 2016: 1

Abstract: Advanced data analysis typically requires some form of preprocessing in order to extract and transform data before processing it with machine learning and statistical analysis techniques. Pre-processing pipelines are naturally expressed in dataflow APIs (e.g., MapReduce, Flink, etc.), while machine learning is expressed in linear algebra with iterations. Programmers therefore perform end-to-end data analysis utilizing multiple programming paradigms and systems. This impedance mismatch not only hinders productivity but also prevents optimization opportunities, such as sharing of physical data layouts (e.g., partitioning) and data structures among different parts of a data analysis program. The goal of this work is twofold. First, it aims to alleviate the impedance mismatch by allowing programmers to author complete end-to-end programs in one engine-independent language that is automatically parallelized. Second, it aims to enable joint optimizations over both relational and linear algebra. To achieve this goal, we present the design of Lara, a deeply embedded language in Scala which enables authoring scalable programs using two abstract data types (DataBag and Matrix) and control flow constructs. Programs written in Lara are compiled to an intermediate representation (IR) which enables optimizations across linear and relational algebra. The IR is finally used to compile code for different execution engines.

Photos of SIGMOD 2016

A. Kunft presents the BeyondMR workshop paper.
Prof. Dr. Teubner presents a full paper, joint work with Dr. Breß.
From left to right: Gábor Gévay, Andreas Kunft, Tilmann Rabl, Asterios Katsifodimos, Alexander Alexandrov
A. Alexandrov presents the demo paper "Emma in Action".

