DIMA Researchers@VLDB 2017
- © Mahdiraji/DFKI
Members of TU Berlin’s Database Systems and Information Management (DIMA) Group recently presented their research at VLDB 2017, the 43rd International Conference on Very Large Data Bases, held in Munich, Germany, from August 28 to September 1, 2017. Contributions included a full paper, two workshop papers, and the organization of two workshops. Attendees included:
- DIMA Chair: Prof. Dr. Volker Markl,
- DIMA Research Director: Prof. Dr. Tilmann Rabl
- Senior Researchers: Dr. Sebastian Breß, Dr. Alireza Rezaei Mahdiraji, and Steffen Zeuch, and
- PhD Students: Alexander Alexandrov, Christoph Boden, Bonaventura Del Monte, and Behrouz Derakhshan.
Full Paper: Non-Invasive Progressive Optimization for In-Memory Databases. PVLDB 9(14): 1659-1670 (2016)
Steffen Zeuch presented the paper titled “Non-Invasive Progressive Optimization for In-Memory Databases” in the Query Optimization session on Wednesday. The paper introduces a non-invasive optimization approach that reorders operators at run-time. In particular, it evaluates the resource exploitation of individual query plans based on performance counters. Additionally, different cost models are proposed to interpret the sampled performance counter values. Overall, this approach enables databases to react to changes in data characteristics at run-time. As a result, the query plan is progressively optimized to approach an optimal plan, which improves performance significantly.
Note: Steffen’s research was conducted during his PhD studies at Humboldt University of Berlin.
Link to the publication 
PhD Workshop Paper: Efficient Migration of Very Large Distributed State for Scalable Stream Processing
Bonaventura Del Monte presented his doctoral proposal at the PhD Workshop, which was co-located with the VLDB conference. Bonaventura’s proposal advocates the need for better state management techniques to enable stream processing engines to handle distributed, terabyte-sized state. Today, many engines already feature such techniques, including resource elasticity, fault tolerance, query maintenance, and load balancing. However, these engines constrain their scope to partitioned, gigabyte-sized state. This work therefore proposes a set of protocols that enables stream processing engines to efficiently react to node failures, load imbalance, and resource under-provisioning in the presence of large, distributed state.
Bonaventura Del Monte. Efficient Migration of Very Large Distributed State for Scalable Stream Processing. In: Proceedings of the VLDB 2017 PhD Workshop at VLDB 2017.
Link to the publication 
Workshop Paper: PEEL: A Framework for Benchmarking Distributed Systems and Algorithms
Christoph Boden presented the paper titled “PEEL: A Framework for Benchmarking Distributed Systems and Algorithms” at the 9th TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC 2017), which was held jointly with the VLDB conference. PEEL is a framework to define, execute, analyze, and share experiments. PEEL enables the transparent specification of benchmarking workloads and system configuration parameters. It orchestrates the systems involved and automatically runs experiments and collects all associated logs. PEEL currently supports Apache HDFS, Hadoop, Flink, and Spark and can easily be extended to include further systems.
The PEEL framework is available on GitHub 
Christoph Boden, Alexander Alexandrov, Andreas Kunft, Tilmann Rabl, and Volker Markl. PEEL: A Framework for Benchmarking Distributed Systems and Algorithms. In: Proceedings of the 9th TPC Technology Conference on Performance Evaluation and Benchmarking (TPCTC) at VLDB 2017.
Abstract: During the last decade, a multitude of novel systems for scalable and distributed data processing have been proposed in both academia and industry. While there are published results of experimental evaluations for nearly all of these systems, it remains a challenge to objectively compare different systems’ performance. It is thus imperative to enable and establish benchmarks for these systems. However, even if workloads and data sets or data generators are fixed, orchestrating and executing benchmarks can be a major obstacle. Worse, many systems come with hardware-dependent parameters that have to be tuned and spawn a diverse set of configuration files. This impedes the portability and reproducibility of benchmarks. To address these problems and to foster reproducible and portable experiments and benchmarks of distributed data processing systems, we present PEEL, a framework to define, execute, analyze, and share experiments. PEEL enables the transparent specification of benchmarking workloads and system configuration parameters. It orchestrates the systems involved and automatically runs experiments and collects all associated logs. PEEL currently supports Apache HDFS, Hadoop, Flink, and Spark and can easily be extended to include further systems.
Third International Workshop on Big Data Open Source Systems (BOSS)
Prof. Dr. Tilmann Rabl organized the third BOSS workshop. The workshop follows a novel tutorial-based format: in parallel tracks, participants receive deep-dive introductions to active, publicly available, open-source big data systems. At BOSS 2017, developers from Apache AsterixDB, Apache Flink, Apache Impala, Apache Spark, and TensorFlow presented their systems. All tutorials were very well attended, and DIMA members Dr. Alireza Rezaei Mahdiraji, Bonaventura Del Monte, and Behrouz Derakhshan supported the plenary tutorial on Google’s TensorFlow.
The 16th International Symposium on Database Programming Languages (DBPL 2017)