In April 2018, several members of the DIMA and DFKI research groups attended the 34th IEEE International Conference on Data Engineering (ICDE) in Paris. ICDE is one of the leading international database conferences. This year, it was held in the heart of Paris at the Conservatoire National des Arts et Métiers. Attendance exceeded even the organizers' expectations, so many sessions were overcrowded. DIMA and DFKI researchers contributed three publications to the conference. Jeyhun Karimov and Tilmann Rabl presented their work on “Benchmarking Distributed Stream Processing Engines”, and Jonas Traub presented his paper “Scotty: Efficient Window Aggregation for out-of-order Stream Processing”. Finally, the paper “Analysis of TPCx-IoT: The First Industry Standard Benchmark for IoT Gateway Systems”, presented by Meikel Poess from Oracle, was co-authored by Tilmann Rabl.
Adaptive Adaptive Indexing - Felix Martin Schuhknecht (Saarland University), Jens Dittrich (Saarland University), Laurent Linden (Saarland University).
LeanStore: In-Memory Data Management Beyond Main Memory - Viktor Leis (Technical University of Munich), Michael Haubenschild (Tableau Software), Alfons Kemper (Technical University of Munich), Thomas Neumann (Technical University of Munich).
Adaptive Execution of Compiled Queries - André Kohn (Technical University of Munich), Viktor Leis (Technical University of Munich), Thomas Neumann (Technical University of Munich).
Benchmarking Distributed Stream Processing Engines
Jeyhun Karimov, Tilmann Rabl, Henri Heiskanen, Asterios Katsifodimos, Roman Samarev, and Volker Markl
Abstract: The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear gap of detailed analyses of the systems' performance characteristics. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations, which are the basic type of operations in stream analytics. For this benchmark, we design workloads based on real-life, industrial use-cases inspired by the online gaming industry. The contribution of our work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we carefully separate the system under test and driver, in order to correctly represent the open world model of typical stream processing deployments and can, therefore, measure system performance under realistic conditions. Third, we build the first benchmarking framework to define and test the sustainable performance of streaming systems. Our detailed evaluation highlights the individual characteristics and use-cases of each system.
Paper: http://www.redaktion.tu-berlin.de/fileadmin/fg131/Publikation/Papers/Stream_Benchmarks_ICDE18-CRC.pdf 
Scotty: Efficient Window Aggregation for out-of-order Stream Processing
Jonas Traub, Philipp M. Grulich, Alejandro Rodríguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl
Abstract: Computing aggregates over windows is at the core of virtually every stream processing job. Typical stream processing applications involve overlapping windows and, therefore, cause redundant computations. Several techniques prevent this redundancy by sharing partial aggregates among windows. However, these techniques do not support out-of-order processing and session windows. Out-of-order processing is a key requirement to deal with delayed tuples in case of source failures such as temporary sensor outages. Session windows are widely used to separate different periods of user activity from each other. In this paper, we present Scotty, a high throughput operator for window discretization and aggregation. Scotty splits streams into non-overlapping slices and computes partial aggregates per slice. These partial aggregates are shared among all concurrent queries with arbitrary combinations of tumbling, sliding, and session windows. Scotty introduces the first slicing technique which (1) enables stream slicing for session windows in addition to tumbling and sliding windows and (2) processes out-of-order tuples efficiently. Our technique is generally applicable to a broad group of dataflow systems which use a unified batch and stream processing model. Our experiments show that we achieve a throughput an order of magnitude higher than alternative state-of-the-art solutions.
Paper: http://www.user.tu-berlin.de/powibol/assets/publications/traub-scotty-icde-2018.pdf 
Poster: http://www.user.tu-berlin.de/powibol/assets/posters/ICDE-Scotty-Poster.pdf 
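The general idea of stream slicing described in the abstract above can be illustrated with a minimal Python sketch. It is not Scotty's implementation or API: the class `SlicedSum`, its methods, and the choice of fixed-width slices are assumptions made for illustration only. The sketch covers tumbling and sliding windows with a sum aggregate and shows how an out-of-order tuple only touches the slice it belongs to; session windows and the efficiency techniques from the paper are not covered.

```python
from collections import defaultdict
from math import gcd


class SlicedSum:
    """Toy slice-based sum over event-time windows (hypothetical, illustration only)."""

    def __init__(self, windows):
        # windows: list of (length, slide) pairs; a tumbling window has slide == length.
        self.windows = windows
        # Slice width: the largest width whose boundaries align with every window edge.
        width = 0
        for length, slide in windows:
            width = gcd(width, gcd(length, slide))
        self.width = width
        self.partials = defaultdict(int)  # slice index -> partial sum

    def add(self, timestamp, value):
        # A late (out-of-order) tuple simply updates the slice it belongs to.
        self.partials[timestamp // self.width] += value

    def window_sum(self, start, length):
        # Combine the shared partial sums of the slices covering [start, start + length).
        first = start // self.width
        last = (start + length) // self.width
        return sum(self.partials.get(i, 0) for i in range(first, last))


# One tumbling window (length 10) and one sliding window (length 10, slide 5)
# share slices of width 5.
agg = SlicedSum([(10, 10), (10, 5)])
for ts, value in [(1, 2), (7, 3), (12, 4), (3, 1)]:  # (3, 1) arrives out of order
    agg.add(ts, value)

print(agg.window_sum(0, 10))  # window [0, 10)
print(agg.window_sum(5, 10))  # window [5, 15)
```

Each tuple is aggregated once into a slice, and every window that overlaps the slice reuses that partial sum, which is the redundancy the abstract refers to.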
Extended Talk on Streaming Window Aggregation
Jonas Traub and Philipp Grulich will also give an extended talk on their work on streaming window aggregation at this year's FlinkForward conference, which will be held in September 2018 in Berlin.
Analysis of TPCx-IoT: The First Industry Standard Benchmark for IoT Gateway Systems
Meikel Poess, Raghunath Nambiar, Karthik Kulkarni, Chinmayi Narasimhadevara, Tilmann Rabl, and Hans-Arno Jacobsen
Abstract: By 2020 it is estimated that 20 billion devices will be connected to the Internet. While the initial hype around this Internet of Things (IoT) stems from consumer use cases, the number of devices and data from enterprise use cases is significant in terms of market share. With companies being challenged to choose the right digital infrastructure from different providers, there is a pressing need to objectively measure the hardware, operating system, data storage, and data management systems that can ingest, persist, and process the massive amounts of data arriving from sensors (edge devices). The Transaction Processing Performance Council (TPC) recently released the first industry standard benchmark for measuring the performance of gateway systems, TPCx-IoT. In this paper, we provide a detailed description of TPCx-IoT, mention design decisions behind key elements of this benchmark, and experimentally analyze how TPCx-IoT measures the performance of IoT gateway systems.