DIMA Researchers at EDBT 2018 in Vienna
- © EDBT
The DIMA and DFKI IAM research group presented two publications at the 21st International Conference on Extending Database Technology (EDBT), which took place on March 26-29, 2018, in Vienna, the capital of Austria.
- © DIMA / Traub
The EDBT series of conferences is an outstanding forum for international database researchers. Held at TU Wien's Faculty of Electrical Engineering, the conference provided diverse opportunities to exchange research observations, results, ideas, and visions. The professional conference organization and the social events on offer completed the EDBT experience.
Synchronous Multi-GPU Training for Deep Learning with Low-Precision Communications: An Empirical Study. (Demjan Grubic, ETH Zurich; Leo Tam, NVIDIA; Dan Alistarh, ETH Zurich; Ce Zhang, ETH; Experiments and Analysis (E&A).)
QUASII: QUery-Aware Spatial Incremental Index. (Mirjana Pavlovic, EPFL; Darius Sidlauskas, EPFL; Thomas Heinis, Imperial College; Anastasia Ailamaki, EPFL.)
NoFTL-KV: Tackling Write-Amplification on KV-Stores with Native Storage Management. (Tobias Vincon, Reutlingen University; Sergej Hardock, TU Darmstadt; Christian Riegger, Reutlingen University; Julian Oppermann, TU Darmstadt; Andreas Koch, TU Darmstadt; Ilia Petrov, Reutlingen University.)
Deep Integration of Machine Learning Into Column Stores. (Mark Raasveldt, CWI; Pedro Holanda, CWI; Hannes Mühleisen, CWI; Stefan Manegold, CWI Amsterdam.)
- René Saitenmacher and Jonas Traub present their Poster at EDBT 2018
- © Jonas Traub
Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing (Philipp Marian Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl)
René Saitenmacher and Jonas Traub presented their work on Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing. In this publication, the authors parallelize the adaptive windowing (ADWIN) algorithm introduced by Bifet et al. to enable fast concept drift adaptation in high-throughput data streams.
Abstract: Machine learning techniques for data stream analysis suffer from concept drifts such as changed user preferences, varying weather conditions, or economic changes. These concept drifts cause wrong predictions and lead to incorrect business decisions. Concept drift detection methods such as adaptive windowing (Adwin) allow for adapting to concept drifts on the fly. In this paper, we examine Adwin in detail and point out its throughput bottlenecks. We then introduce several parallelization alternatives to address these bottlenecks. Our optimizations lead to a speedup of two orders of magnitude over the original Adwin implementation. Thus, we explore parallel adaptive windowing to provide scalable concept detection for high-velocity data streams with millions of tuples per second.
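To illustrate the adaptive windowing idea mentioned in the abstract, here is a minimal sequential sketch in Python. It is not the authors' implementation: it omits both the bucket compression of the original algorithm and the parallelization that the paper contributes, and it assumes input values in [0, 1]. The class name `SimpleAdwin` and the default `delta` are hypothetical choices made for this example. The cut threshold follows the Hoeffding-style bound commonly given for ADWIN.

```python
import math
from collections import deque


class SimpleAdwin:
    """Naive adaptive window over values in [0, 1] (hypothetical sketch)."""

    def __init__(self, delta=0.002):
        self.delta = delta      # confidence parameter (illustrative default)
        self.window = deque()   # oldest element on the left
        self.total = 0.0        # running sum of all values in the window

    def _cut_found(self):
        """Return True if some split of the window has diverging means."""
        n = len(self.window)
        if n < 2:
            return False
        delta_prime = self.delta / n
        sum0 = 0.0
        # Try every split into an older part W0 and a newer part W1.
        for n0, value in enumerate(self.window, start=1):
            if n0 == n:
                break
            sum0 += value
            n1 = n - n0
            mean0 = sum0 / n0
            mean1 = (self.total - sum0) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False

    def add(self, value):
        """Insert one value; return True if the window had to shrink."""
        self.window.append(value)
        self.total += value
        drift = False
        # Drop the oldest values until no split shows a significant change.
        while self._cut_found():
            self.total -= self.window.popleft()
            drift = True
        return drift


# Usage: a stream whose mean jumps from 0 to 1 halfway through.
detector = SimpleAdwin()
stream = [0.0] * 200 + [1.0] * 200
drift_positions = [i for i, x in enumerate(stream) if detector.add(x)]
window_mean = sum(detector.window) / len(detector.window)
```

Each insertion in this sketch scans every split point of the window, so the cost per element grows with the window size. That linear scan is the kind of throughput bottleneck the abstract refers to.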
Open Source Repository: https://github.com/TU-Berlin-DIMA/parallel-ADWIN 
Paper: https://openproceedings.org/2018/conf/edbt/paper-318.pdf 
Download Poster 
- Tobias Behrens and Jonas Traub present their Poster at EDBT 2018
- © Jonas Traub
Efficient SIMD Vectorization for Hashing in OpenCL (Tobias Behrens, Viktor Rosenfeld, Jonas Traub, Sebastian Breß, Volker Markl)
Tobias Behrens and Jonas Traub presented efficient SIMD vectorization in OpenCL, using hash-based operations such as hash joins as an example. Their publication unites the generality of processor-independent OpenCL code with the performance of processor-specific vectorized SIMD instructions.
Abstract: Hashing is at the core of many efficient database operators such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables results in promising speedups for build and probe operations. However, vectorization typically requires intrinsics – low-level APIs in which functions map to processor-specific SIMD instructions. Intrinsics are specific to a processor architecture and result in complex and difficult to maintain code. OpenCL is a parallel programming framework which provides a higher abstraction level than intrinsics and is portable to different processors. Thus, OpenCL avoids processor dependencies, which results in improved code maintainability. In this paper, we add efficient, vectorized hashing primitives to OpenCL. Our results show that OpenCL-based vectorization is competitive to intrinsics on CPUs but not on Xeon Phi coprocessors.
Open Source Repository: https://github.com/TU-Berlin-DIMA/OpenCL-SIMD-hashing 
Paper: https://openproceedings.org/2018/conf/edbt/paper-330.pdf 
Download Poster