In the Land of Data Streams where Synopses are Missing, One Framework to Bring Them All
Citation key PoepselKHQM21
Author Rudi Poepsel-Lemaitre, Martin Kiefer, Joscha von Hein, Jorge-Arnulfo Quiané-Ruiz, Volker Markl
Pages 1818-1831
Year 2021
Journal Proc. VLDB Endow.
Volume 14
Number 10
Abstract In pursuit of real-time data analysis, approximate summarization structures, i.e., synopses, have gained importance over the years. However, existing stream processing systems, such as Flink, Spark, and Storm, do not support synopses as first-class citizens, i.e., as pipeline operators. Implementing synopses is left to users. This is mainly because of the diversity of synopses, which makes a unified implementation difficult. We present Condor, a framework that supports synopses as first-class citizens. Condor facilitates the specification and processing of synopsis-based streaming jobs while hiding all internal processing details. Condor's key component is its model, which represents synopses as a particular case of windowed aggregate functions. An inherent divide-and-conquer strategy allows Condor to distribute the computation efficiently, enabling high performance and linear scalability. Our evaluation shows that Condor outperforms existing approaches by up to a factor of 75x and that it scales linearly with the number of cores.
