Abstract
Aggregation queries on data streams are evaluated over evolving and often overlapping logical views called windows. While the aggregation of periodic windows has been studied extensively in the past through aggregate sharing techniques such as Panes and Pairs, little to no work has been done on optimizing the aggregation of very common, non-periodic windows. Typical examples of non-periodic windows are punctuation-based windows and sessions, which can implement complex business logic and are often expressed as user-defined operators on platforms such as Google Dataflow or Apache Storm. The aggregation of such non-periodic or user-defined windows either falls back to expensive, best-effort aggregate sharing methods or is not optimized at all.
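To illustrate what aggregate sharing means for periodic windows, here is a minimal sketch of the pane idea. It assumes count-based sliding windows (a window covers `range_` elements and a new one starts every `slide` elements) and sum as the aggregate; the function name `panes_sliding_sum` is hypothetical. The stream is cut into non-overlapping panes of size gcd(range, slide), each pane is aggregated once, and every window combines pane partials instead of re-reading raw elements. This is a simplified illustration, not the original Panes implementation.

```python
from math import gcd
from typing import List


def panes_sliding_sum(stream: List[int], range_: int, slide: int) -> List[int]:
    """Sliding-window sums via pane-based aggregate sharing (illustrative sketch).

    Assumes count-based periodic windows: each window spans `range_`
    consecutive elements, and a new window starts every `slide` elements.
    """
    pane = gcd(range_, slide)  # pane size divides both range and slide
    # Step 1: pre-aggregate each complete, non-overlapping pane exactly once.
    pane_sums = [sum(stream[i:i + pane])
                 for i in range(0, len(stream) - pane + 1, pane)]
    panes_per_window = range_ // pane
    panes_per_slide = slide // pane
    # Step 2: each complete window combines the partial sums of its panes.
    results = []
    for start in range(0, len(pane_sums) - panes_per_window + 1, panes_per_slide):
        results.append(sum(pane_sums[start:start + panes_per_window]))
    return results


if __name__ == "__main__":
    print(panes_sliding_sum(list(range(1, 13)), range_=6, slide=4))  # [21, 45]
```

Because overlapping windows reuse the same pane partials, each input element is aggregated once rather than once per window it belongs to. This only works when windows are periodic, since pane boundaries are derived from the fixed range and slide.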
In this paper we present a technique to perform efficient
aggregate sharing for data stream windows that are declared
as user-defined functions (UDFs) and can contain arbitrary
business logic. To this end, we first introduce the concept
of User-Defined Windows (UDWs), a simple, UDF-based
programming abstraction that allows users to
programmatically define custom windows. We then define
semantics for UDWs, based on which we design Cutty, a
low-cost aggregate sharing technique. Cutty improves on and
outperforms the state of the art for aggregate sharing on
single and multiple queries. Moreover, it enables aggregate
sharing for a broad class of non-periodic UDWs. We
implemented our techniques on Apache Flink, an open-source
stream processing system, and performed experiments
demonstrating orders-of-magnitude reductions in aggregation
costs compared to the state of the art.
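To make the idea of a window defined by user code concrete, here is a minimal Python sketch of a session window expressed as a UDF. The window logic lives in a user-supplied object (the hypothetical `SessionWindowUDW`, whose `starts_new_window` method decides per record whether a new window begins), while a generic driver (the equally hypothetical `evaluate_udw`) computes one sum per window. This is not the UDW interface defined in the paper or Flink's windowing API. It assumes non-overlapping windows and records arriving in timestamp order, and it performs no aggregate sharing.

```python
from typing import List, Optional, Tuple


class SessionWindowUDW:
    """Hypothetical user-defined window: a session ends after `gap` time units of inactivity."""

    def __init__(self, gap: int):
        self.gap = gap
        self.last_ts: Optional[int] = None

    def starts_new_window(self, ts: int) -> bool:
        # Arbitrary user logic: a new session starts on the first record
        # or when the time since the previous record exceeds the gap.
        new = self.last_ts is None or ts - self.last_ts > self.gap
        self.last_ts = ts
        return new


def evaluate_udw(records: List[Tuple[int, int]], udw) -> List[int]:
    """Drive any object exposing starts_new_window(ts); return the sum of each window."""
    results: List[int] = []
    current: Optional[int] = None
    for ts, value in records:
        if udw.starts_new_window(ts):
            if current is not None:
                results.append(current)  # emit the window that just closed
            current = 0
        current += value
    if current is not None:
        results.append(current)  # flush the last open window
    return results


if __name__ == "__main__":
    events = [(1, 5), (2, 3), (10, 4), (11, 1), (30, 2)]  # (timestamp, value)
    print(evaluate_udw(events, SessionWindowUDW(gap=5)))  # [8, 5, 2]
```

Since window boundaries here depend on the data rather than on a fixed range and slide, pane-style pre-aggregation cannot be planned in advance, which is the gap the abstract describes for non-periodic windows.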