TU Berlin

Database Systems and Information Management Group


Managing Very Large Distributed State for Scalable Stream Processing

Problem

Data stream processing systems cannot fully cope with the massive amounts of complex data generated at high rates in Big Data, Industry 4.0, and IoT applications.

 

Challenge

Guaranteeing fault tolerance, resource elasticity, and dynamic load balancing requires transferring state, which in turn introduces latency proportional to the state's size. Exactly-once stream processing engines (SPEs) require consistent state: results must be accurate regardless of system failures and of rescaling and rebalancing operations on the state. SPEs must continue to process stream tuples while any of these operations are in progress. To the best of our knowledge, no stream processing system fully offers the robust state management needed to efficiently handle very large, distributed state.
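The interplay of checkpointed state and replayable input that yields this consistency can be sketched as follows (a toy Python sketch under simplified assumptions; real SPEs persist snapshots to durable storage and coordinate them with barriers in the stream, rather than keeping them in memory):

```python
import copy

class CheckpointingOperator:
    """Toy keyed-count operator with checkpoint/restore.

    Hypothetical sketch of exactly-once state semantics: a snapshot pairs
    the operator state with an input offset, so recovery rolls back and
    replays without double-counting any tuple.
    """

    def __init__(self):
        self.state = {}        # keyed state: key -> count
        self.offset = 0        # position in the replayable input log
        self._snapshot = None  # last completed checkpoint

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        self.offset += 1

    def checkpoint(self):
        # Snapshot the state *together with* the input offset, so replay
        # after recovery starts exactly where the snapshot left off.
        self._snapshot = (copy.deepcopy(self.state), self.offset)

    def recover(self, log):
        # Roll back to the snapshot, then replay the unprocessed input.
        snap_state, snap_offset = self._snapshot
        self.state = copy.deepcopy(snap_state)
        self.offset = snap_offset
        for key in log[snap_offset:]:
            self.process(key)

log = ["a", "b", "a", "c", "a"]
op = CheckpointingOperator()
for k in log[:2]:
    op.process(k)
op.checkpoint()
op.process(log[2])  # progress after the checkpoint...
op.recover(log)     # ...is rolled back on failure, then replayed
assert op.state == {"a": 3, "b": 1, "c": 1}
```

Note that the checkpoint itself is what must grow with the state: the larger the snapshot, the longer it takes to persist and to restore, which is the latency-proportional-to-size problem described above.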

 

Existing SPEs (e.g., Apache Flink, Apache Spark, Apache Samza, and Timely Dataflow) offer fast stateful processing of data streams with low latency and high throughput, despite fluctuations in the data rate. However, stateful processing would further benefit from on-demand resource elasticity. Today, academic and industrial researchers address resource elasticity for stateful processing, while ensuring fault tolerance, only for partitioned or partially distributed large state. Many streaming applications (e.g., multimedia services, online marketplaces) require stateful processing and generate large state that pushes SPEs to their limits; in these applications, state can swell to many terabytes. Current SPEs fail to use computing resources efficiently at such state sizes.
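Why rescaling large state is costly can be illustrated with key groups, the repartitioning scheme Apache Flink uses: keys hash into a fixed number of groups, and each parallel task owns a contiguous range of them. The Python sketch below is illustrative only; the constant and function names are hypothetical, not Flink's API:

```python
NUM_KEY_GROUPS = 8  # fixed maximum parallelism (illustrative value)

def groups_for_task(task, parallelism):
    """Contiguous range of key groups assigned to one parallel task."""
    start = task * NUM_KEY_GROUPS // parallelism
    end = (task + 1) * NUM_KEY_GROUPS // parallelism
    return range(start, end)

def moved_groups(old_parallelism, new_parallelism):
    """Key groups whose owning task changes on rescale. Each moved
    group is state that must be transferred over the network."""
    old = {g: t for t in range(old_parallelism)
           for g in groups_for_task(t, old_parallelism)}
    new = {g: t for t in range(new_parallelism)
           for g in groups_for_task(t, new_parallelism)}
    return [g for g in range(NUM_KEY_GROUPS) if old[g] != new[g]]

assert list(groups_for_task(0, 2)) == [0, 1, 2, 3]
assert moved_groups(2, 4) == [2, 3, 4, 5, 6, 7]  # 6 of 8 groups move
```

Scaling from two to four tasks relocates six of the eight key groups in this toy setting; with terabyte-scale state, such transfers dominate rescaling latency, which is precisely the cost this project targets.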

 

Objectives

1. To address the scalable data stream processing and analytics challenges arising in Big Data, Cloud Computing, Industry 4.0, and IoT (Internet of Things) applications.

2. To develop a novel state management solution for scalable (i.e., low-latency, high-throughput) stream processing that enables fine-grained fault tolerance, on-demand resource scaling, and load balancing in the presence of very large (e.g., hundreds of GBs) distributed state.

3. To devise a technological framework that seamlessly provides fault tolerance and resource scaling with zero downtime, and offers high resource efficiency, lower operational costs, and reduced time-to-knowledge to end users working on large-scale data applications.


The Rhino project is funded by the Federal Ministry of Education and Research (BMBF) as part of the Software Campus program, and is supported by Huawei Technologies.

[1] Software Campus

Sponsored by the German Federal Ministry of Education and Research (BMBF), Software Campus (SC) is a management development program that prepares tomorrow's senior IT executives.

The SC program combines leading-edge scientific research with hands-on management practice in a novel format. It targets outstanding computer science doctoral students who are interested in assuming executive management roles in industry. Awardees lead their own research projects in cooperation with industry partners over a one- to two-year period.

Kickoff of the 2017 cohort

Project Duration: 03/2019 - 02/2021

Supervisor: Prof. Dr. Volker Markl

Industry Partner


Funded by

