TU Berlin

Talks DIMA Research Seminar

Talks SS14
4.15 pm, EN 719: Mourad Khayati, Universität Zürich
4.15 pm, EN 719: Uwe Jugel, SAP Dresden, "M4: A Visualization-Oriented Time Series Data Aggregation"
4.15 pm, EN 719: Prof. Assaf Schuster, Technion, Haifa, "Monitoring Big, Distributed, Streaming Data"
4.15 pm, EN 719: Philipp Grosse, SAP, "Concepts of Parallelization for Advanced Analytics in the SAP HANA Database"
4.15 pm, EN 719: Peter Scheuermann, Professor, Department of Electrical Engineering and Computer Science, Northwestern University
12 am, EN 719: Canceled: Holger Pirk, CWI, "Waste Not, Want Not - Efficient Co-Processing of Relational Data"
2 pm, EN 719: Minos Garofalakis & Odysseas Papapetrou, Technical University of Crete, "Querying Distributed Data Streams"
12 am: Stefan Schmid

Uwe Jugel, SAP Dresden


M4: A Visualization-Oriented Time Series Data Aggregation


Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualizing high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume of time series data disregard the semantics of visualizations and result in visualization errors.
In this work, we introduce M4, an aggregation-based time series dimensionality reduction technique that provides error-free visualizations at high data reduction rates. Focusing on line charts, the predominant form of time series visualization, we explain in detail the drawbacks of existing data reduction techniques and how our approach outperforms the state of the art by respecting the process of line rasterization.
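The abstract does not spell out the aggregation itself. As described in the published M4 paper, the idea is to keep, for each pixel column of the chart, only the tuples with the first and last timestamp and the minimum and maximum value. The following is a minimal in-memory sketch of that idea; the function name `m4_reduce`, the half-open time range, and the way timestamps are mapped to columns are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def m4_reduce(points: List[Point], t_start: float, t_end: float,
              width: int) -> List[Point]:
    """Keep at most four points per pixel column: first, last, min, max."""
    if width <= 0 or t_end <= t_start:
        raise ValueError("need width > 0 and t_end > t_start")

    # Group the points by the pixel column their timestamp falls into.
    columns: Dict[int, List[Point]] = {}
    for t, v in points:
        if not (t_start <= t < t_end):
            continue  # outside the visible time range
        col = int((t - t_start) / (t_end - t_start) * width)
        col = min(col, width - 1)  # guard against float rounding
        columns.setdefault(col, []).append((t, v))

    reduced: List[Point] = []
    for col in sorted(columns):
        group = columns[col]
        keep = {
            min(group, key=lambda p: p[0]),  # first timestamp
            max(group, key=lambda p: p[0]),  # last timestamp
            min(group, key=lambda p: p[1]),  # minimum value
            max(group, key=lambda p: p[1]),  # maximum value
        }
        reduced.extend(sorted(keep))  # restore time order within the column
    return reduced


# One pixel column: the interior point (1.5, 6) is dropped.
sample = [(0, 5), (1, 9), (1.5, 6), (2, 1), (3, 4)]
reduced_sample = m4_reduce(sample, t_start=0, t_end=4, width=1)
```

The output size is bounded by four points per pixel column, independent of how many raw points fall into each column.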
We describe how to incorporate aggregation-based dimensionality reduction at the query level in a visualization-driven query-rewriting system. Our approach is generic and applicable to any visualization system that uses an RDBMS as its data source. Using real-world data sets from the high-tech manufacturing, stock market, and sports analytics domains, we demonstrate that our visualization-oriented data aggregation can reduce data volumes by up to two orders of magnitude while preserving perfect visualizations.
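To give a feel for what reduction at the query level might look like, here is a hedged sketch using Python's built-in `sqlite3` module. The table name `series`, its columns `t` and `v`, and the helper `m4_rows` are hypothetical, and the query is only in the spirit of the rewriting described above: an inner query computes one aggregate row per pixel column, and a join keeps just the tuples that match one of those extremes.

```python
import sqlite3

# Hypothetical schema: series(t REAL, v REAL).
# :w is the chart width in pixels, [:t1, :t2) the visible time range.
M4_SQL = """
SELECT s.t, s.v
FROM series AS s
JOIN (
    SELECT CAST(:w * 1.0 * (t - :t1) / (:t2 - :t1) AS INTEGER) AS k,
           MIN(t) AS t_min, MAX(t) AS t_max,
           MIN(v) AS v_min, MAX(v) AS v_max
    FROM series
    WHERE t >= :t1 AND t < :t2
    GROUP BY k
) AS g
  ON CAST(:w * 1.0 * (s.t - :t1) / (:t2 - :t1) AS INTEGER) = g.k
 AND (s.t = g.t_min OR s.t = g.t_max OR s.v = g.v_min OR s.v = g.v_max)
WHERE s.t >= :t1 AND s.t < :t2
ORDER BY s.t
"""


def m4_rows(conn: sqlite3.Connection, width: int,
            t_start: float, t_end: float) -> list:
    """Run the reduction inside the database and return the kept rows."""
    params = {"w": width, "t1": t_start, "t2": t_end}
    return conn.execute(M4_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (t REAL, v REAL)")
conn.executemany("INSERT INTO series VALUES (?, ?)",
                 [(0, 5), (1, 9), (1.5, 6), (2, 1), (3, 4)])
rows = m4_rows(conn, width=1, t_start=0, t_end=4)
```

Because the reduction runs inside the database, only the kept tuples cross the wire to the visualization client.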

Prof. Assaf Schuster, Technion, Haifa

"Monitoring Big, Distributed, Streaming Data"


More and more tasks require efficient processing of continuous queries over scalable, distributed data streams. Examples include optimizing systems using their operational log history, mining sentiments using sets of crawlers, and data fusion over heterogeneous sensor networks. However, distributed mining and/or monitoring of global behaviors can be prohibitively difficult. The naïve solution, which sends all data to a central location, mandates extremely high communication volume and thus incurs unbearable overheads in terms of resources and energy. Furthermore, such solutions require an expensive, powerful central platform, and data transmission may violate privacy rules. An attempt to enhance the naïve solution by periodically polling aggregates is bound to fail, exposing a vicious tradeoff between communication and latency.
Given a continuous global query, the solution proposed in the talk is to generate filters, called safe zones, to be applied locally at each data stream. Essentially, the safe zones represent geometric constraints which, until violated by at least one of the sources, guarantee that a global property holds. In other words, the safe zones allow for constructive quiescence: there is no need for any of the data sources to transmit anything as long as all constraints hold, that is, as long as the local data remains confined to the local safe zone. The typically rare violations are handled immediately, so the latency for discovering global conditions is negligible.
The safe zones approach makes the overall system implementation, as well as its operation, much simpler and cheaper. The savings in communication volume can reach many orders of magnitude. The talk will describe a general approach for compiling efficient safe zones for many tasks and system configurations.
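The safe-zone idea can be illustrated with a deliberately small, one-dimensional toy. It is not the construction from the talk. The global property assumed here is "the average of all local values stays at or below a threshold"; each node gets a local upper bound, and the bounds always average to the threshold, so silence from every node implies the global property holds. The names `rebalance` and `monitor`, and the message accounting (one message per node on each synchronization), are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def rebalance(values: Sequence[float], threshold: float) -> List[float]:
    """Assign new local bounds whose average equals the threshold."""
    slack = threshold - sum(values) / len(values)
    return [v + slack for v in values]


def monitor(rounds: Sequence[Sequence[float]],
            threshold: float) -> Tuple[List[int], int]:
    """Return (rounds in which the global average exceeded the threshold,
    number of messages sent)."""
    n = len(rounds[0])
    bounds = [threshold] * n  # initial safe zones
    messages = 0
    alarms: List[int] = []
    for r, values in enumerate(rounds):
        if all(v <= b for v, b in zip(values, bounds)):
            continue  # every node is inside its safe zone: nobody transmits
        # At least one local violation: synchronize all nodes.
        messages += n
        if sum(values) / n > threshold:
            alarms.append(r)          # the global property is violated
            bounds = [threshold] * n  # fall back to the initial zones
        else:
            bounds = rebalance(values, threshold)  # false alarm: shift slack
    return alarms, messages


stream = [[1, 1, 1], [2, 1, 1], [5, 1, 1], [5, 1, 1], [9, 9, 9], [1, 1, 1]]
alarms, messages = monitor(stream, threshold=4)
```

In this run only two of the six rounds trigger any communication, whereas shipping every value to a coordinator would cost one message per node per round.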


Short bio:

Professor Assaf Schuster has been a faculty member of the Technion Computer Science Department since 1991 (http://assaf.net.technion.ac.il/).

He has been interested in various aspects of parallel and distributed computing, publishing more than 200 papers.

His algorithms on data-race detection were implemented in Intel’s Thread Checker, and his patents on distributed shared memory were sold by the Technion.

His papers triggered a rewrite of the Java Memory Model.

He has built scalable production systems to handle petabytes of storage with off-the-shelf hardware.

In recent years, his research group has focused on big data and scalable, real-time knowledge discovery in distributed data streams.

Since 2010, Prof. Schuster has also been working to establish TCE, the Technion Center for Computer Engineering, which he heads (http://tce.technion.ac.il).

TCE has grown to become a center of activity for about 60 faculty from the Technion and other universities in Israel and abroad, dozens of industry leaders, and hundreds of graduate students.

Philipp Grosse, SAP


Concepts of Parallelization for Advanced Analytics in the SAP HANA Database


Complex database applications require complex custom logic to be executed in the database kernel. Traditional relational databases lack an easy-to-use programming model for implementing and tuning such user-defined code, which motivates developers to use MapReduce instead of traditional database systems. The MapReduce framework offers a simple model for parallelizing custom code, but it does not integrate well with relational databases.
Likewise, the literature on optimizing queries in relational databases has largely ignored user-defined functions (UDFs).
In this presentation, I will discuss the requirements of Advanced Analytics and introduce annotations for user-defined functions that facilitate optimizations considering both relational operators and UDFs.
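The abstract does not say what the annotations look like. Purely as an illustration of the general idea, the sketch below uses a hypothetical `udf` decorator that declares a partitioning key; a toy executor then uses that declaration to run the function on each partition in parallel. None of these names come from SAP HANA, and the thread pool merely stands in for a parallel database engine.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def udf(partition_by: Optional[str] = None) -> Callable:
    """Hypothetical annotation: the function may run independently per key."""
    def wrap(fn: Callable) -> Callable:
        fn.partition_by = partition_by
        return fn
    return wrap


@udf(partition_by="sensor")
def total(rows: List[Row]) -> List[Row]:
    """Custom logic: sum the values of one sensor."""
    return [{"sensor": rows[0]["sensor"],
             "total": sum(r["value"] for r in rows)}]


def execute(fn: Callable, rows: List[Row]) -> List[Row]:
    """Toy executor that exploits the annotation when it is present."""
    key = getattr(fn, "partition_by", None)
    if key is None:
        return fn(rows)  # no annotation: run once over all rows
    parts: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    with ThreadPoolExecutor() as pool:
        # map() preserves the order of the (sorted) partitions
        results = pool.map(fn, [parts[k] for k in sorted(parts)])
        return [out for part in results for out in part]


data = [{"sensor": "a", "value": 1},
        {"sensor": "b", "value": 2},
        {"sensor": "a", "value": 3}]
totals = execute(total, data)
```

Without the annotation the executor has to treat the function as a black box, which is the situation the abstract describes for UDFs in traditional query optimizers.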

Holger Pirk, CWI Amsterdam


"Waste Not, Want Not - Efficient Co-Processing of Relational Data"


The variety of memory devices in modern computer systems holds
opportunities as well as challenges for data management systems. In
particular, the exploitation of Graphics Processing Units (GPUs) and
their fast memory has been studied quite intensively. However, current
approaches treat GPUs as systems in their own right and fail to
provide a generic strategy for efficient CPU/GPU cooperation. We
propose such a strategy for relational query processing: calculating
an approximate result based on lossily compressed, GPU-resident data
and refining the result using residuals, i.e., the lost data, on the CPU.

To assess the potential of the approach, we developed a prototypical
implementation for spatial range selections. We found multiple orders
of magnitude performance improvement over a CPU-only implementation
even if the data size exceeds the available GPU memory. Encouraged by
these results, we developed the required algorithms and techniques to
implement the strategy in an existing in-memory DBMS and found up to
7 times performance improvement for selected TPC-H queries.
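
The approximate-and-refine strategy can be sketched for a one-dimensional range selection. This is a simplified CPU-only illustration, not the prototype from the talk: the lossy compression is assumed to be "drop the low bits" of non-negative integers, the list `approx` stands in for the GPU-resident data, and `residual` holds the dropped bits. The names `decompose` and `range_select` are hypothetical.

```python
from typing import List, Tuple


def decompose(values: List[int], bits: int) -> Tuple[List[int], List[int]]:
    """Split non-negative integers into a lossy approximation (high bits)
    and a residual (the low bits that the approximation lost)."""
    mask = (1 << bits) - 1
    return [v >> bits for v in values], [v & mask for v in values]


def range_select(approx: List[int], residual: List[int], bits: int,
                 lo: int, hi: int) -> List[int]:
    """Return the indices of all values v with lo <= v <= hi."""
    lo_a, hi_a = lo >> bits, hi >> bits
    hits: List[int] = []
    for i, a in enumerate(approx):
        if lo_a < a < hi_a:
            hits.append(i)  # approximation alone proves membership
        elif a == lo_a or a == hi_a:
            # Boundary bucket: refine with the residual to get the exact value.
            v = (a << bits) | residual[i]
            if lo <= v <= hi:
                hits.append(i)
        # otherwise the approximation alone proves non-membership
    return hits


values = [3, 16, 17, 40, 64, 95, 100, 101, 130]
approx, residual = decompose(values, bits=4)
selected = range_select(approx, residual, bits=4, lo=17, hi=100)
```

Only values whose approximation falls into a boundary bucket need their residual, so most of the work can be done on the compressed data.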

Speaker's Bio:

Holger is a PhD Candidate in the Database Architectures group at CWI
in Amsterdam with expected graduation in 2014. He received his
master's degree (Diplom) in computer science at Humboldt-Universität
zu Berlin in 2010. His research interests lie in analytical query
processing on memory-resident data. In particular, he studies storage
schemes and processing models for modern hardware.

Everybody is cordially welcome!

Minos Garofalakis & Odysseas Papapetrou, Technical University of Crete

Title: Querying Distributed Data Streams

http://www.softnet.tuc.gr/~{minos, papapetrou}


Effective Big Data analytics poses several difficult challenges for modern data management architectures.
One such key challenge arises from the inherently streaming nature of big data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging distributed architectures, such as micro-cloud federations, where the resources of several dispersed corporate cloud platforms are pooled together to enable the analysis of massive data sets.
In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure. In this talk, we introduce the distributed data streaming model, and discuss our recent work on tracking complex queries over massive distributed streams, as well as new research directions in this space.
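The abstract does not name a specific algorithm. As one well-known example of a small-memory stream summary that also saves communication, the sketch below uses a Count-Min sketch, a technique chosen here for illustration and not taken from the talk. Each site summarizes its local stream in a fixed-size table, and a coordinator merges the tables instead of receiving the raw data. The class name, table sizes, and hashing scheme are assumptions.

```python
import hashlib
from typing import Iterator, Tuple


class CountMin:
    """Fixed-size frequency summary; estimates never undercount."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item: str) -> Iterator[Tuple[int, int]]:
        # One deterministic hash per row, derived from SHA-256.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        return min(self.table[row][col] for row, col in self._cells(item))

    def merge(self, other: "CountMin") -> None:
        """Add another site's summary; both must have the same dimensions."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two sites summarize their local streams; only the tables are shipped.
site_a, site_b = CountMin(), CountMin()
for item in ["x", "x", "x", "y"]:
    site_a.add(item)
for item in ["x", "x"]:
    site_b.add(item)
site_a.merge(site_b)
estimate_x = site_a.estimate("x")  # at least the true count of 5
```

Because the summary is linear, merging the per-site tables gives exactly the table a single site would have built from the union of the streams.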

Speakers' Bios:

Minos Garofalakis received the Diploma degree in Computer Engineering and Informatics from the University of Patras, Greece in 1992, and the M.Sc. and Ph.D. degrees in Computer Science from the University of Wisconsin-Madison in 1994 and 1998, respectively. He worked as a Member of Technical Staff at Bell Labs, Lucent Technologies in Murray Hill, NJ (1998-2005), as a Senior Researcher at Intel Research Berkeley in Berkeley, CA (2005-2007), and as a Principal Research Scientist at Yahoo! Research in Santa Clara, CA (2007-2008). In parallel, he also held an Adjunct Associate Professor position at the EECS Department of the University of California, Berkeley (2006-2008). As of October 2008, he is a Professor of Computer Science at the School of Electronic and Computer Engineering of the Technical University of Crete, and the Director of the Software Technology and Network Applications Laboratory (SoftNet).

Prof. Garofalakis’ research focuses on Big Data analytics, spanning areas such as database systems, data streams, data synopses and approximate query processing, probabilistic databases, and data mining. His work has resulted in over 120 published scientific papers in these areas, and 35 US Patent filings (27 patents issued) for companies such as Lucent, Yahoo!, and AT&T. Google Scholar gives over 8900 citations to his work, and an h-index value of 50. Prof. Garofalakis is an ACM Distinguished Scientist (2011), and a recipient of the IEEE ICDE Best Paper Award (2009), the Bell Labs President’s Gold Award (2004), and the Bell Labs Teamwork Award (2003). For more information, see http://www.softnet.tuc.gr/~minos/

Odysseas Papapetrou is currently a researcher at the Department of Electronic and Computer Engineering, Technical University of Crete. His research interests involve several aspects of Big Data analytics, with a particular focus on distributed stream processing. He received his PhD in Computer Science from the University of Hannover, while working at the L3S Research Center. He also holds an M.Sc. degree from Saarland University, Germany, and a B.Sc. and M.Sc. from the University of Cyprus. For more information, see http://www.softnet.tuc.gr/~papapetrou/
