Talks DIMA Research Seminar
Talk/Location | Lecturer/Subject |
---|---|
14.07.2014 4.15 pm DIMA EN 719 | Mourad Khayati, Universität Zürich "t.b.a." |
30.06.2014 4.15 pm DIMA EN 719 | Uwe Jugel, SAP Dresden "M4: A Visualization-Oriented Time Series Data Aggregation" |
12.05.2014 4.15 pm DIMA EN 719 | Prof. Assaf Schuster, Technion, Haifa "Monitoring Big, Distributed, Streaming Data" |
05.05.2014 4.15 pm DIMA EN 719 | Philipp Grosse, SAP "Concepts of Parallelization for Advanced Analytics in the SAP HANA Database" |
28.04.2014 4.15 pm DIMA EN 719 | Peter Scheuermann, Professor, Department of Electrical Engineering and Computer Science, Northwestern University |
24.04.2014 12 am DIMA EN 719 | Canceled Holger Pirk, CWI "Waste Not, Want Not - Efficient Co-Processing of Relational Data" |
08.04.2014 2 pm DIMA EN 719 | Minos Garofalakis & Odysseas Papapetrou, Technical University of Crete "Querying Distributed Data Streams" |
05.04.2014 12 am DIMA EN 719 | Stefan Schmid |
Uwe Jugel, SAP Dresden
Title:
M4: A Visualization-Oriented Time Series Data Aggregation
Abstract:
Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualizing high-volume time series data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume of time series data disregard the semantics of visualizations and result in visualization errors.
In this work, we introduce M4, an aggregation-based time series dimensionality reduction technique that provides error-free visualizations at high data reduction rates. Focusing on line charts, as the predominant form of time series visualization, we explain in detail the drawbacks of existing data reduction techniques and how our approach outperforms the state of the art by respecting the process of line rasterization.
We describe how to incorporate aggregation-based dimensionality reduction at the query level in a visualization-driven query-rewriting system. Our approach is generic and applicable to any visualization system that uses an RDBMS as its data source. Using real-world data sets from the high-tech manufacturing, stock market, and sports analytics domains, we demonstrate that our visualization-oriented data aggregation can reduce data volumes by up to two orders of magnitude while preserving perfect visualizations.
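The abstract does not spell out the aggregation itself. As a rough illustration, the sketch below assumes the scheme usually associated with M4: the time range is split into one group per pixel column of the chart, and each group keeps only the points with the first timestamp, the last timestamp, the minimum value, and the maximum value. The talk describes doing this at the query level inside an RDBMS; the Python function `m4_aggregate` and its parameters are hypothetical names used here only to show the idea.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def m4_aggregate(points: List[Point], t_start: float, t_end: float,
                 width: int) -> List[Point]:
    """Reduce a time series to at most 4 points per pixel column.

    Assumption (not stated in the abstract): each of the `width` pixel
    columns keeps its first, last, min-value, and max-value point.
    """
    if width <= 0 or t_end <= t_start:
        raise ValueError("need width > 0 and t_end > t_start")

    groups: Dict[int, Dict[str, Point]] = {}
    for t, v in points:
        if not (t_start <= t < t_end):
            continue  # outside the visible time range
        # Map the timestamp to the pixel column it would be drawn in.
        col = int(width * (t - t_start) / (t_end - t_start))
        g = groups.get(col)
        if g is None:
            groups[col] = {"first": (t, v), "last": (t, v),
                           "min": (t, v), "max": (t, v)}
            continue
        if t < g["first"][0]:
            g["first"] = (t, v)
        if t >= g["last"][0]:
            g["last"] = (t, v)
        if v < g["min"][1]:
            g["min"] = (t, v)
        if v > g["max"][1]:
            g["max"] = (t, v)

    reduced: List[Point] = []
    for col in sorted(groups):
        # A point may play several roles (e.g. first and min); keep it once.
        reduced.extend(sorted(set(groups[col].values())))
    return reduced


series = [(0, 1), (1, 5), (2, -2), (3, 3), (4, 0), (5, 7), (6, 7), (7, 2)]
print(m4_aggregate(series, 0, 8, 2))
```

For a chart that is `width` pixels wide, the output holds at most `4 * width` points regardless of how many points the input contains, which is where the data reduction comes from.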
Prof. Assaf Schuster, Technion, Haifa
“Monitoring Big, Distributed, Streaming Data”
Abstract:
More and more tasks require efficient processing of continuous queries over scalable, distributed data streams. Examples include optimizing systems using their operational log history, mining sentiments using sets of crawlers, and data fusion over heterogeneous sensor networks. However, distributed mining and/or monitoring of global behaviors can be prohibitively difficult. The naïve solution, which sends all data to a central location, mandates extremely high communication volume, thus incurring unbearable overheads in terms of resources and energy. Furthermore, such solutions require an expensive, powerful central platform, while data transmission may violate privacy rules. An attempt to enhance the naïve solution by periodically polling aggregates is bound to fail, exposing a vicious tradeoff between communication and latency.
Given a continuous global query, the solution proposed in the talk is to generate filters, called safe zones, to be applied locally at each data stream. Essentially, the safe zones represent geometric constraints which, until violated by at least one of the sources, guarantee that a global property holds. In other words, the safe zones allow for constructive quiescence: there is no need for any of the data sources to transmit anything as long as all constraints hold, with the local data confined to the local safe zone. The typically rare violations are handled immediately, so the latency for discovering global conditions is negligible. The safe zones approach makes the overall system implementation, as well as its operation, much simpler and cheaper. The savings in communication volume can reach many orders of magnitude. The talk will describe a general approach for compiling efficient safe zones for many tasks and system configurations.
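The safe-zone idea can be illustrated with a deliberately simple, one-dimensional stand-in. The sketch below assumes the global query is "is the average of all local values above a threshold?" and uses the local constraint "my value is at most the threshold" as each node's safe zone: if every node satisfies it, the average cannot exceed the threshold, so nobody needs to transmit. This is only an assumption-laden toy; the geometric safe zones of the talk are more general, and the function `monitor_average` and its message accounting are hypothetical.

```python
from typing import List, Tuple


def monitor_average(streams: List[List[float]],
                    threshold: float) -> Tuple[List[int], int]:
    """Simulate safe-zone monitoring of a global average.

    streams[i][t] is the local value at node i at time step t.
    Returns (time steps at which the global average exceeded the
    threshold, number of values sent to the coordinator).
    """
    alerts: List[int] = []
    messages = 0
    steps = len(streams[0])
    for t in range(steps):
        local = [s[t] for s in streams]
        # Local check: each node tests only its own value.
        violators = [i for i, x in enumerate(local) if x > threshold]
        if not violators:
            # All nodes are inside their safe zones, so the average is
            # guaranteed to be <= threshold: no communication needed.
            continue
        # A violation occurred: the coordinator collects all current
        # values (one message per node) and evaluates the global query.
        messages += len(local)
        if sum(local) / len(local) > threshold:
            alerts.append(t)
    return alerts, messages


streams = [[1, 2, 9, 2], [1, 1, 1, 1], [2, 2, 8, 2]]
print(monitor_average(streams, 5.0))
```

In this toy run, the naïve approach would ship all 12 values to the coordinator, whereas the safe-zone variant communicates only at the single time step where a local constraint is violated.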
Short bio:
Professor Assaf Schuster has been a faculty member of the Technion Computer Science Department since 1991 (http://assaf.net.technion.ac.il/).
He has been interested in various aspects of parallel and distributed computing, publishing more than 2000 papers.
His algorithms on data-race detection were implemented in Intel’s Thread Checker, and his patents on distributed shared memory were sold by the Technion.
His papers triggered a rewrite of the Java Memory Model.
He has built scalable production systems to handle petabytes of storage with off-the-shelf hardware.
In recent years, his research group has focused on big data and scalable, real-time knowledge discovery in distributed data streams.
Since 2010, Prof. Schuster has also been working to establish TCE, the Technion Center for Computer Engineering, which he heads (http://tce.technion.ac.il).
TCE has grown to become a center of activity for about 60 faculty from the Technion and other universities in Israel and abroad, dozens of industry leaders, and hundreds of graduate students.
Philipp Grosse, SAP
Title:
Concepts of Parallelization for Advanced Analytics in the SAP HANA
Database
Abstract:
Complex database applications require complex custom logic to be executed in the database kernel. Traditional relational databases lack an easy-to-use programming model to implement and tune such user-defined code, which motivates developers to use MapReduce instead of traditional database systems. The MapReduce framework offers a simple model to parallelize custom code, but it does not integrate well with relational databases. Likewise, the literature on optimizing queries in relational databases has largely ignored user-defined functions (UDFs).
In this presentation, I will discuss the requirements of Advanced Analytics and introduce annotations for user-defined functions that facilitate optimizations that consider both relational operators and UDFs.
Holger Pirk, CWI Amsterdam
Title:
"Waste Not, Want Not - Efficient Co-Processing of Relational Data"
Abstract:
The variety of memory devices in modern computer systems holds opportunities as well as challenges for data management systems. In particular, the exploitation of Graphics Processing Units (GPUs) and their fast memory has been studied quite intensively. However, current approaches treat GPUs as systems in their own right and fail to provide a generic strategy for efficient CPU/GPU cooperation. We propose such a strategy for relational query processing: calculating an approximate result based on lossily compressed, GPU-resident data and refining the result using residuals, i.e., the lost data, on the CPU.
To assess the potential of the approach, we developed a prototypical implementation for spatial range selections. We found multiple orders of magnitude performance improvement over a CPU-only implementation, even if the data size exceeds the available GPU memory. Encouraged by these results, we developed the required algorithms and techniques to implement the strategy in an existing in-memory DBMS and found up to 7 times performance improvement for selected TPC-H queries.
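The approximate-then-refine strategy can be sketched on a one-dimensional range selection. The example below assumes non-negative integers and a lossy compression that simply drops the low-order bits: the high-order bits play the role of the compressed, GPU-resident data, and the dropped bits are the residuals kept on the CPU. Everything runs on the CPU here; `BITS`, `decompose`, and `range_select` are hypothetical names, and the actual prototype in the talk targets spatial range selections.

```python
from typing import List, Tuple

BITS = 4  # assumed number of low-order bits dropped by the lossy compression


def decompose(values: List[int]) -> Tuple[List[int], List[int]]:
    """Split non-negative integers into approximation and residual."""
    approx = [v >> BITS for v in values]                 # "GPU-resident" part
    residual = [v & ((1 << BITS) - 1) for v in values]   # "CPU-resident" part
    return approx, residual


def range_select(approx: List[int], residual: List[int],
                 lo: int, hi: int) -> List[int]:
    """Return indices i whose original value lies in [lo, hi]."""
    lo_b, hi_b = lo >> BITS, hi >> BITS

    # Step 1 (approximate): use only the compressed data to find
    # candidates. This may contain false positives, never false negatives.
    candidates = [i for i, a in enumerate(approx) if lo_b <= a <= hi_b]

    # Step 2 (refine): only candidates in the two boundary buckets need
    # their residuals to decide exactly.
    result = []
    for i in candidates:
        a = approx[i]
        if lo_b < a < hi_b:
            result.append(i)  # bucket lies entirely inside the range
        else:
            exact = (a << BITS) | residual[i]  # reconstruct original value
            if lo <= exact <= hi:
                result.append(i)
    return result


values = [3, 17, 40, 64, 100, 130, 255, 18]
approx, residual = decompose(values)
print(range_select(approx, residual, 18, 100))
```

The refinement step touches the residuals only for values that fall into the buckets containing `lo` or `hi`, which is why most of the work can stay on the compressed data.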
Speaker's Bio:
Holger is a PhD candidate in the Database Architectures group at CWI in Amsterdam with expected graduation in 2014. He received his master's degree (Diplom) in computer science at Humboldt-Universität zu Berlin in 2010. His research interests lie in analytical query processing on memory-resident data. In particular, he studies storage schemes and processing models for modern hardware.
Everybody is cordially welcome!
Minos Garofalakis & Odysseas Papapetrou, Technical University of Crete
Title: Querying Distributed Data Streams
http://www.softnet.tuc.gr/~{minos, papapetrou}
Abstract:
Effective Big Data analytics pose several difficult
challenges for modern data management architectures. One such key
challenge arises from the inherently streaming nature of big data,
which mandates efficient algorithms for querying and analyzing
massive, continuous data streams (that is, data that is seen only once
and in a fixed order) with limited memory and CPU-time resources. Such
streams arise naturally in emerging distributed architectures, such as
micro-cloud federations, where the resources of several, dispersed
corporate cloud platforms are pulled together to enable the analysis
of massive data sets.
In addition to memory- and time-efficiency
concerns, the inherently distributed nature of such applications also
raises important communication-efficiency issues, making it critical
to carefully optimize the use of the underlying network
infrastructure. In this talk, we introduce the distributed data
streaming model, and discuss our recent work on tracking complex
queries over massive distributed streams, as well as new research
directions in this space.
Speakers' Bios:
Minos Garofalakis received the Diploma degree in Computer Engineering and Informatics from the University of Patras, Greece in 1992, and the M.Sc. and Ph.D. degrees in Computer Science from the University of Wisconsin-Madison in 1994 and 1998, respectively. He worked as a Member of Technical Staff at Bell Labs, Lucent Technologies in Murray Hill, NJ (1998-2005), as a Senior Researcher at Intel Research Berkeley in Berkeley, CA (2005-2007), and as a Principal Research Scientist at Yahoo! Research in Santa Clara, CA (2007-2008). In parallel, he also held an Adjunct Associate Professor position at the EECS Department of the University of California, Berkeley (2006-2008). As of October 2008, he is a Professor of Computer Science at the School of Electronic and Computer Engineering of the Technical University of Crete, and the Director of the Software Technology and Network Applications Laboratory (SoftNet).
Prof. Garofalakis’ research focuses on Big Data analytics, spanning areas such as database systems, data streams, data synopses and approximate query processing, probabilistic databases, and data mining. His work has resulted in over 120 published scientific papers in these areas, and 35 US Patent filings (27 patents issued) for companies such as Lucent, Yahoo!, and AT&T. Google Scholar gives over 8900 citations to his work, and an h-index value of 50. Prof. Garofalakis is an ACM Distinguished Scientist (2011), and a recipient of the IEEE ICDE Best Paper Award (2009), the Bell Labs President’s Gold Award (2004), and the Bell Labs Teamwork Award (2003). For more information, see http://www.softnet.tuc.gr/~minos/
Odysseas Papapetrou is currently a researcher at the Department of Electronic and Computer Engineering, Technical University of Crete. His research interests involve several aspects of Big Data analytics, with a particular focus on distributed stream processing. He received his PhD in Computer Science from the University of Hannover, while working at the L3S Research Center. He also holds an M.Sc. degree from Saarland University, Germany, and a B.Sc. and M.Sc. from the University of Cyprus. For more information, see http://www.softnet.tuc.gr/~minos/