Talks Research Colloquium
Date/Location | Lecturer/Subject |
---|---|
Thu. 07.04.2011 4 pm DIMA EN719 | Mirek Riedewald, Associate Professor, Northeastern University "Scalable Search and Ranking for Scientific Data" |
Mon. 09.05.2011 4 pm DIMA EN719 | Zoltán Miklós (EPFL) "Divide and conquer techniques for data management problems" |
Wed. 25.05.2011 4 pm DIMA EN732 | Timo Proescholdt "On Finding Complementary Clusterings and the WMO Information System" |
Mon. 06.06.2011 4 pm DIMA EN719 | Katrin Eisenreich, SAP "Creation and Change Impact Analysis of What-if Scenarios under Uncertainty and Correlation" |
Wed. 08.06.2011 16.00 c.t. DIMA EN719 | Steve Loughran, HP |
Mon. 27.06.2011 4 pm DIMA EN719 | Ivo Santos, Microsoft Research (EMIC - European Microsoft Innovation Center) "Analytics, Complex Events and Data Streams: Scenarios, Platforms and Trends" |
Mirek Riedewald, Northeastern University
Title: "Scalable Search and Ranking for Scientific Data"
Abstract:
As the amount and complexity of data in many
scientific disciplines increases rapidly, new tools are needed for
supporting exploratory analysis and scientific discovery. Our Scolopax
system's goal is to address these challenges with novel techniques for
large-scale parallel data management. In this talk, we will present an
overview of Scolopax and then focus on parallel processing of joins.
Our proposed model simplifies reasoning about how to assign join
computation tasks to processors in MapReduce and other parallel
environments. Using this model, we derive a surprisingly simple
randomized algorithm, called 1-Bucket-Theta, for implementing
arbitrary joins in a single MapReduce job. This algorithm only
requires minimal statistics (input cardinality) and we provide proofs
and strong evidence that for a variety of join problems, its latency
is either close to optimal or the best realizable option. For some
popular joins we show how to improve over 1-Bucket-Theta by exploiting
additional input statistics. Most of these results will appear at
SIGMOD 2011; other aspects of Scolopax were published at premier data
management and data mining venues like VLDB, ICDE, ICML, and
ICDM.
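As a rough illustration of the idea behind a randomized, single-job join with an arbitrary join predicate, the following Python sketch simulates a map and a reduce phase in memory. It is a simplified sketch, not the algorithm as presented in the talk: the function name `one_bucket_theta_join`, the fixed `rows` x `cols` grid of regions, and the `seed` parameter are assumptions made for this example, and the way a grid shape would be derived from the input cardinalities is not reproduced here.

```python
import random
from collections import defaultdict


def one_bucket_theta_join(S, T, theta, rows, cols, seed=None):
    """Simulate a randomized theta-join over a rows x cols grid of regions.

    Each region stands for one reducer (hypothetical setup for this sketch).
    """
    rng = random.Random(seed)
    # region id (row, col) -> (tuples from S, tuples from T)
    regions = defaultdict(lambda: ([], []))

    # "Map" phase: every S tuple picks one random row and is sent to all
    # regions in that row; every T tuple picks one random column and is
    # sent to all regions in that column.
    for s in S:
        r = rng.randrange(rows)
        for c in range(cols):
            regions[(r, c)][0].append(s)
    for t in T:
        c = rng.randrange(cols)
        for r in range(rows):
            regions[(r, c)][1].append(t)

    # "Reduce" phase: each region joins its local inputs with the predicate.
    # A pair (s, t) meets only in region (row of s, column of t), so every
    # matching pair is produced exactly once.
    result = []
    for s_part, t_part in regions.values():
        for s in s_part:
            for t in t_part:
                if theta(s, t):
                    result.append((s, t))
    return result


S = [1, 5, 9, 12]
T = [3, 5, 8, 20]
pairs = one_bucket_theta_join(S, T, lambda s, t: s < t, rows=2, cols=3, seed=0)
print(sorted(pairs))
```

The random assignment only affects how the work is spread over regions; the join result itself is the same for every seed, because each pair of input tuples is compared in exactly one region.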
Bio:
Mirek Riedewald received a Ph.D. in computer science from
the University of California at Santa Barbara in 2002. After spending
some time as a researcher at Cornell University and as a visiting
researcher at Microsoft Research, he is now an Associate Professor at
Northeastern University. Dr. Riedewald's research interests are in
databases and data mining, with an emphasis on designing scalable
techniques for data-driven science. Currently Dr. Riedewald is
developing novel approaches for parallel data processing and for
mining observational data. He has a track record of successful
collaborations with scientists from different domains, including
ornithology, physics, mechanical and aerospace engineering, and
astronomy. His work has been published in the premier peer-reviewed
data management research venues like ACM SIGMOD, VLDB, IEEE ICDE, and
IEEE TKDE, as well as in domain science journals.
Zoltán Miklós (EPFL)
Title: "Divide and conquer techniques for data management problems"
Abstract:
Evaluating conjunctive queries over a relational database
is a central problem of database theory. This problem is closely
related to constraint satisfaction problems in artificial
intelligence. We discuss query decomposition methods, which are an
efficient means of coping with the computational intractability of these
problems. Then we discuss semantic interoperability problems in
coalitions of autonomous data sources, where we study further divide and
conquer techniques. We also discuss further related questions
in Web data management, in particular entity matching in Web document
collections and Twitter streams. We discuss the fundamental
differences between the various data management settings one needs to
consider when applying divide and conquer techniques.
Bio:
Zoltán Miklós is a postdoctoral researcher at EPFL. He defended his PhD thesis at
the University of Oxford in 2008. He previously worked as a research assistant
at the Vienna University of Technology and at the Vienna University of
Economics, and he also worked as a software developer at Siemens. He
completed his undergraduate degrees at University ELTE, in Budapest.
His research focuses on databases, data management, artificial
intelligence and on the Semantic Web.
Timo Proescholdt
Title: "On Finding Complementary Clusterings and the WMO Information System"
Abstract:
On Finding Complementary Clusterings:
In many cases, a dataset
can be clustered following several criteria that complement each
other: group membership following one criterion provides little or no
information regarding group membership following the other criterion.
When these criteria are not known a priori, they have to be determined
from the data. I will discuss a new method for jointly finding the
complementary criteria and the clustering corresponding to each
criterion.
On the WMO Information System:
The WMO Information System (WIS) is being developed to continue ensuring
the international exchange of WMO products, such as meteorological,
climatological and hydrological data, in the 21st century. WIS is a
global information management system, designed as a distributed system
using a service-oriented architecture to guarantee the interoperability
of systems in 189 countries.
In WIS, information is modeled with the ISO 19139 and
ISO 19115 metadata standards for geospatial information and included
in the comprehensive catalogue. The interoperability requirements
ensure that this information can also be used in other
communities.
A future challenge is to make the information easier
to find for users on the WIS search portals with information retrieval
techniques as well as clustering and categorization algorithms.
Bio:
I was born and went to school in Munich, where my main subjects were
Mathematics and Geography. I decided to study computer science at LMU
Munich due to my interest in networking. The beginning of my studies
coincided with the foundation of an IT consulting company to continue
working with, inter alia, the Red Cross, where I did my civil service.
Other jobs during my studies included Java programming tuition and
longtime work at the university's network operation centre.
An international period followed my year abroad at the Universitat
Autonoma de Barcelona. I worked for humanitarian organizations in Africa
and for the UN in Rome and Geneva, while finishing my studies with my
diploma thesis (grade 1.0) at the CNAM in Paris about Data Mining, my
main study focus.
After graduation (with a grade of 1.5), I worked as a researcher at the
University of Tehran and again at the CNAM in Paris, until I got my
current job at the World Meteorological Organization of the UN in
Geneva.
Katrin Eisenreich, SAP
Title: "Creation and Change Impact Analysis of What-if Scenarios under Uncertainty and Correlation"
Abstract:
When performing what-if analysis -- a technique
increasingly applied in business planning and decision support -- both
historical and hypothetical data (assumptions) play an important role. To
construct scenarios, users apply operators to analyze, modify, and
integrate both forms of data.
An important factor in this context
is the handling of uncertainty and correlation in data, since they can
have a major impact on analysis results. Besides, once a scenario has
been created, it is important to enable users to investigate which
assumptions were made to arrive at the scenario, and how possible
changes in underlying data might influence its overall results.
Part I
In this talk, we first look at the specific aspect
of correlation in data. I will present an approach that enables users
to introduce arbitrary correlation structures to analyzed data,
exploiting statistical methods well-established in financial and risk
analysis. A central aspect of the discussed approach is the use of
precomputed approximate correlation structures (ACRs) instead of
sampling at run time. Thereby, we achieve faster processing of
correlation queries and become independent from specific statistical
library functions at query time. Further, the ACR approach opens up
possibilities for the efficient processing of subsequent operations over
joint distributions, such as computing risk measures over the
correlated data. We will introduce the construction and application of
ACRs by means of an example scenario.
Part II
The second part of the talk focuses on the topic of scenario provenance.
Apart from looking at the results of a scenario analysis, we must also
allow users to trace back to where those results came from. For
example, looking at a very high prediction for sales, a user should be
able to see whether it is backed by some evidence (e.g., historical
data) or comes mostly from very optimistic assumptions about the
business or economic factors. Also, when actual data deviates from an
applied assumption, the user should be able to see what impact this
can have on the overall scenario.
In the talk, I will illustrate
the capture and querying of provenance information based on a graph
structure. Apart from information about the derivation process of data
items, the discussed approach also takes into account the hypothetical
nature of data. In particular, specific knowledge about analytic
operators, such as those for ACR-based correlation introduction, is
exploited to allow for an efficient change impact analysis over
executed scenarios.
Bio:
Katrin joined SAP Research in 2006 as an intern and
completed her major thesis in the field of schema and ontology
matching in September 2007. She received her degree (Diplom) from the
TU Dresden in September 2007. Since 2008, she has been working as a
Research Associate and is now part of the Business Intelligence
research practice.
In her PhD research, Katrin is working on
concepts for handling uncertain data for scenario analysis on the
database. The data model and operators for the computation of scenario
data, as well as for tracing the processing of such data, are
implemented as an extension to the SAP In-Memory Database. This
research is part of a joint effort to provide flexible functionality
for in-memory Forecasting and Prediction.
Ivo Santos, Microsoft Research (EMIC - European Microsoft Innovation Center)
Title: "Analytics, Complex Events and Data Streams: Scenarios, Platforms and Trends"
Abstract:
Different scenarios from market
verticals such as manufacturing, oil and gas, utilities, financial
services, health care, web analytics, and IT monitoring can profit
from the opportunity to make more informed business decisions in near
real-time based on the ability to monitor, analyze and act on the data
in motion. These applications, typically event-driven and
characterized by high input data rates, continuous analytics, and
millisecond latency requirements, introduce a number of challenges to
traditional Database Management Systems (DBMS). This is pushing many
organizations to start adopting Data Stream Management Systems (DSMS),
middleware systems that incrementally process long-running continuous
queries over temporal data streams. Modern DSMS typically provide
Complex Event Processing (CEP) techniques to identify meaningful
patterns, relationships and data abstractions from among seemingly
unrelated events, triggering immediate response actions. An example of
a commercial DSMS is Microsoft StreamInsight, a platform for
developing and deploying streaming applications which leverages a
well-defined temporal stream model and algebra. This talk, besides
providing an overall introduction to CEP, will present scenarios where
its adoption is gaining momentum, provide a quick overview of existing
research and commercial CEP platforms (including a more extended
overview of Microsoft StreamInsight) and finally discuss some future
trends and challenges for CEP.
Bio:
Dr. Ivo Santos is a researcher and Software Engineer at
Microsoft Research (EMIC - European Microsoft Innovation Center -
research.microsoft.com/en-us/labs/emic/) in Aachen, Germany. He
holds a PhD in Computer Science from the University of Campinas
(UNICAMP, Brazil), worked as a DAAD fellow at the Fraunhofer FOKUS
Institute (Berlin, Germany) and twice as a research intern at the
Microsoft Research Database group (Redmond, WA, USA). He has expertise
in the areas of distributed information systems, service-oriented
architectures, data stream management systems and e-applications. His
current research interests are in middleware and tools for distributed
complex event processing systems.