Document Contents
DIMA Colloquium Schedule
Date/Location | Speaker/Topic |
---|---|
14.05.2012 16.00 c.t. DIMA EN 719 | Vasiliki Kalavri, KTH |
19.03.2012 16.00 c.t. DIMA EN 719 | Katrin Eisenreich, SAP: "Correlation Support for Risk Evaluation in Databases" |
20.02.2012 16.00 c.t. DIMA EN 719 | Prof. Seamus Ross, iSchool, University of Toronto: "Facilitating Digital Preservation through Risk Management" |
13.02.2012 17.00 s.t. DIMA EN 719 | Martin Grund, HPI: "HYRISE - A Hybrid In-Memory Storage Engine" |
21.11.2011 16.00 c.t. DIMA EN 719 | Philippe Cudre-Mauroux, University of Fribourg, Switzerland: "dipLODocus[RDF] - Short and Long-Tail RDF Analytics for Massive Webs of Data" |
14.11.2011 16.00 c.t. DIMA EN 719 | Daniel Boesswetter, FU Berlin: "A Hybrid Approach to Physical Data Placement in Relational Database Systems" |
01.11.2011 14.00 s.t. DIMA EN 719 | Dean Jacobs, SAP: "DaaS for SaaS" |
01.11.2011 11.00 s.t. DIMA EN 719 | Markus Weimer, Yahoo: "Machine learning in ScalOps, a higher order cloud computing language" |
Martin Grund, HPI
Title: HYRISE - A Hybrid In-Memory Storage Engine
Abstract:
In this talk I will present my research findings in the area of enterprise-application-aware database systems. Driven by the current evolution of computer hardware architectures, main memory plays an increasingly important role in modern database systems. To fully leverage the performance properties of main memory, database systems need to be aware of the properties of enterprise applications. I will therefore present an analysis of enterprise applications and their data usage, and show how this can affect the database layer.
Current database systems lack features that optimize the physical storage layout based on the workload of the applications (without replication), due to the high cost of rearranging data on disk. Using our research prototype, HYRISE, I present a system that, depending on the input workload, selects an optimal vertical partitioning for the tables of the application. To validate the approach, we used an adapted application benchmark.
Since modern enterprise applications evolve faster than previous generations, it becomes increasingly important to track changes in the workload and to provide a high-performance algorithm that adapts to them. As an outcome, I will present an adapted version of our initial algorithm that computes the optimal layout incrementally, reducing the overall search space.
The talk concludes with a summary of the specific
contributions and an outlook for future research in the area of hybrid
main memory database systems.
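The workload-driven layout selection described in the abstract can be sketched with a toy cost model: enumerate the vertical partitionings of a table and pick the one whose scan cost is lowest for a given query mix. This is a hypothetical simplification for illustration, not the HYRISE algorithm; the attribute names, widths, and cost function below are all invented.

```python
def partitions(attrs):
    """Enumerate all ways to split a list of attributes into groups."""
    if not attrs:
        yield []
        return
    first, rest = attrs[0], attrs[1:]
    for part in partitions(rest):
        # put `first` into each existing group, or into a new group of its own
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield part + [[first]]

def cost(layout, workload, widths):
    """Toy scan cost: a query reads every container that holds at least one
    attribute it needs, paying the container's full width each time."""
    total = 0
    for attrs_needed, freq in workload:
        for group in layout:
            if attrs_needed & set(group):
                total += freq * sum(widths[a] for a in group)
    return total

def best_layout(attributes, workload, widths):
    """Exhaustively pick the cheapest vertical partitioning."""
    return min(partitions(attributes), key=lambda p: cost(p, workload, widths))

# Hypothetical table: a frequent OLAP query touches one narrow column,
# a rare OLTP query touches the whole tuple.
widths = {"id": 4, "price": 8, "comment": 100}
workload = [({"price"}, 1000), ({"id", "price", "comment"}, 10)]
layout = best_layout(list(widths), workload, widths)
```

Under this workload the hot "price" column ends up in its own container, which is the column-store-like layout the OLAP query favors, while the rare full-tuple query pays a fixed total width either way.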
Bio:
Martin Grund received his Bachelor's and Master's degrees from the Hasso Plattner Institute in Potsdam, Germany. He is currently finishing his PhD at the chair of Prof. Plattner in the area of hybrid main memory databases.
Everybody is cordially welcome!
Please, forward this invitation to interested colleagues.
Katrin Eisenreich, SAP
Title:
Correlation Support for Risk Evaluation in Databases
Abstract:
Investigating potential dependencies in data and their effect on
future business developments can help experts to prevent
misestimations of risks and chances. This makes correlation a highly
important factor in risk analysis tasks. Previous research on
correlation in uncertain data management has primarily addressed the
handling of dependencies between discrete rather than continuous
distributions. Also, none of the existing approaches provides a clear
method for extracting correlation structures from data and introducing
assumptions about correlation to independently represented data.
To enable risk analysis under correlation assumptions, we use an
approximation technique based on copula functions. This technique
enables analysts to introduce arbitrary correlation structures between
arbitrary distributions and calculate relevant measures over thus
correlated data. The correlation information can either be extracted
at runtime from historic data or be accessed from a parametrically
precomputed structure. We discuss the construction, application and
querying of approximate correlation representations for different
analysis tasks. Two use cases serve to motivate and exemplify the
application of the presented operators. Our experiments demonstrate
the efficiency and accuracy of the proposed approaches, and point out
several possibilities for optimization.
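The copula-based technique the abstract refers to can be illustrated with a minimal Gaussian-copula sampler: draw correlated standard normals, map them to uniforms through the normal CDF, and push each column through an arbitrary inverse CDF. This is a generic sketch of the copula idea, not the operators from the paper; the marginals and correlation matrix below are invented.

```python
import math
import numpy as np

def gaussian_copula_sample(marginal_ppfs, corr, n, seed=0):
    """Sample n rows whose i-th column follows marginal_ppfs[i] (an inverse
    CDF) and whose dependence structure is a Gaussian copula with
    correlation matrix `corr`."""
    rng = np.random.default_rng(seed)
    # 1. correlated standard normals via the Cholesky factor of `corr`
    z = rng.standard_normal((n, len(marginal_ppfs))) @ np.linalg.cholesky(corr).T
    # 2. push through the standard normal CDF to obtain correlated uniforms
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    # 3. apply each marginal's inverse CDF column by column
    return np.column_stack([ppf(u[:, i]) for i, ppf in enumerate(marginal_ppfs)])

# Hypothetical marginals: an exponential loss and a uniform market factor,
# coupled with correlation 0.8.
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
samples = gaussian_copula_sample([lambda u: -np.log(1 - u), lambda u: u], corr, 5000)
```

The appeal for risk analysis is that the dependence structure (the copula) is specified independently of the marginal distributions, so an analyst can impose a correlation assumption on otherwise independently represented data.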
The presented paper has been accepted as a full paper at ICDE 2012. I would like to use this talk to prepare for the conference presentation, and am therefore looking forward to your comments and suggestions, since they will help me improve the talk.
"Correlation Support for Risk Evaluation in Databases". Katrin Eisenreich, Jochen Adamek, Philipp Rösch, Gregor Hackenbroich and Volker Markl. ICDE 2012. To be published April 2012.
Bio:
Katrin Eisenreich graduated from the Technical University of
Dresden in 2007, after completing her major thesis in the field of
schema and ontology matching in cooperation with SAP Research. Since
2008, Katrin has been working as a Research Associate in the Business
Intelligence practice of SAP Research. In her PhD research, she
addresses the efficient representation and processing of uncertain
data for scenario analysis on the database, with special focus on the
handling of correlations and the efficient re-computation of analysis
queries.
Philippe Cudre-Mauroux, University of Fribourg, Switzerland
Title:
dipLODocus[RDF]--Short and Long-Tail RDF Analytics for Massive Webs of Data
Abstract:
The proliferation of semantic data on the Web requires RDF database systems to constantly improve their scalability and transactional efficiency. At the same time, users are increasingly interested in investigating or visualizing large collections of online data by performing complex analytic queries. This paper introduces a novel database back-end for RDF data called dipLODocus[RDF], which supports both transactional and analytical queries efficiently. dipLODocus[RDF] takes advantage of a new hybrid storage model for RDF data based on recurring graph patterns. In this talk, I will describe the general architecture of the system and compare its performance to state-of-the-art solutions for both transactional and analytic workloads.
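The idea of storing RDF by recurring graph patterns can be illustrated with a toy clustering step: group triples by subject and use the subject's predicate set as a template key, so that structurally identical entities end up co-located. This is a deliberately simplified sketch (one object per predicate, no real template mining), not dipLODocus[RDF]'s actual storage model; the triples below are invented.

```python
from collections import defaultdict

def build_molecules(triples):
    """Cluster (subject, predicate, object) triples into per-subject
    'molecules' keyed by their sorted predicate set, a toy stand-in for
    recurring graph-pattern templates."""
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o          # assumption: one object per predicate
    clusters = defaultdict(list)
    for s, props in by_subject.items():
        clusters[tuple(sorted(props))].append((s, props))
    return dict(clusters)

# Two structurally identical "person" entities and one unrelated entity:
triples = [
    ("alice", "name", "Alice"), ("alice", "age", "30"),
    ("bob", "name", "Bob"), ("bob", "age", "25"),
    ("x1", "type", "Widget"),
]
clusters = build_molecules(triples)
```

Grouping by template is what makes both workload classes cheap: a transactional lookup reads one compact molecule, while an analytic scan iterates over all molecules of one template without touching the rest.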
Bio:
Philippe Cudre-Mauroux is a Swiss-NSF associate professor at the University of Fribourg in Switzerland. Previously, he was a postdoctoral associate working in the Database Systems group at MIT. He received his Ph.D. from the Swiss Federal Institute of Technology EPFL, where he won both the Doctorate Award and the EPFL Press Mention in 2007. Before joining the University of Fribourg, he worked on distributed information management systems for HP, IBM T.J. Watson Research, and Microsoft Research Asia. His research interests are in exascale data management infrastructures for non-relational data. Webpage: http://diuf.unifr.ch/xi
Daniel Boesswetter, FU Berlin
Title:
"A Hybrid Approach to Physical Data Placement in Relational Database Systems"
Abstract:
In recent years, relational database technology has undergone a diversification process: while major database vendors refined a single architecture for all data processing purposes over four decades, it has now become evident in research and practice that this architecture no longer fits current hardware or today's requirements. Column-oriented physical data models and execution strategies have gained enormous interest in research and industry. Column orientation is known to increase the execution speed of analytical relational queries (OLAP), which require few attributes from many tuples instead of all attributes of few tuples, as is typical for transactional workloads (OLTP). Moreover, it supports the compression of data, which leads to higher throughput for the column scans common in data warehouse execution plans. On the other hand, column orientation and compression are counterproductive for transactional processing, because each update potentially leads to many writes or, even worse, to an expensive reorganization of compressed data. This results in the typical two-tier approach, with one (or more) row-oriented, possibly main-memory-based systems being responsible for transaction processing and a separate column-oriented system for the analytics. Data is transferred from the former to the latter at regular intervals by an extract-transform-load (ETL) process. While having two independent systems for OLAP and OLTP may have advantages, it is not always desirable. Real-time businesses demand analytical queries on up-to-date data, so a nightly ETL process might be insufficient. High-volume updates as found in telecommunication systems might even be too large to be imported into the data warehouse in time. It is thus an open research question whether a reunification of relational systems into a single architecture for both requirements is possible, such that data can be analyzed directly at its source. This talk deals with hybrid data placement strategies that support both workloads, trading them off against each other.
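The OLAP argument above, few attributes from many tuples, can be made concrete with a toy comparison of row-major and column-major storage, counting how many cells each layout must touch to aggregate a single attribute. All names and sizes here are illustrative.

```python
# A toy table stored twice: row-major (list of tuples) and
# column-major (dict of lists).
rows = [(i, i * 2, "name%d" % i) for i in range(1000)]
cols = {
    "id":    [r[0] for r in rows],
    "value": [r[1] for r in rows],
    "name":  [r[2] for r in rows],
}

def olap_sum_row(rows):
    """Aggregate one attribute from a row store: every full tuple is read."""
    cells_touched = sum(len(r) for r in rows)
    return sum(r[1] for r in rows), cells_touched

def olap_sum_col(cols):
    """Same aggregate from a column store: only the needed column is read."""
    col = cols["value"]
    return sum(col), len(col)
```

For this table the row store touches three cells per tuple where the column store touches one, which is exactly why wide-tuple OLTP schemas and narrow-scan OLAP queries pull physical design in opposite directions.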
Bio:
Daniel Boesswetter received his Diploma in Computer Science from the Technische Universitaet Muenchen in 2004. After professional experience at PEPPERMIND in Munich (1998-2003) and at Jamba! in Berlin (2004-2005), he joined the Database and Information Systems Group of the Freie Universitaet Berlin in 2007, where he is currently finishing his dissertation. Daniel is the author of several scientific and non-scientific articles in the field of database technology.
Dean Jacobs, SAP
Title: "DaaS for SaaS"
Abstract:
This talk describes our current
work designing a Database as a Service layer based on SAP’s
in-memory database to support Software as a Service. This layer will
offer a single-tenant-process model, where each process contains a
single tenant and there are multiple processes on each machine, as
well as a multi-tenant-process model, where each process contains
multiple tenants. It will support heterogeneous multi-tenancy, which
spans different applications, as well as homogeneous multi-tenancy,
which spans multiple instances of the same application. The primary
functions of this layer are to distribute tenants across the database
cluster so as to meet SLAs and to grow and shrink the size of the
cluster to match the demand. This is an on-line problem in that the
layout must be updated over time as conditions change by migrating
tenants without disrupting their service.
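The tenant-distribution problem described above resembles on-line bin packing. A minimal sketch, assuming each tenant is characterized by a single load number and each machine by a fixed capacity (both invented for illustration, standing in for the SLA constraints), is a greedy first-fit-decreasing placement that grows the cluster whenever no machine has room:

```python
def place_tenants(tenants, machine_capacity):
    """Greedy first-fit-decreasing placement of tenant loads onto machines.
    Adds a machine (grows the cluster) whenever no existing machine fits."""
    machines = []  # each entry: list of tenant names on that machine
    free = []      # remaining capacity per machine
    for name, load in sorted(tenants.items(), key=lambda kv: -kv[1]):
        for i, cap in enumerate(free):
            if load <= cap:
                machines[i].append(name)
                free[i] -= load
                break
        else:
            machines.append([name])
            free.append(machine_capacity - load)
    return machines

machines = place_tenants({"a": 6, "b": 5, "c": 4, "d": 3}, 10)
```

The on-line aspect the abstract stresses is what this sketch omits: in the real system the assignment must be revised over time by migrating tenants, not recomputed from scratch.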
Bio:
Dean Jacobs received his Ph.D. in Computer
Science from Cornell University. He then served on the faculty of the
Computer Science Department at the University of Southern California,
where he studied distributed systems, databases, and programming
languages. When the Internet began to get widespread commercial use,
Dr Jacobs joined the company WebLogic, which was later purchased by
BEA Systems. There, he developed the clustering and caching
infrastructure for WebLogic Application Server, for which he holds
twenty-nine patents. Dr Jacobs then joined
Salesforce.com, where he helped to
develop a highly-scalable, multi-tenant infrastructure for Software as
a Service. Currently, Dr Jacobs is a Chief Development Architect at
SAP, where he continues to work on bringing enterprise applications to
the Web.
Markus Weimer, Yahoo
Title:
Machine learning in ScalOps, a higher order cloud computing language
Abstract:
In this talk, I will introduce ScalOps. ScalOps is a new
internal domain-specific language (DSL) for Big Data analytics that
targets machine learning and graph-based algorithms. It unifies the
so-far distinct DAG processing found in, e.g., Pig and the iterative
computation needs of machine learning in a single language and
runtime. It exposes a declarative language reminiscent of Pig,
with iterative extensions. The scaloop block captures iteration and
packages it in the execution plan so that it can be optimized for
caching opportunities and handed off to the runtime. The Hyracks
runtime directly supports these iterations as recursive queries,
thereby avoiding the pitfalls of an outer driver loop. I will
highlight the expressiveness of ScalOps and its amenability to
optimizations using a real world, large scale machine learning example
drawn from Yahoo! Mail, one of the biggest email providers in the
world.
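The point about avoiding an outer driver loop can be illustrated with a generic fixpoint combinator: the iteration lives inside one construct that a runtime could analyze, cache for, and optimize, rather than in ad-hoc driver code around the system. This is a toy Python analogue of what a scaloop block captures, not ScalOps itself; the convergence criterion and example are invented.

```python
import math

def fixpoint(step, state, max_iters=100, tol=1e-9):
    """Iterate `step` until the state stops changing (within `tol`).
    Because the whole loop is one value-level construct, a runtime could
    plan caching and recursion for it instead of re-reading inputs per
    iteration, which is the pitfall of an external driver loop."""
    for _ in range(max_iters):
        new_state = step(state)
        if abs(new_state - state) < tol:
            return new_state
        state = new_state
    return state

# Toy use: solve x = cos(x) by fixed-point iteration.
root = fixpoint(math.cos, 1.0)
```

In the same way, an iterative ML training job is just `fixpoint(update_model, initial_model)`: the runtime sees the recursion as a whole instead of a sequence of unrelated jobs.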
Bio:
Markus Weimer is a Scientist working in the Cloud Sciences
Team of Yahoo! Labs in Santa Clara, California. His research area is
Machine Learning with an emphasis on large scale algorithms,
applications and - most relevant to this talk - systems. In the past,
Markus has worked on collaborative filtering and ranking, abuse
prevention and detection models at Yahoo! and applications of Machine
Learning to the educational domain. Markus received his PhD from the
Technische Universität Darmstadt, Germany in 2009 under the joint
supervision of Alex Smola and Max Mühlhäuser and joined Yahoo! Labs
in the same year.
Prof. Seamus Ross, iSchool, University of Toronto
Title:
Facilitating Digital Preservation through Risk Management
Abstract:
Everything about preserving digital objects is difficult, from appraisal to description to management to access. The increasing diversity of types of digital materials curators must handle only exacerbates the obstacles. The challenges are further aggravated by the complexity of the organizational and social spaces in which these materials are used, as well as by their sheer quantity. The longevity of digital objects depends upon active and persistent processes. The successful deployment of preservation and curation methods requires new approaches and, in particular, tools to measure and mitigate the risks associated with maintaining digital objects. This talk looks at the application of risk management methods and tools to support digital longevity, and specifically at the development and use of DRAMBORA as part of the work of the Digital Curation Centre in the UK and the EU-funded initiative DigitalPreservationEurope.
Bio:
Seamus Ross is Dean and Professor, Faculty of Information, University of Toronto. Formerly, he was Professor of Humanities Informatics and Digital Curation and Founding Director of HATII (Humanities Advanced Technology and Information Institute, http://www.hatii.arts.gla.ac.uk) (1997-2009) at the University of Glasgow. He served as Associate Director of the Digital Curation Centre (2004-9) in the UK (http://www.dcc.ac.uk), was Principal Director of ERPANET (http://www.erpanet.org) and DigitalPreservationEurope (DPE, http://www.digitalpreservationeurope.eu and http://www.youtube.com/user/wepreserve), and was a co-principal investigator of such projects as the DELOS Digital Libraries Network of Excellence (http://www.dpc.delos.info/) and Planets (http://www.planets-project.eu/). He recommends "Digital Preservation and Nuclear Disaster: An Animation" (http://www.youtube.com/watch?v=pbBa6Oam7-w) and "Digital Archaeology" (1999, http://eprints.erpanet.org/47/01/rosgowrt.pdf). Dr Ross completed degrees at Vassar College, the University of Pennsylvania, and the University of Oxford.