Talks DIMA Research Seminar

Talks WS2011/2012

Date/Time/Location | Lecturer/Subject
14.05.2012, 4 p.m., DIMA EN 719 | Vasiliki Kalavri, KTH
19.03.2012, 4 p.m., DIMA EN 719 | Katrin Eisenreich, SAP: "Correlation Support for Risk Evaluation in Databases"
20.02.2012, 4 p.m., DIMA EN 719 | Prof. Seamus Ross, iSchool, University of Toronto: "Facilitating Digital Preservation through Risk Management"
13.02.2012, 5 p.m., DIMA EN 719 | Martin Grund, HPI: "HYRISE - A Hybrid In-Memory Storage Engine"
21.11.2011, 4 p.m., DIMA EN 719 | Philippe Cudre-Mauroux, University of Fribourg, Switzerland: "dipLODocus[RDF] - Short and Long-Tail RDF Analytics for Massive Webs of Data"
14.11.2011, 4 p.m., DIMA EN 719 | Daniel Boesswetter, FU Berlin: "A Hybrid Approach to Physical Data Placement in Relational Database Systems"
01.11.2011, 2 p.m., DIMA EN 719 | Dean Jacobs, SAP: "DaaS for SaaS"
01.11.2011, 11 a.m., DIMA EN 719 | Markus Weimer, Yahoo: "Machine learning in ScalOps, a higher order cloud computing language"

Martin Grund, HPI

Title: HYRISE - A Hybrid In-Memory Storage Engine

Abstract:
In this talk I will present my research findings in the area of enterprise application-aware database systems. Given the current evolution of computer hardware architectures, main memory plays an increasingly important role in modern database systems. To fully leverage the performance properties of main memory, database systems need to be aware of the properties of enterprise applications. I will therefore present an analysis of enterprise applications and their data usage, and show how this can affect the database layer.
Current database systems lack features that optimize the physical storage engine based on the applications' workload (without replication), due to the high cost of rearranging data on disk. Using our research prototype, HYRISE, I present a system that, depending on the input workload, selects an optimal vertical partitioning for the application's tables. To validate the approach we used an adapted application benchmark.
Since modern enterprise applications evolve faster than previous generations, it becomes more important to track changes in the workload and to provide a high-performance algorithm that adapts to them. As an outcome, I will present an adapted version of our initial algorithm that allows incremental calculation of the optimal layout to reduce the overall search space.
The talk concludes with a summary of the specific contributions and an outlook for future research in the area of hybrid main memory database systems.
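
To make the layout-selection idea concrete, here is a minimal sketch (my own illustration, not HYRISE code) of workload-driven vertical partitioning: it enumerates every way of splitting a small attribute set into column groups and picks the grouping with the lowest value of a simple cost proxy, where a query pays the full width of every group it touches, weighted by its frequency. The attribute names, the workload, and the cost model are assumptions made for the example.

# Illustrative sketch (not HYRISE code): choose a vertical partitioning that
# minimizes a simple scan-cost proxy for a given workload.

def partitionings(attrs):
    """Enumerate all ways to split attrs into disjoint column groups."""
    if not attrs:
        yield []
        return
    first, rest = attrs[0], attrs[1:]
    for smaller in partitionings(rest):
        for i, group in enumerate(smaller):
            yield smaller[:i] + [[first] + group] + smaller[i + 1:]
        yield [[first]] + smaller

def layout_cost(layout, workload):
    """Cost proxy: a query pays the full width of every group it touches."""
    cost = 0
    for attrs_used, frequency in workload:
        for group in layout:
            if attrs_used & set(group):
                cost += frequency * len(group)
    return cost

# Hypothetical table and workload: (attributes read, relative frequency).
attributes = ["id", "name", "price", "stock"]
workload = [
    ({"price"}, 100),             # analytical scan over a single column
    ({"id", "name", "stock"}, 5), # OLTP-style row access
]

best = min(partitionings(attributes), key=lambda l: layout_cost(l, workload))
print("chosen layout:", best)

An incremental variant, as described in the talk, would update this choice as workload statistics change instead of re-enumerating the full search space.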


Bio:


Martin Grund received his Bachelor's and Master's degrees from the Hasso Plattner Institute in Potsdam, Germany. He is currently finishing his PhD at the chair of Prof. Plattner in the area of hybrid main memory databases.

Everybody is cordially welcome!
Please, forward this invitation to interested colleagues.

Katrin Eisenreich, SAP

Title:

Correlation Support for Risk Evaluation in Databases

Abstract:

Investigating potential dependencies in data and their effect on future business developments can help experts prevent misestimations of risks and chances. This makes correlation a highly important factor in risk analysis tasks. Previous research on correlation in uncertain data management has primarily addressed the handling of dependencies between discrete rather than continuous distributions. Also, none of the existing approaches provides a clear method for extracting correlation structures from data or for introducing assumptions about correlation into independently represented data.
To enable risk analysis under correlation assumptions, we use an approximation technique based on copula functions. This technique enables analysts to introduce arbitrary correlation structures between arbitrary distributions and to compute relevant measures over the correlated data. The correlation information can either be extracted at runtime from historic data or be accessed from a parametrically precomputed structure. We discuss the construction, application and querying of approximate correlation representations for different analysis tasks. Two use cases serve to motivate and exemplify the application of the presented operators. Our experiments demonstrate the efficiency and accuracy of the proposed approaches and point out several possibilities for optimization.
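
As a rough illustration of the copula idea (my own sketch, not the operators from the paper), the following code correlates two arbitrary marginal distributions through a Gaussian copula: correlated standard-normal samples are mapped to uniforms via the normal CDF and then pushed through the inverse CDFs of the desired marginals. The marginals and the correlation value are made-up assumptions.

# Minimal Gaussian-copula sketch (illustrative only).
from scipy import stats

rho = 0.7                                   # assumed correlation
cov = [[1.0, rho], [rho, 1.0]]

# 1) Draw correlated standard-normal samples.
z = stats.multivariate_normal.rvs(mean=[0.0, 0.0], cov=cov, size=10000)

# 2) Map them to correlated uniforms (the copula).
u = stats.norm.cdf(z)

# 3) Push the uniforms through arbitrary inverse marginal CDFs, e.g. a
#    log-normal revenue and an exponential delay (made-up marginals).
revenue = stats.lognorm.ppf(u[:, 0], s=0.5, scale=100.0)
delay = stats.expon.ppf(u[:, 1], scale=3.0)

# The marginals keep their shapes while the rank correlation is preserved.
print(stats.spearmanr(revenue, delay).correlation)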
The presented paper has been accepted as a full paper at ICDE 2012. In the talk, I would like to prepare for the conference presentation. I am therefore looking forward to your comments and suggestions, since they will help me improve the talk.
“Correlation Support for Risk Evaluation in Databases”. Katrin Eisenreich, Jochen Adamek, Philipp Rösch, Gregor Hackenbroich and Volker Markl. ICDE 2012. To be published April 2012.

Bio:


Katrin Eisenreich graduated from the Technical University of Dresden in 2007, after completing her major thesis in the field of schema and ontology matching in cooperation with SAP Research. Since 2008, Katrin has been working as a Research Associate in the Business Intelligence practice of SAP Research. In her PhD research, she addresses the efficient representation and processing of uncertain data for scenario analysis on the database, with special focus on the handling of correlations and the efficient re-computation of analysis queries.

Philippe Cudre-Mauroux, University of Fribourg, Switzerland


Title:

dipLODocus[RDF] - Short and Long-Tail RDF Analytics for Massive Webs of Data

Abstract:

The proliferation of semantic data on the Web requires RDF database systems to constantly improve their scalability and transactional efficiency. At the same time, users are increasingly interested in investigating or visualizing large collections of online data by performing complex analytic queries. This paper introduces a novel database back-end for RDF data called dipLODocus[RDF], which supports both transactional and analytical queries efficiently. dipLODocus[RDF] takes advantage of a new hybrid storage model for RDF data based on recurring graph patterns. In this talk, I will describe the general architecture of the system and compare its performance to state-of-the-art solutions for both transactional and analytic workloads.
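
To give a feel for what storage organized around recurring graph patterns can mean (a deliberately simplified reading, not dipLODocus[RDF]'s actual data structures), the sketch below gathers all triples of a subject into one record and then groups subjects that share the same set of predicates, so that structurally similar entities end up stored together and can be scanned as a unit.

# Illustrative sketch: cluster RDF triples by each subject's predicate signature.
from collections import defaultdict

triples = [  # (subject, predicate, object) -- made-up example data
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:paper1", "rdf:type", "ex:Paper"),
    ("ex:paper1", "ex:title", "RDF Analytics"),
]

# 1) Gather each subject's properties into one record.
records = defaultdict(dict)
for s, p, o in triples:
    records[s][p] = o

# 2) Group subjects by the set of predicates they use (the recurring pattern).
clusters = defaultdict(list)
for s, props in records.items():
    signature = frozenset(props)  # e.g. {rdf:type, ex:name}
    clusters[signature].append((s, props))

for signature, members in clusters.items():
    print(sorted(signature), "->", [s for s, _ in members])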


Bio:

Philippe Cudre-Mauroux is a Swiss-NSF associate professor at the University of Fribourg in Switzerland. Previously, he was a postdoctoral associate working in the Database Systems group at MIT. He received his Ph.D. from the Swiss Federal Institute of Technology EPFL, where he won both the Doctorate Award and the EPFL Press Mention in 2007. Before joining the University of Fribourg, he worked on distributed information management systems for HP, IBM T.J. Watson Research, and Microsoft Research Asia. His research interests are in exascale data management infrastructures for non-relational data. Webpage: http://diuf.unifr.ch/xi

 

Everybody is cordially welcome! Please, forward this invitation to interested colleagues.

Daniel Boesswetter, FU Berlin

Title:

"A Hybrid Approach to Physical Data Placement in Relational Database Systems"

Abstract:

In recent years, relational database technology has undergone a diversification process: while major database vendors refined a single architecture for all data processing purposes over four decades, it has now become evident in research and practice that this architecture no longer fits current hardware or today's requirements. Column-oriented physical data models and execution strategies have gained enormous interest in research and industry. Column-orientation is known to increase the execution speed of analytical relational queries (OLAP), which require few attributes from many tuples rather than all attributes of a few tuples, as is typical for transactional workloads (OLTP). Moreover, it supports data compression, which leads to higher throughput for the column scans that are common in data warehouse execution plans. On the other hand, column-orientation and compression are counterproductive for transactional processing, because each update potentially leads to many writes or, even worse, to an expensive reorganization of compressed data. This results in the typical two-tier approach with one (or more) row-oriented, possibly main-memory based systems responsible for transaction processing and a separate column-oriented system for the analytics, with data transferred from the former to the latter at regular intervals by an extract-transform-load (ETL) process.
While having two independent systems for OLAP and OLTP may have advantages, it is not always desirable. Real-time businesses demand analytical queries on up-to-date data, so a nightly ETL process may be insufficient, and high-volume updates as found in telecommunication systems might even be too large to be imported into the data warehouse in time. It is thus an open research question whether a reunification of relational systems into a single architecture for both requirements is possible, such that data can be analyzed directly at its source. This talk deals with hybrid data placement strategies that support both workloads, trading them off against each other.
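
As a toy illustration of why the two layouts favor different workloads (made-up data, not taken from the talk), the sketch below stores the same table once row-wise and once column-wise: the analytical aggregate touches a single contiguous column in the column layout, while appending a new tuple is one operation in the row layout but one append per column in the column layout.

# Toy illustration of row- vs. column-oriented placement.

# Row-oriented: each tuple is stored contiguously.
rows = [
    (1, "Alice", 120.0),
    (2, "Bob", 80.0),
]

# Column-oriented: each attribute is stored contiguously.
columns = {
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "amount": [120.0, 80.0],
}

# OLAP-style query: SUM(amount).
sum_rows = sum(r[2] for r in rows)        # must walk every full tuple
sum_columns = sum(columns["amount"])      # touches only one column
assert sum_rows == sum_columns

# OLTP-style insert of one tuple.
rows.append((3, "Carol", 42.0))           # a single append
for col, value in zip(("id", "name", "amount"), (3, "Carol", 42.0)):
    columns[col].append(value)            # one append per column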


Bio:

Daniel Boesswetter received his Diploma in Computer Science from the Technische Universitaet Muenchen in 2004. After professional experience at PEPPERMIND in Munich (1998-2003) and at Jamba! in Berlin (2004-2005), he joined the Database and Information Systems Group of the Freie Universitaet Berlin in 2007, where he is currently finishing his dissertation. Daniel is the author of several scientific and non-scientific articles in the field of database technology.

Everybody is cordially welcome!
Please, forward this invitation to interested colleagues.

Dean Jacobs, SAP

Title: "DaaS for SaaS"

Abstract:
This talk describes our current work designing a Database as a Service layer based on SAP's in-memory database to support Software as a Service. This layer will offer a single-tenant-process model, where each process contains a single tenant and there are multiple processes on each machine, as well as a multi-tenant-process model, where each process contains multiple tenants. It will support heterogeneous multi-tenancy, which spans different applications, as well as homogeneous multi-tenancy, which spans multiple instances of the same application. The primary functions of this layer are to distribute tenants across the database cluster so as to meet SLAs and to grow and shrink the cluster to match demand. This is an online problem in that the layout must be updated over time, as conditions change, by migrating tenants without disrupting their service.
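
As a rough sketch of the placement half of this problem (my own illustration; the abstract does not describe SAP's algorithm at this level), the code below greedily assigns tenants to the least-loaded database process and starts a new process whenever placing a tenant would exceed an assumed per-process load budget, which is one simple way to trade SLA headroom against cluster size.

# Illustrative greedy tenant placement: assign each tenant to the least-loaded
# process and grow the cluster when the load budget would be exceeded.
import heapq

CAPACITY = 100.0  # assumed per-process load budget (stand-in for SLA headroom)

def place_tenants(tenant_loads):
    """tenant_loads maps tenant -> estimated load; returns tenant -> process id."""
    heap = [(0.0, 0)]  # (current load, process id); start with one process
    next_pid = 1
    assignment = {}
    # Place large tenants first so they do not fragment the cluster.
    for tenant, load in sorted(tenant_loads.items(), key=lambda kv: -kv[1]):
        current, pid = heapq.heappop(heap)
        if current + load > CAPACITY:      # would exceed the budget: add a process
            heapq.heappush(heap, (current, pid))
            current, pid = 0.0, next_pid
            next_pid += 1
        assignment[tenant] = pid
        heapq.heappush(heap, (current + load, pid))
    return assignment

print(place_tenants({"t1": 60, "t2": 50, "t3": 30, "t4": 10}))

Shrinking the cluster and migrating tenants online, as described above, would amount to re-running such a placement and moving only the tenants whose assignment changed.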

Bio:
Dean Jacobs received his Ph.D. in Computer Science from Cornell University. He then served on the faculty of the Computer Science Department at the University of Southern California, where he studied distributed systems, databases, and programming languages. When the Internet began to see widespread commercial use, Dr Jacobs joined the company WebLogic, which was later purchased by BEA Systems. There, he developed the clustering and caching infrastructure for WebLogic Application Server, for which he holds twenty-nine patents. Dr Jacobs then joined Salesforce.com, where he helped to develop a highly scalable, multi-tenant infrastructure for Software as a Service. Currently, Dr Jacobs is a Chief Development Architect at SAP, where he continues to work on bringing enterprise applications to the Web.


Everybody is cordially welcome!
Please, forward this invitation to interested colleagues.

Markus Weimer, Yahoo

Title:


Machine learning in ScalOps, a higher order cloud computing language


Abstract:


In this talk, I will introduce ScalOps, a new internal domain-specific language (DSL) for Big Data analytics that targets machine learning and graph-based algorithms. It unifies the so-far distinct DAG processing found in, e.g., Pig and the iterative computation needs of machine learning in a single language and runtime. It exposes a declarative language reminiscent of Pig, with iterative extensions: the scaloop block captures iteration and packages it into the execution plan so that it can be optimized for caching opportunities and handed off to the runtime. The Hyracks runtime directly supports these iterations as recursive queries, thereby avoiding the pitfalls of an outer driver loop. I will highlight the expressiveness of ScalOps and its amenability to optimization using a real-world, large-scale machine learning example drawn from Yahoo! Mail, one of the biggest email providers in the world.
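
For contrast, the sketch below shows the hand-written "outer driver loop" pattern that the abstract says ScalOps and Hyracks avoid (a generic batch-gradient-descent example in Python, not ScalOps syntax): because the loop lives in client code, the runtime cannot see the iteration as a whole, cache loop-invariant data, or plan the recursion itself.

# Generic hand-written driver loop; this is the pattern a declarative,
# runtime-managed iteration (like the scaloop block) is meant to replace.
import numpy as np

def train_logreg(X, y, steps=100, lr=0.1):
    """Batch gradient descent for logistic regression over an in-memory dataset."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):                   # iteration controlled by the client,
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # invisible to any query optimizer
        gradient = X.T @ (p - y) / len(y)
        w -= lr * gradient
    return w

# Tiny made-up dataset: two features, binary labels.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
print(train_logreg(X, y))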

Bio:


Markus Weimer is a Scientist working in the Cloud Sciences Team of Yahoo! Labs in Santa Clara, California. His research area is Machine Learning with an emphasis on large scale algorithms, applications and - most relevant to this talk - systems. In the past, Markus has worked on collaborative filtering and ranking, abuse prevention and detection models at Yahoo! and applications of Machine Learning to the educational domain. Markus received his PhD from the Technische Universität Darmstadt, Germany in 2009 under the joint supervision of Alex Smola and Max Mühlhäuser and joined Yahoo! Labs in the same year.


Everybody is cordially welcome!
Please, forward this invitation to interested colleagues.

Prof. Seamus Ross, iSchool, University of Toronto

Title:

Facilitating Digital Preservation through Risk Management

Abstract:

Everything about preserving digital objects is difficult, from appraisal to description to management to access. The increasing diversity of the types of digital materials curators must handle only exacerbates the obstacles. The challenges are further aggravated by the complexity of the organizational and social spaces in which these materials are used, as well as by their sheer quantity. The longevity of digital objects depends upon active and persistent processes. The successful deployment of preservation and curation methods requires new approaches and, in particular, tools to measure and mitigate the risks associated with maintaining digital objects. This talk looks at the application of risk management methods and tools to support digital longevity, and specifically at the development and use of DRAMBORA as part of the work of the Digital Curation Centre in the UK and the EU-funded initiative DigitalPreservationEurope.

Bio:

Seamus Ross is Dean and Professor, Faculty of Information, University of Toronto. Formerly, he was Professor of Humanities Informatics and Digital Curation and Founding Director of HATII (Humanities Advanced Technology and Information Institute, http://www.hatii.arts.gla.ac.uk) (1997-2009) at the University of Glasgow. He served as Associate Director of the Digital Curation Centre (2004-2009) in the UK (http://www.dcc.ac.uk), was Principal Director of ERPANET (http://www.erpanet.org) and DigitalPreservationEurope (DPE, http://www.digitalpreservationeurope.eu and http://www.youtube.com/user/wepreserve), and was a co-principal investigator on such projects as the DELOS Digital Libraries Network of Excellence (http://www.dpc.delos.info/) and Planets (http://www.planets-project.eu/). He recommends "Digital Preservation and Nuclear Disaster: An Animation", http://www.youtube.com/watch?v=pbBa6Oam7-w, and "Digital Archaeology" (1999), http://eprints.erpanet.org/47/01/rosgowrt.pdf. Dr Ross completed degrees at Vassar College, the University of Pennsylvania, and the University of Oxford.

Everybody is cordially welcome!

Please, forward this invitation to interested colleagues.
