DIMA Research Colloquium Schedule

Schedule, winter semester 2008/09
Tue., 17.03.2009
14.00 c.t.
Dr. Ulf Brefeld
TU Berlin, FG Maschinelles Lernen
"Machine Learning Approaches to Finding Relations in Natural Language"

Mon., 16.02.2009
16.00 c.t.
Katrin Eisenreich
SAP Research, Dresden, Germany
"Concepts and Applications of Schema Matching"

Mon., 02.02.2009
16.00 c.t.
Max Heimel
IBM R&D, Boeblingen, Germany
"A Bayesian Approach to Estimating the Selectivity of Conjunctive "

Mon., 19.01.2009
16.00 c.t.
Julia Stoyanovich
Columbia University, New York, U.S.A.
"Efficient Network-Aware Search in Collaborative Tagging Sites"

Mon., 08.12.2008
16.00 c.t.
Kevin Beyer
IBM Almaden Research Center, U.S.A.
"Querying JSON Data on Hadoop using Jaql"

Thu., 04.12.2008
11.00 c.t.
Dr. Dean Jacobs
Chief Development Architect, SAP AG, Germany
"Databases for Software as a Service"

Mon., 24.11.2008
16.00 c.t.
Dr. Martin Große-Rhode
Senior Scientist, Fraunhofer-Institut für Software- und Systemtechnik ISST, Berlin, Dept. Verlässliche technische Systeme
"Architekturzentriertes Variantenmanagement für eingebettete Systeme" (Architecture-Centric Variant Management for Embedded Systems), results of the VEIA project ("Verteilte Entwicklung und Integration von Automotive-Produktlinien", Distributed Development and Integration of Automotive Product Lines)

Thu., 09.10.2008
14.00 c.t.
Dr. Gerald Weber
Univ. of Auckland, New Zealand
"Technology-Independent Modelling of Service Interaction"

Dr. Ulf Brefeld, TU Berlin, FG Maschinelles Lernen


Semantic processing of natural language is one of the oldest problems in
machine learning and still far from being solved. By now, low-level tasks
including part-of-speech tagging and named entity recognition are well
understood while complex tasks such as parsing, machine translation, and
sentiment prediction are still lively subjects of ongoing research. The
talk focuses on the identification of relations in sentences. Starting
from classical (pipelined) approaches we'll derive state-of-the-art
techniques by addressing complex tasks in a single optimization problem.
We'll also learn about two naturally arising problems: first, the
trade-off between performance and execution time, and second, the quest
for annotated data.


Ulf has been a postdoc in the Machine Learning Group at Technische
Universität Berlin since October 2007. Prior to joining TU Berlin, he
worked at the Max Planck Institute for Computer Science in Saarbrücken
and at Humboldt-Universität zu Berlin. Ulf received a Diploma in
Computer Science in 2003 from Technische Universität Berlin and a Ph.D.
(Dr. rer. nat.) in 2008 from Humboldt-Universität zu Berlin.

Katrin Eisenreich, SAP Research, Dresden, Germany


Schema matching is the task of detecting corresponding elements between schemas of autonomous, often differently structured data sources. This is a vital step for enabling interoperability and data integration in many areas, such as data migration, message mapping, or the integration of web sources.
Today, schema matching is still mostly performed manually or with only minimal automatic support. Methods for (semi-)automatic matching apply different algorithms to discover corresponding schema elements, exploiting information from schema characteristics, instance data, or some form of background knowledge.
This talk covers the basic concepts of schema matching and introduces potential applications, as well as (prototypical) matching tools developed at SAP Research.
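
As a rough illustration of matching on schema characteristics, the following minimal sketch proposes correspondences from element-name similarity alone, using Python's standard difflib module. The two schemas, the function names, and the 0.5 threshold are hypothetical choices made for this example; they are not taken from the matching tools presented in the talk.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] between two element names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_schemas(source, target, threshold=0.5):
    """For each source element, propose the most similar target element.

    Proposals whose similarity stays below the (hypothetical) threshold
    are dropped and would be left for manual matching.
    """
    proposals = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        if name_similarity(s, best) >= threshold:
            proposals[s] = best
    return proposals


# Hypothetical schemas of two autonomous data sources.
source_schema = ["CustomerName", "ZipCode", "Phone"]
target_schema = ["customer_name", "postal_code", "phone_number", "fax"]

proposals = match_schemas(source_schema, target_schema)
```

A purely name-based matcher like this one is only a starting point; the instance data and background knowledge mentioned above would be needed to find correspondences between elements with unrelated names.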


Katrin Eisenreich joined SAP Research in 2007 as a Research Associate. She graduated from Dresden University of Technology with a degree in Computer Science. Her research focuses on developing a formal foundation for business planning and forecasting applications.

Max Heimel, IBM R&D, Boeblingen, Germany


Cost-based optimizers in relational databases make use of data
statistics to estimate intermediate result cardinalities. Those
cardinalities are needed to estimate access plan costs in order to
choose the cheapest plan for executing a query.
Since data statistics are usually collected on single columns only,
the optimizer cannot directly estimate result cardinalities of
conjunctive predicates over multiple attributes.
To avoid having to fall back to assuming statistical independence,
most modern relational database systems offer the possibility to
collect simple joint data statistics over multiple attributes. A
widely used approach is to collect the number of distinct value
combinations as a joint statistic. This statistic can be used for a
uniformity-based estimate, which assumes that each possible value
combination occurs equally often. Although this leads to improved
estimates, it is still inaccurate, since "real world" data is unlikely
to be uniform.
In this talk, I will discuss a different approach to estimating the
result cardinality of conjunctive predicates. The proposed method
combines knowledge from single-column histograms using a
conditional-probability-based "uniform correlation" approach.
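
The two baseline estimates discussed above can be illustrated with a small sketch for a conjunction of two equality predicates. It does not implement the proposed "uniform correlation" method, which the abstract does not specify. The table contents, column names, and helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical table with two correlated columns (make, model).
rows = [
    ("VW", "Golf"), ("VW", "Golf"), ("VW", "Golf"), ("VW", "Polo"),
    ("BMW", "X5"), ("BMW", "X5"), ("Audi", "A4"), ("Audi", "A4"),
]
n = len(rows)

# Single-column statistics: value frequencies per column.
make_freq = Counter(make for make, _ in rows)
model_freq = Counter(model for _, model in rows)

# Joint statistic: number of distinct (make, model) combinations.
distinct_combinations = len(set(rows))


def estimate_independent(make, model):
    """Cardinality estimate assuming the two columns are statistically independent."""
    return n * (make_freq[make] / n) * (model_freq[model] / n)


def estimate_uniform():
    """Cardinality estimate assuming every distinct combination occurs equally often."""
    return n / distinct_combinations


def actual(make, model):
    """True result cardinality of: make = ? AND model = ?"""
    return sum(1 for r in rows if r == (make, model))


independent = estimate_independent("VW", "Golf")
uniform = estimate_uniform()
true_count = actual("VW", "Golf")
```

For this toy data the independence assumption yields 1.5 rows and the uniformity-based estimate yields 2.0, while the true cardinality is 3: the joint statistic improves the estimate but remains inaccurate because the combinations are not uniformly distributed.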


Max Heimel graduated in 2008 with a diploma degree in Applied Computer
Science from the Berufsakademie Stuttgart. During his studies he
completed two internships at IBM facilities in San Jose, USA, working
on topics related to data statistics usage and management in the SQL
optimizer of Informix Dynamic Server. He wrote his diploma thesis, "On
Suggesting Multi-Column Statistics In Informix Dynamic Server", during
the second internship. Max is currently employed by IBM Germany
Research & Development GmbH, working as a development engineer on the
optimizer of Informix.

Julia Stoyanovich, Columbia University, New York, U.S.A.


The popularity of collaborative tagging sites presents a unique
opportunity to explore keyword search in a context where query results
are determined by the opinion of a network of taggers related to a
seeker. In this paper, we present the first in-depth study of
network-aware search. We investigate efficient top-k processing when the
score of an answer is computed as its popularity among members of a
seeker's network. We argue that obvious adaptations of top-k algorithms
are too space-intensive, due to the dependence of scores on the seeker's
network. We therefore develop algorithms based on maintaining score
upper-bounds. The global upper-bound approach maintains a single score
upper-bound for every pair of item and tag, over the entire collection
of users. The resulting bounds are very coarse. We thus investigate
clustering seekers based on similar behavior of their networks. We show
that finding the optimal clustering of seekers is intractable, but we
provide heuristic methods that give substantial time improvements. We
then give an optimization that can benefit smaller populations of
seekers based on clustering of taggers. Our results are supported by
extensive experiments on del.icio.us datasets.
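
The scoring model and the global upper-bound idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithms from the talk: the tagging data, the networks, and all names are hypothetical, and the bound is taken here to be the maximum score of an item-tag pair over all seekers.

```python
from collections import defaultdict

# Hypothetical tagging actions: (tagger, item, tag).
taggings = [
    ("ann", "i1", "db"), ("bob", "i1", "db"), ("cat", "i1", "db"),
    ("ann", "i2", "db"), ("dan", "i2", "db"),
]

# Hypothetical networks: seeker -> taggers whose opinion counts for that seeker.
network = {
    "s1": {"ann", "bob", "cat"},
    "s2": {"dan"},
}

# Index: (item, tag) -> set of users who applied the tag to the item.
taggers_of = defaultdict(set)
for user, item, tag in taggings:
    taggers_of[(item, tag)].add(user)


def score(item, tag, seeker):
    """Popularity of (item, tag) among the members of the seeker's network."""
    return len(taggers_of[(item, tag)] & network[seeker])


# Global upper bound: a single value per (item, tag) that holds for every seeker.
upper_bound = {
    key: max(len(users & members) for members in network.values())
    for key, users in taggers_of.items()
}
```

In this toy data the bound for ("i1", "db") is 3, although the score for seeker s2 is 0. A gap of this kind is what makes a single global bound coarse and motivates per-cluster bounds.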


Julia Stoyanovich is a PhD student at Columbia University in New York,
where she works with Professor Kenneth Ross. Julia received her B.S. in
Computer Science and Mathematics from UMass Amherst in 1998, and went on
to work for two start-ups and one real company in New York City from
1998 to 2003. Julia's research concentrates on incorporating the point
of view of the user into various aspects of data management, with a
particular focus on real datasets and practical applications.

Kevin Beyer, IBM Almaden Research Center, U.S.A.


We introduce Jaql, a query language for the JSON data model. JSON
(JavaScript Object Notation) has become a popular data format for many
Web-based applications because of its simplicity and modeling
flexibility. JSON makes it easy to model a wide spectrum of data,
ranging from homogeneous flat data to heterogeneous nested data, and it
can do this in a language-independent format that easily integrates
with existing programming languages. We believe that these
characteristics make JSON an ideal data format for many Hadoop
applications and databases in general. This talk will describe the key
features of Jaql and show how it can be used to process JSON data in
parallel using Hadoop's map/reduce framework. The talk is intended
for a broad computer science audience and includes background on
map/reduce and Hadoop.
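
To make the map/reduce processing model concrete, the following sketch counts tags across heterogeneous JSON records in plain Python. It is neither Jaql syntax nor Hadoop API code; the records, the function names, and the in-memory shuffle step are hypothetical stand-ins for what a map/reduce framework would do across many machines.

```python
import json
from collections import defaultdict

# Hypothetical JSON records, one per line, with heterogeneous nesting.
lines = [
    '{"user": "ann", "tags": ["db", "json"]}',
    '{"user": "bob", "tags": ["json"], "address": {"city": "Berlin"}}',
    '{"user": "cat"}',
]


def map_record(line):
    """Map phase: emit a (tag, 1) pair for every tag in one JSON record."""
    record = json.loads(line)
    for tag in record.get("tags", []):
        yield tag, 1


def reduce_counts(key, values):
    """Reduce phase: sum all counts emitted for one key."""
    return key, sum(values)


# Shuffle: group intermediate pairs by key (done by the framework between phases).
groups = defaultdict(list)
for line in lines:
    for key, value in map_record(line):
        groups[key].append(value)

tag_counts = dict(reduce_counts(k, v) for k, v in groups.items())
```

Here tag_counts ends up as {"db": 1, "json": 2}. The record without a "tags" field simply contributes nothing, which shows how loosely structured JSON data fits this processing style.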

Short Bio:

Kevin Beyer is a Research Staff Member at the IBM Almaden Research
Center. His research interests are in information management,
including query languages, analytical processing, and indexing
techniques. He has been designing and implementing Jaql, in one form
or another, for the past several years. Previously, he led the design
and implementation of the XML indexing support in DB2 pureXML.

Dr. Dean Jacobs (Chief Development Architect, SAP AG, Germany)

In the Software as a Service (SaaS) model, a service provider owns and operates an application that is accessed by many businesses over the Internet. A key benefit of this model is that, through careful engineering, economies of scale can be leveraged to reduce total cost of ownership relative to on-premises solutions.
This talk will describe basic architectures and best practices for implementing a data management layer for SaaS. It will cover both first-generation systems, which are based on conventional databases and middleware, and second-generation systems, which are based on emerging cloud computing platforms.

Short Bio:
Dean Jacobs received his Ph.D. in Computer Science from Cornell University in 1985. He then served on the faculty of the Computer Science Department at the University of Southern California, where he studied distributed systems, databases, and programming languages.
When the Internet began to see widespread commercial use, Dr. Jacobs joined the company WebLogic, which was later purchased by BEA Systems.
There, he developed the clustering and caching infrastructure for WebLogic Application Server, for which he holds thirteen patents.
Dr. Jacobs then joined Salesforce.com, where he helped to develop a highly scalable, multi-tenant infrastructure for Software as a Service. Currently, Dr. Jacobs is a Chief Development Architect at SAP, where he is doing research on SaaS and supporting the development of Business ByDesign.

Dr. Martin Große-Rhode, FHG/ISST/VTS


Automotive systems must support a large number of functional and
technical variants. Early in the development process, it must be
decided whether different variants should be implemented by one generic
application or by several specific ones. In the VEIA project,
description techniques, methods, and a prototype tool were developed to
answer this question.

Functional requirements and software solutions are represented by
architecture models. Their elements can be variable, i.e., the
description language allows options, alternatives, and parameters to be
marked. By linking these models with models of the characteristic
properties of the system variants (feature models), executable and
AUTOSAR-compliant models can be generated from them.
To evaluate design alternatives for the different software solutions,
metrics were defined and implemented that can be applied to the
architecture models.

Dr. Gerald Weber (Univ. of Auckland, New Zealand)

Systems based on a service-oriented architecture (SOA) can be implemented with many different technologies, and in particular, they can be implemented with a heterogeneous set of technologies. An enterprise service bus (ESB) is a typical option for bridging the technology boundaries. It is desirable to have technology-independent models of the core services in the IT system. Based on the framework of form-oriented analysis [1], we present here computation-independent models (CIMs) and platform-independent models (PIMs) for service-oriented architectures. Our models have the following advantages:
Some of the CIMs are closely related to Petri net approaches; the PIMs are expressed in the same formalism as the CIMs; a canonical PIM is easily derived from a CIM; the semantics of the PIMs matches the operation of a typical enterprise service bus architecture. Finally, both CIM and PIM are defined as core semantic data models and can therefore be created with most semantic data modeling tools.

[1] Dirk Draheim, Gerald Weber, Form-Oriented Analysis,
Springer, 2005.

Short Bio:
Gerald Weber is Senior Lecturer in Software Engineering and Computer Science at the University of Auckland, New Zealand.
His research interests include software engineering for enterprise computing and data-intensive applications, as well as the semantics of data models. Other interests include human-computer interaction and theoretical computer science.
He received a Dr. rer. nat. from Freie Universität Berlin.
He is the author of over 30 peer-reviewed publications. Gerald Weber has had an active role in several international conferences, including as proceedings chair of VLDB 2008 and program co-chair of EDOC 2008.



