DIMA Research Colloquium Schedule
Date/Location | Speaker/Topic
---|---
Tue, 17.03.2009, 14:00 c.t., DIMA | Dr. Ulf Brefeld, TU Berlin, Machine Learning Group: "Machine Learning Approaches to Finding Relations in Natural Language"
Mon, 16.02.2009, 16:00 c.t., DIMA | Katrin Eisenreich, SAP Research, Dresden, Germany: "Concepts and Applications of Schema Matching"
Mon, 02.02.2009, 16:00 c.t., DIMA | Max Heimel, IBM R&D, Boeblingen, Germany: "A Bayesian Approach to Estimating the Selectivity of Conjunctive Predicates"
Mon, 19.01.2009, 16:00 c.t., DIMA | Julia Stoyanovich, Columbia University, New York, U.S.A.: "Efficient Network-Aware Search in Collaborative Tagging Sites"
Mon, 08.12.2008, 16:00 c.t., DIMA | Kevin Beyer, IBM Almaden Research Center, U.S.A.: "Querying JSON Data on Hadoop using Jaql"
Thu, 04.12.2008, 11:00 c.t., DIMA | Dr. Dean Jacobs, Chief Development Architect, SAP AG, Germany: "Databases for Software as a Service"
Mon, 24.11.2008, 16:00 c.t., DIMA | Dr. Martin Große-Rhode, Fraunhofer-Institut für Software- und Systemtechnik ISST, Berlin, Dept. Dependable Technical Systems, Senior Scientist: "Architecture-Centric Variant Management for Embedded Systems" (results of the project "Distributed Development and Integration of Automotive Product Lines", VEIA)
Thu, 09.10.2008, 14:00 c.t., DIMA | Dr. Gerald Weber, Univ. of Auckland, New Zealand: "Technology-Independent Modelling of Service Interaction"
Dr. Ulf Brefeld, TU Berlin, Machine Learning Group
Abstract
Semantic processing of natural language is one of the oldest problems in machine learning and still far from being solved. By now, low-level tasks including part-of-speech tagging and named entity recognition are well understood, while complex tasks such as parsing, machine translation, and sentiment prediction are still lively subjects of ongoing research. The talk focuses on the identification of relations in sentences. Starting from classical (pipelined) approaches, we'll derive state-of-the-art techniques by addressing complex tasks in a single optimization problem. We'll also learn about two naturally arising problems: firstly, the trade-off between performance and execution time, and secondly, the quest for annotated data.
Bio
Since October 2007, Ulf Brefeld has been a postdoc in the Machine Learning Group at Technische Universität Berlin. Prior to joining TU Berlin, he worked at the Max Planck Institute for Computer Science in Saarbrücken and at Humboldt-Universität zu Berlin. He received a Diploma in Computer Science in 2003 from Technische Universität Berlin and a Ph.D. (Dr. rer. nat.) in 2008 from Humboldt-Universität zu Berlin.
Katrin Eisenreich, SAP Research, Dresden, Germany
Abstract
Schema matching is the task of detecting corresponding elements between schemas of autonomous, often differently structured data sources. This is a vital step for enabling interoperability and data integration in many areas, such as data migration, message mapping, or the integration of web sources.
Today, schema matching is mostly still performed manually or with only minimal automatic support. Methods for (semi-)automatic matching apply different algorithms to discover corresponding schema elements, exploiting information from the schema characteristics, instance data, or some sort of background knowledge.
This talk covers the basic concepts of schema matching and introduces potential applications, as well as (prototypical) matching tools developed at SAP Research.
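Name-based comparison of schema elements is one of the simplest techniques such matchers exploit. The following is only an illustrative sketch (not one of the SAP Research tools; the schemas and the threshold are invented) of how a minimal name matcher might work:

```python
from difflib import SequenceMatcher

# Invented example schemas; real matchers also exploit instance data and
# background knowledge, not just element names.
source = ["customer_name", "birth_date", "street"]
target = ["CustName", "DateOfBirth", "StreetAddress"]

def name_similarity(a, b):
    """String similarity of normalized element names, in [0, 1]."""
    norm = lambda s: s.replace("_", "").lower()
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

# Propose, for each source element, the best-matching target element,
# keeping only pairs above an (arbitrary) similarity threshold.
THRESHOLD = 0.5
matches = {}
for s in source:
    best = max(target, key=lambda t: name_similarity(s, t))
    if name_similarity(s, best) >= THRESHOLD:
        matches[s] = best
print(matches)
```

Even this toy matcher shows why matching is only "semi-"automatic: the proposed pairs are candidates that a human still has to confirm.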
Bio
Katrin Eisenreich joined SAP Research in 2007 as a Research Associate. She graduated from Dresden University of Technology with a degree in Computer Science. Her research focuses on developing a formal foundation for business planning and forecasting applications.
Max Heimel, IBM Research & Development, Boeblingen, Germany
Abstract
Cost-based optimizers in relational databases use data statistics to estimate intermediate result cardinalities. Those cardinalities are needed to estimate access plan costs in order to choose the cheapest plan for executing a query. Since data statistics are usually collected on single columns only, the optimizer cannot directly estimate result cardinalities of conjunctive predicates over multiple attributes.
To avoid having to fall back to assuming statistical independence, most modern relational database systems offer the possibility to collect simple joint data statistics over multiple attributes. A widely used approach is to collect the number of distinct value combinations as a joint statistic. This statistic can be used for a uniformity-based estimate, which assumes that each possible value combination occurs equally often. Although this leads to improved estimates, it is still inaccurate, since "real-world" data is unlikely to be uniform.
In this talk, I will discuss a different approach to estimating the result cardinality of conjunctive predicates. The proposed method combines knowledge from single-column histograms using a conditional-probability-based "uniform correlation" approach.
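For concreteness, the two baseline estimates the talk improves upon can be sketched as follows (this is not the proposed Bayesian method; column names and all numbers are invented):

```python
# Two baseline selectivity estimates for a conjunctive predicate
# p1 AND p2, as described above.

def independence_estimate(sel_p1, sel_p2):
    """Multiply single-column selectivities, assuming the columns
    are statistically independent."""
    return sel_p1 * sel_p2

def uniformity_estimate(num_distinct_combinations):
    """Assume every distinct value combination occurs equally often,
    so an equality predicate on both columns selects 1/d of the rows."""
    return 1.0 / num_distinct_combinations

# E.g. city = 'Berlin' (selectivity 0.05) AND country = 'Germany'
# (selectivity 0.10), with 50 distinct (city, country) pairs observed:
# independence gives 0.005, uniformity gives 0.02. For strongly
# correlated columns like city/country, the true selectivity can be
# much closer to 0.05, so both baselines underestimate it.
print(independence_estimate(0.05, 0.10))
print(uniformity_estimate(50))
```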
Bio
Max Heimel graduated in 2008 with a diploma degree in Applied Computer Science from the Berufsakademie Stuttgart. During his studies he spent two internships at IBM facilities in San Jose, USA, working on topics related to data statistics usage and management in the SQL optimizer of Informix Dynamic Server. He wrote his diploma thesis, "On Suggesting Multi-Column Statistics In Informix Dynamic Server", during the second internship. Max is currently employed by IBM Germany Research & Development GmbH, working as a development engineer on the Informix optimizer.
Julia Stoyanovich, Columbia University, New York, U.S.A.
Abstract
The popularity of collaborative tagging sites presents a unique opportunity to explore keyword search in a context where query results are determined by the opinion of a network of taggers related to a seeker. In this paper, we present the first in-depth study of network-aware search. We investigate efficient top-k processing when the score of an answer is computed as its popularity among members of a seeker's network. We argue that obvious adaptations of top-k algorithms are too space-intensive, due to the dependence of scores on the seeker's network. We therefore develop algorithms based on maintaining score upper-bounds. The global upper-bound approach maintains a single score upper-bound for every pair of item and tag, over the entire collection of users. The resulting bounds are very coarse. We thus investigate clustering seekers based on similar behavior of their networks. We show that finding the optimal clustering of seekers is intractable, but we provide heuristic methods that give substantial time improvements. We then give an optimization that can benefit smaller populations of seekers, based on clustering of taggers. Our results are supported by extensive experiments on del.icio.us datasets.
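The global upper-bound idea can be illustrated with a toy sketch (this is not the paper's algorithm; the tagging data and function names are invented): a score computed over any seeker's network can never exceed the tag count over all users, so candidates can be scanned in decreasing bound order and pruned early.

```python
import heapq
from collections import Counter

# Invented toy data: (user, item, tag) tagging actions.
actions = [("alice", "item1", "db"), ("bob", "item1", "db"),
           ("bob", "item2", "db"), ("carol", "item3", "ml")]

# Global upper bound per (item, tag): the count over ALL users.
upper = Counter((item, tag) for _, item, tag in actions)

def network_aware_topk(network, tag, k):
    """Return the top-k (score, item) pairs for a query tag, scoring each
    item by how many members of `network` tagged it with `tag`."""
    exact = Counter((item, t) for user, item, t in actions
                    if user in network and t == tag)
    heap = []  # min-heap holding the current top-k exact scores
    # Scan candidates in decreasing upper-bound order; stop once no
    # remaining bound can beat the k-th best exact score found so far.
    for (item, t), bound in upper.most_common():
        if t != tag:
            continue
        if len(heap) == k and bound <= heap[0][0]:
            break
        score = exact[(item, t)]
        if score > 0:
            heapq.heappush(heap, (score, item))
            if len(heap) > k:
                heapq.heappop(heap)
    return sorted(heap, reverse=True)

print(network_aware_topk({"alice", "bob"}, "db", 2))
```

The pruning is exactly why coarse bounds hurt: the looser the bound, the later the scan can stop, which motivates the per-cluster bounds studied in the talk.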
Bio
Julia Stoyanovich is a PhD student at Columbia University in New York, where she works with Professor Kenneth Ross. Julia received her B.S. in Computer Science and Mathematics from UMass Amherst in 1998, and went on to work for two start-ups and one real company in New York City from 1998 to 2003. Julia's research concentrates on incorporating the point of view of the user into various aspects of data management, with a particular focus on real datasets and practical applications.
Kevin Beyer, IBM Almaden Research Center, U.S.A.
Abstract
We introduce Jaql, a query language for the JSON data model. JSON (JavaScript Object Notation) has become a popular data format for many Web-based applications because of its simplicity and modeling flexibility. JSON makes it easy to model a wide spectrum of data, ranging from homogeneous flat data to heterogeneous nested data, and it can do this in a language-independent format that easily integrates with existing programming languages. We believe that these characteristics make JSON an ideal data format for many Hadoop applications and databases in general. This talk will describe the key features of Jaql and show how it can be used to process JSON data in parallel using Hadoop's map/reduce framework. The talk is intended for a broad computer science audience and includes background on map/reduce and Hadoop.
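Jaql itself is not shown here, but the map/reduce pattern over JSON records that it compiles down to can be sketched in plain Python (the records are invented; Hadoop would run the map and reduce phases in parallel across machines):

```python
import json
from collections import Counter

# Toy input: line-delimited JSON records, a common Hadoop input format.
lines = [
    '{"user": "alice", "tags": ["db", "hadoop"]}',
    '{"user": "bob", "tags": ["db"]}',
]

# Map phase: parse each record and emit a (tag, 1) pair per tag.
mapped = [(tag, 1) for line in lines
          for tag in json.loads(line)["tags"]]

# Reduce phase: sum the counts per tag.
reduced = Counter()
for tag, n in mapped:
    reduced[tag] += n

print(dict(reduced))  # {'db': 2, 'hadoop': 1}
```

Because JSON records are self-describing, the map function needs no external schema, which is part of what makes the format attractive for this style of processing.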
Short Bio:
Kevin Beyer is a Research Staff Member at the IBM Almaden Research Center. His research interests are in information management, including query languages, analytical processing, and indexing techniques. He has been designing and implementing Jaql, in one form or another, for the past several years. Previously, he led the design and implementation of the XML indexing support in DB2 pureXML.
Dr. Dean Jacobs, (Chief Development Architect, SAP AG, Germany)
Abstract
In the Software as a Service (SaaS) model, a service provider owns and operates an application that is accessed by many businesses over the Internet. A key benefit of this model is that, with careful engineering, it is possible to leverage economies of scale to reduce total cost of ownership relative to on-premises solutions.
This talk will describe basic architectures and best practices for implementing a data management layer for SaaS. It will cover both first-generation systems, which are based on conventional databases and middleware, and second-generation systems, which are based on emerging cloud computing platforms.
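One recurring building block of such first-generation data layers is the shared-schema pattern: all tenants share one set of tables, and every row carries a tenant identifier that each query must filter on. A minimal sketch (the table, tenants, and amounts are invented, and this is only one of several multi-tenancy options):

```python
import sqlite3

# Shared-schema multi-tenancy: one table for all tenants, with a
# tenant_id column providing logical isolation between them.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (tenant_id TEXT, amount REAL)")
db.executemany("INSERT INTO invoices VALUES (?, ?)",
               [("acme", 100.0), ("acme", 50.0), ("globex", 75.0)])

def total_for_tenant(tenant_id):
    # Scoping every query by tenant_id is what keeps tenants apart;
    # forgetting the filter would leak data across tenants.
    row = db.execute("SELECT SUM(amount) FROM invoices "
                     "WHERE tenant_id = ?", (tenant_id,)).fetchone()
    return row[0]

print(total_for_tenant("acme"))  # 150.0
```

The economy of scale comes from the sharing: one schema to operate, upgrade, and back up for all tenants, at the cost of enforcing the tenant filter everywhere.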
Short Bio
Dean Jacobs received his Ph.D. in Computer Science from Cornell University in 1985. He then served on the faculty of the Computer Science Department at the University of Southern California, where he studied distributed systems, databases, and programming languages.
When the Internet began to see widespread commercial use, Dr. Jacobs joined the company WebLogic, which was later purchased by BEA Systems. There, he developed the clustering and caching infrastructure for WebLogic Application Server, for which he holds thirteen patents. Dr. Jacobs then joined Salesforce.com, where he helped to develop a highly scalable, multi-tenant infrastructure for Software as a Service. Currently, Dr. Jacobs is a Chief Development Architect at SAP, where he is doing research on SaaS and supporting development of Business ByDesign.
Dr. Martin Große-Rhode, Fraunhofer ISST, Berlin, Dept. Dependable Technical Systems
Abstract
Automotive systems must support a large number of functional and technical variants. Early in the development process, a decision must be made whether different variants should be implemented by one generic application or by several specific ones. The VEIA project developed description techniques, methods, and a prototypical tool for answering this question.
Functional requirements and software solutions are represented by architecture models. Their elements can be variable, i.e. the description language allows options, alternatives, and parameters to be marked. By linking the architecture models with models of the characteristic properties of the system variants (feature models), executable and AUTOSAR-compliant models can be generated from them.
To evaluate design alternatives for the different software solutions, metrics were defined and implemented that can be applied to the architecture models.
Dr. Gerald Weber (Univ. of Auckland, New Zealand)
Abstract
Systems based on a service-oriented architecture (SOA) can be implemented with many different technologies, and in particular, they can be implemented with a heterogeneous set of technologies. An enterprise service bus (ESB) is a typical option for bridging the technology boundaries. It is desirable to have technology-independent models of the core services in the IT system. Based on the framework of form-oriented analysis [1], we present here computation-independent models (CIMs) and platform-independent models (PIMs) for service-oriented architectures. Our models have the following advantages: some of the CIMs are closely related to Petri net approaches; the PIMs are expressed in the same formalism as the CIMs; a canonical PIM is easily derived from a CIM; and the semantics of the PIMs matches the operation of a typical enterprise service bus architecture. Finally, both CIM and PIM are defined as core semantic data models and can therefore be created with most semantic data modeling tools.
[1] Dirk Draheim, Gerald Weber: Form-Oriented Analysis. Springer, 2005.
Short Bio:
Gerald Weber is Senior Lecturer in Software Engineering and Computer Science at the University of Auckland, New Zealand. His research interests include software engineering for enterprise computing and data-intensive applications, as well as semantics of data models. Other interests include human-computer interaction and theoretical computer science. He received his Dr. rer. nat. from Freie Universität Berlin. He is the author of over 30 peer-reviewed publications. Gerald Weber has had an active role in several international conferences, including as proceedings chair of VLDB 2008 and program co-chair of EDOC 2008.