TU Berlin

Talks DIMA Research Seminar

Talks SS12
| Date | Time | Location | Lecturer | Subject |
|---|---|---|---|---|
| 23.04.2012 | 2 p.m. | DIMA EN 719 | Marcos Vaz Salles, University of Copenhagen | "Efficient and Programmable Behavioral Simulations through Database Techniques" |
| 07.05.2012 | 4 p.m. | DIMA EN 719 | Adam, Stanski | |
| 14.05.2012 | 4 p.m. | DIMA EN 719 | Vasiliki Kalavri, KTH | "If Pigs Could Fly: Integrating Apache Pig and Stratosphere" |
| 21.05.2012 | 4 p.m. | DIMA EN 719 | Michael Haupt, Oracle | "Maxine: eine JVM in Java" (Maxine: A JVM in Java) |
| 04.06.2012 | 4 p.m. | DIMA EN 719 | Uwe Jugel, SAP | "Interactive Visualization of High-Velocity Event Streams" |
| 11.06.2012 | 4 p.m. | DIMA EN 719 | Joachim Schmidt, TU HH | "Sprachen für die Datenbankprogrammierung, Abstraktionen und Applikationen" (Languages for Database Programming: Abstractions and Applications) |
| 14.06.2012 | 4 p.m. | DIMA EN 719 | Dieter Gawlick, Oracle | "Deriving, Managing, and Using Qualitative Information" |
| 29.06.2012 | 11 a.m. | DIMA EN 719 | Prof. Neumann, TU München | "Query Compilation and Execution in HyPer" |
| 09.07.2012 | 4 p.m. | DIMA EN 719 | Jannik Strötgen, Uni Heidelberg | "Event-centric Information Extraction and Retrieval to Explore Document Collections" |
| 10.07.2012 | 3 p.m. | TEL (20th floor), Auditorium 1 | Prof. Pietro Michiardi (EURECOM) | "Big Data Analytics in Practice" |
| 12.07.2012 | 2 p.m. | DIMA EN 719 | Dionysios Logothetis, Telefonica Research Barcelona | "Architectures for large-scale continuous data management" |
| 03.09.2012 | 4 p.m. | DIMA EN 719 | Sebastian Maneth | "An Overview of SXSI: Fast XPath Search over Compressed Indexes" |
| 10.09.2012 | 4 p.m. | DIMA EN 719 | Muhammad Asif Naeem, School of Computing and Mathematical Sciences, Auckland University of Technology, Auckland | "Stream-based Joins with Limited Resource Consumption" |
| 26.09.2012 | 11 a.m. | DIMA EN 719 | Prof. Jeffrey Naughton, University of Wisconsin-Madison | "Two-Phase Entity Resolution" |

Marcos Vaz Salles, DIKU

Title:

Efficient and Programmable Behavioral Simulations through Database Techniques

Abstract:

In many scientific domains, researchers are turning to large-scale behavioral simulations to better understand important real-world phenomena. These phenomena emerge as the result of myriad interactions among large numbers of interdependent agents in a complex system, such as a transportation network or an ecological system. While there has been a great deal of work on simulation tools from the high-performance computing community, behavioral simulations remain challenging to program and to scale automatically in parallel environments. In this talk, I will show how database techniques can solve this dilemma by offering simulation developers a programmable environment that automatically provides for scalability. I will present the design of BRACE, the Big Red Agent-based Computation Engine. BRACE offers a high-level scripting language for simulation developers and, at the same time, scales scripts written in this language by modeling them as iterated spatial joins. I will discuss recent and ongoing work on techniques to execute these iterated joins efficiently in the cloud, with the goal of turning BRACE into an important platform for new mid-range scientific computing workloads.

Bio:

Marcos Vaz Salles is an assistant professor at the Department of Computer Science (DIKU) of the University of Copenhagen. His research targets building novel data-driven systems that bring classic database benefits, such as scalability and ease of programming, to new domains. In work that started during his postdoc at Cornell University, Marcos is investigating how to bring data management techniques to computer games and behavioral simulations. During his PhD in the Systems Group at ETH Zurich, he investigated hybrid search and data integration architectures for personal dataspace management in the iMeMex project. Previously, Marcos obtained his MSc from PUC-Rio, Brazil, and his BSc from UNICAMP, Brazil.

Please forward this invitation to interested colleagues.

Michael Haupt, Oracle

Title:

Maxine: eine JVM in Java (Maxine: A JVM in Java)

Abstract:

 

Can a Java VM be implemented in Java? Why would one do that? What advantages does it bring?

Maxine is a JVM written almost entirely in Java that can run applications such as Glassfish and Eclipse. The project at Oracle Labs develops, for example, technologies for just-in-time compilers and garbage collection, as well as tools for debugging systems as complex as a JVM.

The talk gives an overview of Maxine, goes into detail on some aspects of the VM, and demonstrates a novel JIT compiler as well as the Maxine Inspector, which makes it possible to observe the VM while it executes applications.

Bio:

Michael Haupt is a member of the Maxine team at Oracle Labs. His research addresses improving the modularity of virtual machines and implementing programming languages in runtime environments, with particular attention to optimizing specific language mechanisms. Before joining Oracle Labs, he was a postdoc at the Hasso Plattner Institute in Potsdam and a doctoral student at Technische Universität Darmstadt.

Vasiliki Kalavri, KTH, Sweden

Title:

 "If Pigs Could Fly: Integrating Apache Pig and Stratosphere"

Abstract:

Writing efficient applications in MapReduce or PACT requires strong programming skills and in-depth understanding of the systems’ architectures. In order to make the power of these systems accessible to non-experts, save development time and make application code easier to understand and maintain, several high-level languages have been developed. 

One of the most popular high-level dataflow systems is Apache Pig. Pig overcomes Hadoop’s one-input and two-stage dataflow limitations, allowing the developer to write SQL-like scripts. However, Hadoop's limitations are still present in the backend system and add a notable overhead to the execution time. Pig is currently implemented on top of Hadoop, but it has been designed to be modular and independent of the execution engine.

For my thesis project, I am currently working on integrating Pig and Stratosphere. I believe that Stratosphere has desirable properties that will significantly improve Pig's performance. In this talk, I will present the goal, motivation and expectations of my project. I will give an introduction to the Pig system internals, i.e. the data model, the compilers, and the optimizers. I will also focus on the integration methodology, integration alternatives, challenges faced and design decisions. Finally, I will briefly present the evaluation strategy planned.

Bio:

Vasiliki Kalavri is a 2nd-year student of the European Master in Distributed Computing (EMDC), at KTH, the Royal Institute of Technology, Sweden. 

She is currently working on her thesis project at the Swedish Institute of Computer Science (SICS), under the supervision of Per Brand.

While at KTH, she has provided teaching assistance for graduate and undergraduate courses, as well as for courses on Distributed Systems and Functional Programming for Ericsson engineers, under the supervision of Associate Professor J. Montelius.

She spent the 1st year of her Master studies at UPC, Barcelona, and holds a degree in Electrical and Computer Engineering from the National Technical University of Athens. She has also spent one year in industry, working as a business web applications developer.

Dieter Gawlick, Oracle

Title: Deriving, Managing, and Using Qualitative Information

Abstract: The amount and complexity of data, knowledge, and processes are growing at an ever faster pace and have long exceeded the capacity of the human brain. To cope with the volume of data and the pressure to react quickly, IT has to be used to extract important information and present it in a way that is easy for humans to comprehend. The talk discusses findings based on a prototype to support doctors in an Intensive Care Unit. One of the major findings was that the vast amount of measured and observed quantitative data (facts) had to be transformed into qualitative data (information) to help the medical personnel understand the situation of a patient in less time and with far less effort. A side effect of this transformation was that queries and requests for notification could be formulated in the way doctors communicate with each other. The talk suggests that we need a new class of data types, which we call smart data types, to provide easy and fast insight into facts and to significantly extend the ability to replace procedural code with declarative statements.

Bio: Dieter is an architect in Oracle’s database division; he has developed key concepts for high-end OLTP, high availability, storage management, messaging, workflow, and information dissemination. He is currently focusing on the evolution of database technology towards a synergistic view of data management, knowledge management, and process management to enable the development of evolutionary applications. Dieter has written numerous papers and served on numerous program committees. Before joining Oracle, Dieter worked at IBM, Amdahl (Fujitsu), and Digital (HP).

Everybody is cordially welcome!

Uwe Jugel, SAP Dresden, Germany

Title:

 "Interactive Visualization of High-Velocity Event Streams"

Abstract:

 

Today, complex event processing systems enable real-time analysis of high-velocity event streams. Considering their high stream processing efficiency, they provide a promising basis for computing real-time visualization of the streaming data. However, when building real-time visualizations on top of streaming systems, three major challenges arise:

1. Interactive Querying - How to make sure the user gets instant responses to drill-down interactions?

2. Efficient Visualization Processing - How to create the visualization artifacts in the backend/front-end?

3. Efficient Event Stream Distribution - How to multiplex real-time data in mass user scenarios?

In this talk, I will introduce these three problems and outline how I plan to solve them in the course of my PhD thesis.

Prof. em. Dr. Joachim W. Schmidt, TU HH
Institut für Software Systeme, STS
Technische Universität Hamburg, TUHH    

Title

Sprachen für die Datenbankprogrammierung: Abstraktionen und Applikationen (Languages for Database Programming: Abstractions and Applications)

 

Abstract

Linguistically more ambitious approaches to programming data-intensive applications are distinguished above all by powerful and adequate abstraction mechanisms (functional abstraction, modularization, monomorphic or polymorphic typing, ...). Free combinability of the basic mechanisms and orthogonality of persistence are further design goals. Such languages not only allow more elegant and more effective application programming; they also open up a more systematic approach to optimization tasks, for example in the areas of queries and parallelism. This talk, which otherwise traces and assesses a longer line of development, aims to contribute to this as well.

 

CV

see Attachment (DOC, 137,0 KB)

 

Prof. Dr. Thomas Neumann, TU München

Title: Query Compilation and Execution in HyPer

Abstract: The traditional way to execute queries is the so-called iterator model, where algebraic operators produce tuple streams on demand. While very flexible, the iterator model is not well suited for main-memory database systems, as it is very cache-unfriendly and causes significant overhead. Data-centric execution models that blur operator boundaries to maximize locality achieve significantly better performance on modern CPUs. The HyPer system compiles both regular SQL queries and more complex pre-canned transactions into such execution models, using LLVM for machine-code generation.
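
The contrast drawn in the abstract can be illustrated with a small Python sketch. It is not HyPer code (HyPer generates machine code via LLVM); the operator classes, the sample rows, and the `price > 10` query are hypothetical and only show the difference between pulling tuples through `next()` calls and running one fused loop over the data.

```python
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, int]

# Iterator model: every operator exposes next() and pulls from its child.
class Scan:
    """Leaf operator: hands out one stored row per next() call."""
    def __init__(self, rows: List[Row]) -> None:
        self._it: Iterator[Row] = iter(rows)

    def next(self) -> Optional[Row]:
        return next(self._it, None)

class Select:
    """Filter operator: pulls from its child until a row passes the predicate."""
    def __init__(self, child, pred: Callable[[Row], bool]) -> None:
        self.child, self.pred = child, pred

    def next(self) -> Optional[Row]:
        row = self.child.next()
        while row is not None and not self.pred(row):
            row = self.child.next()
        return row

class Project:
    """Projection operator: keeps only the requested columns."""
    def __init__(self, child, cols: List[str]) -> None:
        self.child, self.cols = child, cols

    def next(self) -> Optional[Row]:
        row = self.child.next()
        return None if row is None else {c: row[c] for c in self.cols}

def run_iterator(rows: List[Row]) -> List[Row]:
    # Three nested next() calls per produced tuple.
    plan = Project(Select(Scan(rows), lambda r: r["price"] > 10), ["id"])
    out = []
    row = plan.next()
    while row is not None:
        out.append(row)
        row = plan.next()
    return out

def run_fused(rows: List[Row]) -> List[Row]:
    # Data-centric style: scan, filter and projection fused into one loop,
    # so a tuple is handled completely before the next one is touched.
    out = []
    for r in rows:
        if r["price"] > 10:
            out.append({"id": r["id"]})
    return out

# Hypothetical sample data.
ROWS = [{"id": 1, "price": 5}, {"id": 2, "price": 20}, {"id": 3, "price": 15}]
```

Both functions return the same rows for `ROWS`; the fused version simply avoids the per-tuple calls across operator boundaries that the abstract describes as overhead.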

Bio: Thomas Neumann studied Computer Science from 1997 until 2001 at the Univ. Mannheim, Germany. He received his doctoral degree in 2005 from the same university with a thesis on query optimization. Thereafter, he worked as a senior researcher at the Max Planck Institute for Informatics (MPI) in Saarbrücken. In 2009 he was a visiting researcher in the SQL Server group of Microsoft, before being appointed as a Professor at the Techn. Univ. München (TUM) in 2010.

Best regards,
Alexander Borusan

Prof. Pietro Michiardi (EURECOM)

Title: "Big Data Analytics in Practice"

Abstract:

In this talk, we will first give an overview of the MapReduce system and programming model (and its open source incarnation, Hadoop). We will then discuss the EC-funded BigFoot project, with the goal of illustrating its main objectives and stimulating discussions on possible collaboration between our groups. Finally, I will spend some time discussing the current research activity in my group, which covers systems and tools for large-scale network data processing, and size-based scheduling protocols for parallel systems à la MapReduce.

Bio:

Pietro Michiardi received his M.S. in Computer Science from EURECOM and his M.S. in Electrical Engineering from Politecnico di Torino. Pietro received his Ph.D. in Computer Science from Telecom ParisTech (formerly ENST, Paris). Today, Pietro is an Assistant Professor of Computer Science at EURECOM, where he leads the Distributed System Group, which blends theory and system research focusing on large-scale distributed systems (including data processing and data storage) and scalable algorithm design to mine massive amounts of data. Additional research interests include system, algorithmic, and performance evaluation aspects of computer networks and distributed systems.

Jannik Strötgen, Uni Heidelberg

 Title: "Event-centric Information Extraction and Retrieval to Explore Document Collections"

 

Abstract:
------------
In this talk, we present our work on event-centric information extraction and retrieval, with an event being simply defined as a combination of spatial and temporal information. For this, we first introduce our multilingual, cross-domain temporal tagger HeidelTime and describe some challenges that occur when extracting and normalizing temporal expressions from text documents of different domains. Then, we present our system for event-centric search and exploration in document collections, which allows users, for example, to specify spatial and temporal query constraints and to retrieve as a search result sequences of relevant events extracted from different documents instead of a hit list of documents containing such events. Finally, we present our model for calculating event-centric document similarities. In contrast to standard term-based similarity models, our approach is directly based on the semantics of the events and as such is term- and language-independent.

Bio:
-----
Jannik Strötgen studied Computational Linguistics and Economics at the Ruprecht-Karls-University Heidelberg and received his Magister Artium in June 2009. He wrote his Magister thesis on "Building-up and Evaluation of a UIMA-based Text Mining Pipeline for Biomedical Literature" at the Fraunhofer Institute for Algorithms and Scientific Computing, where he was a student worker between 2004 and 2009.
Since 2009, he has been a PhD student and part of the Database Systems Research Group led by Prof. Dr. Michael Gertz at the Institute of Computer Science in Heidelberg. While he still works on Natural Language Processing and Information Extraction, his research focus has switched from the biomedical domain to spatio-temporal information extraction and retrieval.

Dionysios Logothetis, Telefonica Research Barcelona

Title: "Architectures for large-scale continuous data management"

Abstract:
------------

The ability to do rich analytics on massive sets of unstructured data drives the operation of many organizations today. These "big data" analytics have given rise to a new class of data-intensive computing systems, like MapReduce, that can scale to very large data simply by employing more compute power. While these systems have been very successful, it is becoming apparent that scalability alone is not enough.

Many analytics today are update-driven, and this brute-force approach is inefficient when trying to keep analytics up-to-date as data change continuously.

In the first part of the talk, I will present a new approach for programming analytics that takes the continuous nature of data into consideration. A fundamental requirement for efficient processing of continuous data is the ability to incrementally update the analytics by maintaining computation state. I will argue that state should be a first-class abstraction and present Continuous Bulk Processing (CBP), a model and architecture that integrates data-parallelism for scalability with state for efficient update-driven analytics. The model lends itself to several analytics, like incremental algorithms and iterative analysis.

Through real-world applications, I will show how the integration of state in the programming model affords several optimizations in the underlying system, reducing processing time and resource usage relative to current practice.

While integrating state in the programming model allows efficient incremental programs, it may be challenging to design incremental algorithms for complex analytics, like iterative graph mining and machine learning. In the second part, I will talk about ongoing work on a system that can incrementally compute this class of analytics in a manner that is transparent to the user.
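
The idea of keeping computation state so that analytics are updated incrementally can be sketched as follows. This is only an illustration of the general principle, not the CBP model or its API; the class and function names are hypothetical, and a simple per-key count stands in for the analytic.

```python
from collections import Counter
from typing import List

def recompute(all_batches: List[List[str]]) -> Counter:
    """Brute-force approach: rescan every batch seen so far."""
    counts: Counter = Counter()
    for batch in all_batches:
        counts.update(batch)
    return counts

class IncrementalCount:
    """Hypothetical stateful operator: keeps per-key counts between runs,
    so each new batch costs time proportional to the batch, not the history."""
    def __init__(self) -> None:
        self.state: Counter = Counter()

    def update(self, batch: List[str]) -> Counter:
        # Fold only the newly arrived records into the maintained state.
        self.state.update(batch)
        return self.state
```

After every batch, `IncrementalCount.update` yields the same counts as calling `recompute` on the full history, while touching only the new records.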

 

Bio:
-----

 

I am an Associate Researcher with the Telefonica Research lab in Barcelona, Spain. I am primarily interested in building systems for large-scale data mining. My broader research interests lie in the areas of data management, cloud computing and distributed systems. I received my PhD in Computer Science from the University of California, San Diego and my Diploma in Computer Science & Engineering from the National Technical University of Athens, Greece.

Dr. Sebastian Maneth, University of Leipzig

Title: "An Overview of SXSI: Fast XPath Search over Compressed Indexes"

Abstract:
------------

We consider "XPath Search" queries. Such queries combine text search
with forward Core XPath and are very useful for querying documents such as
Medline, dblp, or biological data. The SXSI system consists of a query
engine based on tree automata, and of state-of-the-art text and tree
indexes. It runs in-memory, and is the fastest existing system for the
supported queries. The query engine uses novel "whole query optimizations"
to cleverly choose between the indexes. SXSI is a modular system whose
indexes can be replaced seamlessly. This is demonstrated through
experiments with alternative text indexes, such as a word-based index
for natural language search and a specialized index for bio sequence search.

Bio:


Dr. Maneth received his PhD from the University of Leiden in 2003.
He then spent two years at EPFL in Martin Odersky's group on
programming languages. Since 2006 he has been a senior researcher
at NICTA, Australia's national research center for computer science,
and a conjoint associate professor at the University of New South
Wales in Sydney. He is currently a Mercator guest professor at the
University of Leipzig.

Muhammad Asif Naeem, School of Computing and Mathematical Sciences, Auckland University of Technology, Auckland

 

Title

"Stream-based Joins with Limited Resource Consumption"

Abstract

Many stream-based applications have plenty of resources available to them, but there are also applications where resource consumption must be limited. For one important class of stream-based joins, where a stream is joined with a non-stream master data set, the algorithm called MESHJOIN was proposed. MESHJOIN uses limited memory and is a candidate for a resource-aware system setup. The problem considered in our research is that MESHJOIN is not very selective. In particular, MESHJOIN does not take typical characteristics of stream data, such as skew, into account, and therefore the performance of the algorithm is always inversely proportional to the size of the master data table. Consequently, the resource consumption is suboptimal in some scenarios. As a solution, we propose a set of algorithms, each of which performs better than MESHJOIN in defined settings. For these algorithms, cost models have been developed for tuning the algorithms and validating our implementation. In order to quantify the performance differences, we compare our algorithms with MESHJOIN using a synthetic data set with a known skewed distribution.

Bio

Muhammad Asif Naeem is presently a Lecturer in the School of Computing and Mathematical Sciences, Auckland University of Technology, Auckland. Asif completed his PhD in Computer Science at The University of Auckland, New Zealand. The title of his PhD research was “Efficient Joins to Process Stream Data”. In his PhD research he designed a number of novel join algorithms to process various kinds of stream data efficiently. These algorithms have been published in the literature. Before that, he completed his Master’s degree with distinction in the area of Web Mining.

 

Prof. Jeffrey Naughton, University of Wisconsin-Madison

Title

Two-Phase Entity Resolution

Abstract

Entity resolution refers to a process that decides which pairs of records in a database refer to the same real-world entities. This problem has received a great deal of attention, and a number of powerful techniques have been proposed. In our work we consider a simple but commonly used approach: developers are presented with pairs of records, and are asked to provide rules that determine whether the records refer to the same or different entities. Our contribution is that we consider a very rigid and apparently inflexible way of defining and applying the rules: in phase one, we only consider rules that indicate when records refer to different entities; in phase two, we only consider rules that indicate when records refer to the same entity. We show that this approach has a number of advantages, specifically that the rule applications within a phase are associative and commutative, and powerful automatic techniques are available to suggest pairs of records for developers to inspect. Perhaps surprisingly, despite the inflexible process for rule definition and application, the results on some benchmark datasets are encouraging. We will conclude with a discussion of future challenges and what might need to be done to evaluate how well this approach works in practice. This is joint work with Sun Chong and AnHai Doan.
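
A minimal sketch of the two-phase scheme follows. It rests on one reading of the abstract, namely that pairs ruled "different" in phase one are not reconsidered in phase two; the rules, field names and records are hypothetical and do not come from the talk.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]
Rule = Callable[[Record, Record], bool]

def resolve(records: List[Record],
            different_rules: List[Rule],
            same_rules: List[Rule]) -> Set[Tuple[int, int]]:
    """Return index pairs judged to refer to the same entity."""
    candidates = combinations(range(len(records)), 2)
    # Phase one: drop every pair that any "different" rule fires on.
    # any() makes the outcome independent of the order of the rules.
    survivors = [
        (i, j) for i, j in candidates
        if not any(rule(records[i], records[j]) for rule in different_rules)
    ]
    # Phase two: among the survivors, keep pairs that any "same" rule accepts.
    return {
        (i, j) for i, j in survivors
        if any(rule(records[i], records[j]) for rule in same_rules)
    }

# Hypothetical rules and data for illustration only.
def different_birth_year(a: Record, b: Record) -> bool:
    return a["year"] != b["year"]

def same_name(a: Record, b: Record) -> bool:
    return a["name"].lower() == b["name"].lower()

def same_email(a: Record, b: Record) -> bool:
    return a["email"] == b["email"]

DIFFERENT_RULES: List[Rule] = [different_birth_year]
SAME_RULES: List[Rule] = [same_name, same_email]

RECORDS: List[Record] = [
    {"name": "Ann Lee", "year": "1980", "email": "ann@x.org"},
    {"name": "ann lee", "year": "1980", "email": "a.lee@y.org"},
    {"name": "Ann Lee", "year": "1975", "email": "ann@x.org"},
    {"name": "Bob Ray", "year": "1980", "email": "bob@z.org"},
]
```

With these records, only the first two are matched: the third shares a name and email with the first, but the phase-one rule on birth year has already separated them. Reordering the rules within either phase does not change the result.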

Bio

 

Jeff Naughton is a Professor of Computer Sciences at the University of Wisconsin, Madison. Professor Naughton received a B.S. degree from the University of Wisconsin-Madison in 1982 and a Ph.D. degree from Stanford University in 1987. Professor Naughton was awarded a Presidential Young Investigator award in 1991 and is a Fellow of the ACM.

