TU Berlin

Talks DIMA Research Seminar

Talks SS13
Talk/Location
Lecturer/Subject
02.04.2013, 1:00 p.m., DIMA, EN 719
Sebastian Breß, Otto-von-Guericke-Universität Magdeburg

13.06.2013, 2:00 p.m., DIMA, EN 719
Prof. Periklis Andritsos, University of Toronto
"Finding and extracting structure in large datasets"

13.06.2013, 4:00 p.m., DIMA, EN 719
Frank McSherry, Microsoft
"Naiad: a system for iterative, incremental, and interactive distributed dataflow"

14.06.2013, 10:00 p.m., DIMA, EN 719
Frank McSherry, Microsoft
"Differential Dataflow"

14.06.2013, 12:00, DIMA, EN 719
Asterios Katsifodimos, INRIA
"Scalable View-based Techniques for Web Data: Algorithms and Systems"

25.07.2013, 10:15 a.m., DIMA, EN 719
Jimmy Lin, Twitter and the University of Maryland
"Real-Time Search at Twitter"

Sebastian Breß, Otto-von-Guericke-Universität Magdeburg

Title:

Automatic Selection of Processing Units for Coprocessing in Databases

Abstract:

Specialized processing units such as GPUs or FPGAs provide great opportunities to speed up database operations by exploiting parallelism and relieving the CPU. But utilizing coprocessors efficiently poses major challenges to developers. Besides finding fine-grained data-parallel algorithms and tuning them for the available hardware, it has to be decided at runtime which (co)processor should execute a specific task. Depending on the input parameters, wrong decisions can lead to severe performance degradation, since involving coprocessors introduces significant overhead, e.g., for data transfers. We present a framework that automatically learns and adapts execution models for arbitrary algorithms on any (co)processor to find break-even points and support scheduling decisions. We demonstrate its applicability for three common use cases in modern database systems and show how their performance can be improved with wise scheduling decisions. Furthermore, we discuss preliminary results of our research.

Speaker Biography: 

Sebastian Breß studied computer science at Otto-von-Guericke-Universität Magdeburg, completing his bachelor's degree in 2010 and his master's degree in 2012. Since April 2012 he has been a doctoral candidate in Magdeburg at the Chair of Databases and Information Systems, working on the topic "Heterogeneous Scheduling of Database Queries for hybrid CPU/GPU Platforms". His work focuses in particular on the effective use of available computing resources (CPUs or GPUs) during query processing.

 

Everybody is cordially welcome!

Prof. Periklis Andritsos, University of Toronto

Title:

Finding and extracting structure in large datasets

Abstract:

Data design has been characterized as a process of arriving at a design that maximizes the information content of each piece of data (or equivalently, one that minimizes redundancy). Information content (or redundancy) is measured with respect to a prescribed model for the data, a model that is often expressed as a set of constraints. In this talk, I consider the problem of doing data redesign in an environment where the prescribed model is unknown or incomplete or is the result of integrated information. Specifically, I consider the problem of finding structural clues in a relational instance of data, missing values, and duplicate records. We propose a set of clustering-based information-theoretic tools for finding structural summaries that are useful in characterizing the information content of the data, and ultimately useful in the design of new relational storage spaces. We study the use of summaries in one specific physical design task. I also show how these information-theoretic tools can assist in information extraction tasks and the building of attribute dictionaries in unstructured repositories of product data.

Speaker Biography: 

Periklis Andritsos is an Assistant Professor at the University of Toronto, Faculty of Information (iSchool). He received his B.Sc. degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece. He then moved to Toronto for his graduate studies and holds an M.Sc. and Ph.D. degree in Computer Science from the University of Toronto. He has also been an Assistant Professor at the University of Trento and the Free University of Bozen-Bolzano, both in Italy.

His research focuses on the analysis of large repositories and, more specifically, the structure discovery in order to facilitate design and speed up querying. He has developed a clustering algorithm for categorical data, which has also formed the basis of his novel work on discovering alternative schemas in databases with inconsistencies and errors. His techniques have also been used and patented in the industry. He is a senior member of the IEEE Computer Society and the Association for Computing Machinery.

He is currently visiting the Database Systems and Information Management Group at the Technical University of Berlin.

 

Everybody is cordially welcome! 

 

Please forward this invitation to interested colleagues.

Frank McSherry, Microsoft

Title:

Differential Dataflow

Abstract:

This talk will cover a new computational framework supported by Naiad, differential dataflow, which generalizes standard incremental dataflow for far greater re-use of previous results when collections change. Informally, differential dataflow distinguishes between the multiple reasons a collection might change, including both loop feedback and new input data, allowing a system to re-use the most appropriate results from previously performed work when an incremental update arrives. Our implementation of differential dataflow efficiently executes queries with multiple (possibly nested) loops, while simultaneously responding with low latency to incremental changes to the inputs. We show how differential dataflow enables orders-of-magnitude speedups for a variety of workloads on real data, and enables new analyses previously not possible in an interactive setting.

This is joint work with Derek G. Murray, Rebecca Isaacs, and Michael Isard.

Speaker Biography:

http://research.microsoft.com/en-us/people/mcsherry/


Everybody is cordially welcome!

Please forward this invitation to interested colleagues.

 

Frank McSherry, Microsoft

Title:

Naiad: a system for iterative, incremental, and interactive distributed dataflow

Abstract:

In this talk I’ll describe the Naiad system, based on a new model for low-latency incremental and iterative dataflow. Naiad is designed to provide three properties that we do not think yet exist together in a single system: the expressive power of loops, concurrent vertex execution, and fine-grained edge completion. Removing any one of these requirements yields an existing class of solutions (respectively: streaming systems like StreamInsight, iterative incremental systems like Nephele, and callback systems like Percolator), but all three together appear to require a new system design. We will describe Naiad’s structured cyclic dataflow model and its protocol for tracking and coordinating outstanding work, which more closely resembles memory fences than traditional distributed systems barriers. We give several examples of how Naiad can be used to efficiently implement many of the currently popular “big data” programming patterns, as well as several new ones, and present experimental results indicating that Naiad’s relative performance ranges from “as good as” to “much better than” existing systems.

 

This is joint work with Derek G. Murray, Rebecca Isaacs, Michael Isard, Paul Barham, and Martin Abadi. 

Speaker Biography: 

http://research.microsoft.com/en-us/people/mcsherry/

 

Everybody is cordially welcome! 

 

Jimmy Lin, Twitter and the University of Maryland

Title:

Real-Time Search at Twitter

Abstract:

Twitter aims to be an information platform that connects users to what they care about, 140 characters at a time. Whether it's breaking news events around the world, the latest celebrity gossip, or the recent adventures of your closest friends, the search and discovery services aim to surface relevant and personalized content in real time.

Focusing in particular on architectures for search, in this talk I'll present Earlybird, the core retrieval engine that powers Twitter's real-time search service. Although Earlybird builds and maintains inverted indexes like nearly all modern retrieval engines, its index structures differ from those built to support traditional web search. We describe these differences and present the rationale behind our design. A key requirement of real-time search is the ability to ingest content rapidly and make it searchable immediately, while concurrently supporting low-latency, high-throughput query evaluation. We believe that our solution represents an interesting point in the design space and is well suited to Twitter's needs.

I'll conclude with a discussion of some future challenges that span natural language processing, information retrieval, text mining, and data management.

Speaker Biography:

Jimmy Lin is an associate professor in the iSchool at the University of Maryland, affiliated with the Department of Computer Science and the Institute for Advanced Computer Studies. He graduated with a Ph.D. in computer science from MIT in 2004. Lin's research lies at the intersection of information retrieval and natural language processing, and he has done work in a variety of areas, including question answering, medical informatics, and bioinformatics. Lin's current research focuses on massively distributed data analytics in cluster-based environments.

Lin recently completed an extended sabbatical at Twitter, where from 2010 to 2012 he worked on services designed to surface relevant content for users and on the distributed infrastructure that supports mining relevance signals from massive amounts of data.
