DIMA Research Seminar: Talks
| Talk/Location | Lecturer/Subject |
|---|---|
| 15.10.2018, 4 pm, EN 719 | Prof. Dr. Birgit Beck "Some Philosophical Considerations Regarding “AI”" |
| 27.11.2018, 4 pm, Smart Data Forum (Salzufer 6, Eingang Otto-Dibelius-Strasse, 10587 Berlin) | Prof. Renée J. Miller, Northeastern University "Open Data Integration" |
| 17.12.2018, 4 pm, EN 719 | Dr. Alberto Lerner, eXascale Infolab at the University of Fribourg, Switzerland "The Case for Network-Accelerated Query Processing" |
| 29.01.2019, 2 pm, EN 719 | Dr. Jan Sürmeli, TU Berlin "Decentralizing Identity" |
| 04.02.2019, 4 pm, EN 719 | Prof. Uwe Röhm, University of Sydney "Serialisable Snapshot Isolation on Multicore Servers" |
| 11.02.2019, 2.30 pm, EN 719 | Eleni Tzirita Zacharatou, École polytechnique fédérale de Lausanne (EPFL) "Interactive and Exploratory Spatio-Temporal Data Analytics" |
| 26.02.2019, 2.30 pm, DFKI, Room Weizenbaum, Alt-Moabit 91 c, Berlin | Ankit Chaudhary "Migrating Towards Stream Processing and Micro-Services" |
Dr. Alberto Lerner, eXascale Infolab at the University of Fribourg, Switzerland
Location:
TU Berlin, EN
building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587
Berlin
Title:
The Case for
Network-Accelerated Query Processing
Abstract:
The fastest plans in MPP databases
are usually those with the least amount of data movement across nodes.
That's because data does not get processed while in transit. The
network switches that connect MPP nodes are hard-wired to strictly
perform packet-forwarding logic. In a recent paradigm shift, however,
network devices are becoming “programmable.” The quotes here are
cautionary: switches have not suddenly become general-purpose
computers. But the set of tasks they can perform can now be encoded in
software, which means such a switch can be instructed to manipulate
the data it is forwarding.
In this talk we explore this
programmability to accelerate OLAP queries. We found that we can
offload onto the switch some very common and expensive query patterns.
Moving data through networking equipment can hence, for the first time,
contribute to query execution. Our preliminary results show that we
can improve response times on even the best agreed-upon plans by more
than 2x using 25 Gbps networks. We also see the promise of linear
performance improvement with faster speeds. The use of programmable
switches can open new possibilities for architecting rack- and
datacenter-sized database systems, with implications across the
stack.
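To give a rough intuition for why offloading query patterns onto the network helps, the toy Python sketch below contrasts forwarding every tuple to a coordinator with letting the "switch" pre-aggregate tuples in transit. This is my own simplified model, not the speaker's implementation; the aggregation pattern (a grouped sum) and the node counts are assumptions made purely for illustration.

```python
# Toy model of in-network partial aggregation (illustrative only).
# Assumption: the query is SUM(amount) GROUP BY key, a pattern a
# programmable switch could in principle maintain in per-key registers.
from collections import defaultdict

def worker_tuples(node_id, n=1000):
    """Each MPP node emits (key, amount) tuples destined for a coordinator."""
    return [(k % 10, 1) for k in range(node_id, node_id + n)]

def baseline(nodes):
    """Conventional plan: every tuple crosses the network, coordinator aggregates."""
    forwarded = [t for node in nodes for t in worker_tuples(node)]
    agg = defaultdict(int)
    for key, amount in forwarded:
        agg[key] += amount
    return dict(agg), len(forwarded)        # tuples that traversed the network

def switch_offload(nodes):
    """'Programmable switch' keeps per-key partial sums and forwards only those."""
    switch_regs = defaultdict(int)
    for node in nodes:
        for key, amount in worker_tuples(node):
            switch_regs[key] += amount      # aggregation happens while data is in transit
    return dict(switch_regs), len(switch_regs)  # only one partial per key is forwarded

if __name__ == "__main__":
    full, moved_full = baseline(range(4))
    offloaded, moved_offl = switch_offload(range(4))
    assert full == offloaded                 # same answer...
    print(moved_full, "vs", moved_offl)      # ...with far less data movement
```

The real system would express the per-key registers in the switch's data-plane language rather than Python; the sketch only illustrates the data-movement argument the abstract makes.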
Bio:
Alberto Lerner
is a Senior Researcher at the eXascale Infolab at the University of
Fribourg, Switzerland. His interests revolve around systems that
explore closely coupling of hardware and software in order to realize
untapped performance and/or functionality. Previously, he spent years
in industry consulting for large, data-hungry verticals such as
finance and advertising. He has also been part of the teams behind a
few different database engines: IBM's DB2, working on robustness
aspects of the query optimizer; Google's Bigtable, on elasticity
aspects; and MongoDB, on general architecture. Alberto received his
Ph.D. from ENST - Paris (now ParisTech), having done his thesis
research work at INRIA/Rocquencourt and NYU. He has also done
post-doctoral work at IBM Research (both at T.J. Watson and
Almaden).
Prof. Renée J. Miller, Northeastern University
Location:
Smart Data Forum (Salzufer 6, Eingang Otto-Dibelius-Strasse, 10587
Berlin)
Title:
Open Data
Integration
Abstract:
Open Data
plays a major role in open government initiatives. Governments around
the world are adopting Open Data Principles promising to make their
Open Data complete, primary, and timely. These properties make this
data tremendously valuable to data scientists. However, scientists
generally do not have a priori knowledge about what data is available
(its schema or content), yet they want to be able to use Open Data and
integrate it with other public or private data they are studying.
Traditionally, data integration is done using a framework called
“query discovery” where the main task is to discover a query (or
transformation script) that transforms data from one form into
another. The goal is to find the right operators to join, nest, group,
link, and twist data into a desired form. In this talk, I introduce a
new paradigm for thinking about Open Data Integration where the focus
is on “data discovery”, but highly efficient internet-scale
discovery that is heavily query-aware. As an example, a join-aware
discovery algorithm finds datasets, within a massive data lake, that
join (in a precise sense of having high containment) with a known
dataset. I describe a research agenda and recent progress in
developing scalable query-aware data discovery algorithms.
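As a minimal illustration of the join-aware discovery idea described above, the sketch below scores candidate datasets in a toy "data lake" by set containment of a query column in their columns. It is my own small example, not the speaker's algorithm; the dataset names, columns, and the containment threshold are assumptions.

```python
# Minimal sketch of containment-based join discovery (illustrative only).
# A candidate column is considered joinable with the query column if a
# large fraction of the query column's values is contained in it.

def containment(query_values, candidate_values):
    """|Q ∩ C| / |Q|: the fraction of query values found in the candidate."""
    q, c = set(query_values), set(candidate_values)
    return len(q & c) / len(q) if q else 0.0

def discover_joinable(query_column, data_lake, threshold=0.8):
    """Rank datasets in the lake whose column has high containment of the query column."""
    scored = []
    for name, column in data_lake.items():
        score = containment(query_column, column)
        if score >= threshold:
            scored.append((score, name))
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    # Hypothetical query table column and a tiny "lake" of open datasets.
    query_cities = ["Berlin", "Hamburg", "Munich", "Cologne"]
    lake = {
        "de_population.csv": ["Berlin", "Hamburg", "Munich", "Cologne", "Frankfurt"],
        "eu_airports.csv":   ["Berlin", "Paris", "Madrid"],
        "bike_counters.csv": ["Berlin", "Munich"],
    }
    print(discover_joinable(query_cities, lake))  # de_population.csv scores 1.0
```

An internet-scale system would of course rely on approximate sketches and indexes rather than this exhaustive scan; the point here is only the containment criterion the abstract refers to.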
Bio:
Renée J.
Miller is a University Distinguished Professor of Computer Science at
Northeastern University. She is a Fellow of the Royal Society of
Canada, Canada’s National Academy of Science, Engineering and the
Humanities. She received the US Presidential Early Career Award for
Scientists and Engineers (PECASE), the highest honor bestowed by the
United States government on outstanding scientists and engineers
beginning their careers. She received an NSF CAREER Award, the Ontario
Premier’s Research Excellence Award, and an IBM Faculty Award. She
formerly held the Bell Canada Chair of Information Systems at the
University of Toronto and is a fellow of the ACM. Her work has focused
on the long-standing open problem of data integration and has achieved
the goal of building practical data integration systems. She and her
co-authors (Fagin, Kolaitis and Popa) received the (10 Year) ICDT
Test-of-Time Award for their influential 2003 paper establishing the
foundations of data exchange. Professor Miller has led the NSERC
Business Intelligence Strategic Network and was elected president of
the non-profit Very Large Data Base Foundation. She received her PhD
in Computer Science from the University of Wisconsin-Madison and
bachelor of science degrees in Mathematics and Cognitive Science
from MIT.
Prof. Dr. Birgit Beck, TU Berlin, FG Ethik und Technikphilosophie
Title:
Some Philosophical Considerations Regarding “AI”
Abstract:
In today’s society, the notion of “artificial intelligence”
is ubiquitous. Recently, voices from science, politics and business
have been calling for “ethical guidelines” regarding AI.
Although ethical guidelines are certainly a good thing to have,
it appears necessary, first and foremost, to determine what exactly
the object of such guidelines would be.
The present talk addresses this question by scrutinising the meaning
of “artificial intelligence” and argues, on the basis of some
exemplary instances of “AI”, that the notion of “artificial
intelligence” simpliciter is a vague and, therefore, misleading term.
Dr. Jan Sürmeli, TU Berlin
Title:
Decentralizing
Identity
Abstract:
The digital
transformation promises the interaction between an increasing number
of entities such as persons, devices, vehicles and sensors. Whereas 5G
tackles the problem of connectivity, secure digital identities are the
key ingredient for secure and trustworthy interaction between partners,
and thus form an important cornerstone of the “Internet of Everyone
and Everything”. Current identity solutions rely on central
providers to manage and certify digital identities as trusted
intermediaries in transactions.
In this talk, we will discuss
the notion of Self-Sovereign Identity – a concept giving entities
full control and responsibility over their own digital identities,
while maintaining trust, privacy and data minimisation. While trusted third
parties are still required, they are decoupled from the actual
transactions between partners, thus decentralizing identity
management.
Bio:
Jan
Sürmeli is a postdoctoral researcher at Technische Universität
Berlin, and guest researcher at the FZI Forschungszentrum Informatik.
Since 2017, he has been working on Identity Management, Privacy-enhancing
Technologies and the application of Distributed Ledger Technologies
together with Prof. Stefan Jähnichen.
He received his doctoral
degree in Computer Science from Humboldt-Universität zu Berlin, where
his research focused on modeling and analysis of distributed systems,
business processes and event-based systems.
Prof. Uwe Röhm, University of Sydney
Title:
Serialisable
Snapshot Isolation on Multicore Servers
Abstract:
Database systems need to provide
efficient read/write access to large, shared data sets. Modern
database workloads often contain analytical queries, which makes
snapshot databases based on a multi-version storage layer an
attractive system design. It is well known that snapshot-based
algorithms scale better for read-only transactions than locking-based
systems. However, a major pitfall is that the standard snapshot
isolation (SI) algorithm allows non-serialisable executions.
This talk revisits the development of snapshot-based
concurrency control algorithms and discusses an efficient approach to
provide serialisable snapshot isolation inside a database system -
with almost the same performance as standard SI. We further take a
look at the scalability of SI-based database engines on multicore servers.
Our work shows that many implementations of SI do not scale well as the
number of CPU cores increases, and the talk discusses approaches to
avoid this scalability bottleneck with database systems on modern
multicore servers.
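To make concrete the kind of non-serialisable execution that plain SI permits, here is a small, self-contained Python simulation of the classic write-skew anomaly. It is my own toy example, not material from the talk: two transactions each read a consistent snapshot, conclude that a constraint still holds, and write disjoint rows, leaving the database in a state no serial order could produce.

```python
# Toy multiversion store demonstrating write skew under plain snapshot isolation.
# Constraint we would like to preserve: at least one doctor stays on call.

db = {"alice_on_call": True, "bob_on_call": True}

def run_under_si(store):
    # Both transactions take their snapshot of the same initial state.
    snap_t1 = dict(store)
    snap_t2 = dict(store)

    # T1: "if someone else is on call, I can go off call."
    if snap_t1["bob_on_call"]:
        store["alice_on_call"] = False   # writes a key that T2 never writes

    # T2: same reasoning, based on its (now stale) snapshot.
    if snap_t2["alice_on_call"]:
        store["bob_on_call"] = False     # disjoint write sets, so SI's
                                         # first-committer-wins check does not fire

    return store

result = run_under_si(dict(db))
# Under any serial order at least one doctor would remain on call;
# under SI both commits succeed and the constraint is violated.
print(result)            # {'alice_on_call': False, 'bob_on_call': False}
assert not any(result.values())
```

Serialisable snapshot isolation, the subject of the talk, avoids such outcomes by detecting dangerous patterns of read-write dependencies at run time and aborting one of the involved transactions.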
Bio:
Uwe
Röhm is Associate Professor in database systems at the University of
Sydney. He is a computer science graduate from the University of
Passau, Germany, and completed his PhD at ETH Zurich in the area of
scheduling combined OLTP/OLAP workloads in a cluster of databases.
Much of his research has dealt with transaction management and
replication, especially how to ensure sufficient freshness in values
read. His work with the database research group at the University of
Sydney on snapshot databases has resulted in several awards, including
recently the ACM SIGMOD 2018 Test of Time Award for their work on
serialisable snapshot isolation and a corresponding implementation
that is nowadays integrated in the PostgreSQL database system.
His current research interests are cloud data management, database
engines on modern hardware, and in-database support of complex
data-intensive computations, e.g. for Data Science. Uwe Röhm has held
several visiting academic positions in recent years at Microsoft,
KIT, and TU Munich. He is currently a guest professor in the data
management group at TU Darmstadt, Germany.
Eleni Tzirita Zacharatou, École polytechnique fédérale de Lausanne (EPFL)
Title:
Interactive and Exploratory Spatio-Temporal Data Analytics
Abstract:
The recent explosion in
the number and size of spatio-temporal data sets from various sources,
such as scientific simulations, urban environments and social sensors,
creates new opportunities for data-driven discoveries and at the same
time new challenges for analyzing these data. The complexity and cost
of evaluating queries over space and time for large volumes of data
often limit analyses to well-defined questions. To support interactive
exploratory analyses, data management solutions such as query
processing algorithms and indexing methods need to provide fast
response times.
In this talk, I will first present
an approach that evaluates spatial aggregation queries on-the-fly at
interactive speeds on commodity hardware, achieved by converting
queries into sets of drawing operations on a canvas and leveraging the
rendering pipeline of the graphics hardware (GPU). I will then
describe a compressed time series index that accelerates the discovery
of interesting events in time series data by encoding time series
values as bitmaps and applying Quadtree-based decomposition. Finally,
I will give an overview of techniques that we have developed to
summarize spatial data more accurately and to query multiple spatial
data sets efficiently.
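As a rough intuition for the canvas/rendering idea in the first part of the abstract, the sketch below answers a "count points in a region" aggregation by rasterising points onto a coarse grid and summing the covered cells. This is my own simplified CPU analogue, not the speaker's GPU implementation; the grid resolution and the data are assumptions.

```python
# CPU analogue of evaluating a spatial aggregation by "drawing" onto a canvas.
# Points are rasterised into a coarse grid of counts; an aggregation over a
# query rectangle then becomes a sum over the covered cells, the kind of
# per-pixel work a GPU rendering pipeline performs massively in parallel.

GRID = 100                      # canvas resolution (assumption)

def rasterise(points, extent):
    """Accumulate point counts into a GRID x GRID canvas over the given extent."""
    (xmin, ymin, xmax, ymax) = extent
    canvas = [[0] * GRID for _ in range(GRID)]
    for x, y in points:
        cx = min(int((x - xmin) / (xmax - xmin) * GRID), GRID - 1)
        cy = min(int((y - ymin) / (ymax - ymin) * GRID), GRID - 1)
        canvas[cy][cx] += 1
    return canvas

def count_in_rect(canvas, extent, rect):
    """Approximate COUNT(*) of points inside rect by summing the covered cells."""
    (xmin, ymin, xmax, ymax) = extent
    (rx0, ry0, rx1, ry1) = rect
    cx0 = int((rx0 - xmin) / (xmax - xmin) * GRID)
    cx1 = int((rx1 - xmin) / (xmax - xmin) * GRID)
    cy0 = int((ry0 - ymin) / (ymax - ymin) * GRID)
    cy1 = int((ry1 - ymin) / (ymax - ymin) * GRID)
    return sum(canvas[cy][cx]
               for cy in range(cy0, min(cy1, GRID - 1) + 1)
               for cx in range(cx0, min(cx1, GRID - 1) + 1))

if __name__ == "__main__":
    import random
    extent = (0.0, 0.0, 1.0, 1.0)
    pts = [(random.random(), random.random()) for _ in range(10_000)]
    canvas = rasterise(pts, extent)
    # Roughly a quarter of the points should fall into the lower-left quadrant.
    print(count_in_rect(canvas, extent, (0.0, 0.0, 0.5, 0.5)))
```

On a GPU, the point-drawing and per-pixel summation steps map onto the rendering pipeline, which is what the abstract credits for the interactive response times.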
Bio:
Eleni Tzirita Zacharatou is a last-year PhD student at the
Data-Intensive Applications and Systems Laboratory at the École
polytechnique fédérale de Lausanne (EPFL), working under the
supervision of Prof. Anastasia Ailamaki. Her research interests are
centered around the management of spatio-temporal data, with a focus
on query processing algorithms and indexing methods for exploratory
analysis tasks. In summer 2016, she was a visiting researcher at New
York University, working with Prof. Juliana Freire. She received the
Diploma (M.Eng.) degree in Electrical and Computer Engineering from the
National Technical University of Athens in 2013. Eleni is the
recipient of the ACM SIGMOD 2018 best demonstration award.
Ankit Chaudhary
Location:
DFKI, Room Weizenbaum, Alt-Moabit 91 c, Berlin
Title:
Migrating Towards Stream Processing and Micro-Services
Abstract:
Dilax
Intelcom GmbH is the market leader in providing people counting
solutions for both the retail and public mobility sectors. We not only
manufacture the necessary sensors for automatic people counting but also
develop software to analyze the generated sensor data. In this talk, I
will describe how we revamped our monolithic application and adopted a
microservice-based architecture to support multitenancy and high
service availability. I will also talk about how we introduced changes
in the software to enable real-time event processing.
Bio:
I have been working as a Software Engineer at Dilax Intelcom since 2016
and have a total of nine years of industry experience. At Dilax, apart
from developing interesting use cases related to passenger counting
data, I am also involved in the provisioning and maintenance of the
software. In my free time, I like to work with my friends on various
fun projects using GCP.