Talks: DIMA Research Seminar

|Prof. Dr. Birgit Beck, TU Berlin|
"Some Philosophical Considerations Regarding “AI”"
Smart Data Forum (Salzufer 6, Eingang Otto-Dibelius-Strasse, 10587 Berlin)

|Prof. Renée J. Miller, Northeastern University|
"Open Data Integration"

|Dr. Alberto Lerner, eXascale Infolab at the University of Fribourg, Switzerland|
"The Case for Network-Accelerated Query Processing"

|Dr. Jan Sürmeli, TU Berlin|
"Decentralizing Identity"

|Prof. Uwe Röhm, University of Sydney|
"Serialisable Snapshot Isolation on Multicore Servers"

|Eleni Tzirita Zacharatou, École polytechnique fédérale de Lausanne (EPFL)|
"Interactive and Exploratory Spatio-Temporal Data Analytics"
DFKI, Room Weizenbaum, Alt-Moabit 91 c, Berlin

"Migrating Towards Stream Processing and Micro-Services"
Dr. Alberto Lerner, eXascale Infolab at the University of Fribourg, Switzerland
TU Berlin, EN building, seminar room EN 719 (7th floor), Einsteinufer 17, 10587 Berlin
The Case for Network-Accelerated Query Processing
The fastest plans in MPP databases are usually those with the least amount of data movement across nodes. That's because data does not get processed while in transit. The network switches that connect MPP nodes are hard-wired to perform strictly packet-forwarding logic. In a recent paradigm shift, however, network devices are becoming "programmable." The quotes here are cautionary: switches have not suddenly become general-purpose computers. But the set of tasks they can perform can now be encoded in software, and that means such a switch can be instructed to manipulate the data it is forwarding.
In this talk we explore this programmability to accelerate OLAP queries. We found that we can offload some very common and expensive query patterns onto the switch. Moving data through networking equipment can hence, for the first time, contribute to query execution. Our preliminary results show that we can improve response times by more than 2x, even over the best agreed-upon plans, using 25 Gbps networks. We also see the promise of linear performance improvements at faster network speeds. The use of programmable switches can open new possibilities for architecting rack- and datacenter-sized database systems, with implications across the stack.
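The core idea of the abstract, aggregating data while it moves through the network instead of only at the endpoints, can be sketched in plain Python. This is an illustrative toy only, not the speaker's implementation (real programmable switches are typically programmed in a dataplane language such as P4, and the class and method names here are assumptions):

```python
from collections import defaultdict

class ProgrammableSwitch:
    """Toy model of a switch that aggregates tuples while forwarding them."""
    def __init__(self):
        self.partial_sums = defaultdict(int)  # per-key running aggregate

    def forward(self, packet):
        # Instead of blindly forwarding, fold each (key, value) tuple into
        # a per-key partial sum, i.e. an offloaded GROUP BY ... SUM(...).
        for key, value in packet:
            self.partial_sums[key] += value

    def flush(self):
        # Emit the pre-aggregated result to the destination node.
        result, self.partial_sums = dict(self.partial_sums), defaultdict(int)
        return result

switch = ProgrammableSwitch()
switch.forward([("a", 1), ("b", 2)])   # packets from different worker nodes
switch.forward([("a", 3)])
print(switch.flush())                  # {'a': 4, 'b': 2}
```

The receiving node then sees already-reduced data, which is how in-transit processing can shrink both network traffic and endpoint work.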
Alberto Lerner is a Senior Researcher at the eXascale Infolab at the University of Fribourg, Switzerland. His interests revolve around systems that explore close coupling of hardware and software in order to realize untapped performance and/or functionality. Previously, he spent years in industry consulting for large, data-hungry verticals such as finance and advertising. He has also been part of the teams behind a few different database engines: IBM's DB2, working on robustness aspects of the query optimizer; Google's Bigtable, on elasticity aspects; and MongoDB, on general architecture. Alberto received his Ph.D. from ENST - Paris (now ParisTech), having done his thesis research at INRIA/Rocquencourt and NYU. He has also done post-doctoral work at IBM Research (both at T.J. Watson and Almaden).
Prof. Renée J. Miller, Northeastern University
Smart Data Forum (Salzufer 6, Eingang Otto-Dibelius-Strasse, 10587 Berlin)
Open Data Integration
Open Data plays a major role in open government initiatives. Governments around the world are adopting Open Data Principles promising to make their Open Data complete, primary, and timely. These properties make this data tremendously valuable to data scientists. However, scientists generally do not have a priori knowledge about what data is available (its schema or content), but will want to be able to use Open Data and integrate it with other public or private data they are studying. Traditionally, data integration is done using a framework called "query discovery" where the main task is to discover a query (or transformation script) that transforms data from one form into another. The goal is to find the right operators to join, nest, group, link, and twist data into a desired form. In this talk, I introduce a new paradigm for thinking about Open Data Integration where the focus is on "data discovery", but highly efficient internet-scale discovery that is heavily query-aware. As an example, a join-aware discovery algorithm finds datasets, within a massive data lake, that join (in a precise sense of having high containment) with a known dataset. I describe a research agenda and recent progress in developing scalable query-aware data discovery algorithms.
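The "high containment" criterion the abstract mentions can be made concrete with a small sketch. This is only an illustration of the idea, not Prof. Miller's system (the function names, the toy data lake, and the 0.8 threshold are all assumptions; real systems use sketches and indexes rather than exact set intersection to scale):

```python
def containment(query_col, candidate_col):
    """Fraction of the query column's distinct values found in the candidate."""
    q, c = set(query_col), set(candidate_col)
    return len(q & c) / len(q) if q else 0.0

def discover_joinable(query_col, lake, threshold=0.8):
    """Return columns from a toy 'data lake' with containment >= threshold."""
    return [name for name, col in lake.items()
            if containment(query_col, col) >= threshold]

lake = {
    "cities.name":   ["Berlin", "Paris", "Rome", "Oslo"],
    "parks.city":    ["Berlin", "Paris"],
    "stocks.ticker": ["AAPL", "IBM"],
}
query = ["Berlin", "Paris", "Rome"]
print(discover_joinable(query, lake))  # ['cities.name']
```

A column qualifies as join-aware discovery output when most of the query column's values appear in it, which is exactly when an equi-join on that column would cover most of the query rows.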
Renée J. Miller is a University Distinguished Professor of Computer Science at Northeastern University. She is a Fellow of the Royal Society of Canada, Canada's National Academy of Science, Engineering and the Humanities. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Ontario Premier's Research Excellence Award, and an IBM Faculty Award. She formerly held the Bell Canada Chair of Information Systems at the University of Toronto and is a fellow of the ACM. Her work has focused on the long-standing open problem of data integration and has achieved the goal of building practical data integration systems. She and her co-authors (Fagin, Kolaitis and Popa) received the (10 Year) ICDT Test-of-Time Award for their influential 2003 paper establishing the foundations of data exchange. Professor Miller has led the NSERC Business Intelligence Strategic Network and was elected president of the non-profit Very Large Data Base Foundation. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor of science degrees in Mathematics and Cognitive Science from MIT.
Prof. Dr. Birgit Beck, TU Berlin, FG Ethik und Technikphilosophie
Some Philosophical Considerations Regarding “AI”
In today’s society, the notion of “artificial intelligence”
is ubiquitous. Recently, voices from
science, politics and business have been calling for "ethical guidelines" regarding AI. Although ethical guidelines
are certainly a good thing to have, it appears necessary, first and foremost, to determine what exactly
the object of such guidelines would be.
The present talk addresses this question by scrutinising the meaning of "artificial intelligence" and
argues, on the basis of some exemplary instances of "AI", that the notion of "artificial intelligence"
simpliciter is a vague and, therefore, misleading term.
Dr. Jan Sürmeli, TU Berlin
Decentralizing Identity
The digital transformation promises interaction among an increasing number of entities such as persons, devices, vehicles and sensors. Whereas 5G tackles the problem of connectivity, secure digital identities are the key ingredient for secure and trustworthy interaction between partners, and thus form an important cornerstone of the "Internet of Everyone and Everything". Current identity solutions rely on central providers to manage and certify digital identities as trusted intermediaries in transactions.
In this talk, we will discuss the notion of Self-Sovereign Identity – a concept giving entities full control and responsibility over their own digital identities, while maintaining trust, privacy and data economy. While trusted third parties are still required, they are decoupled from the actual transactions between partners, thus decentralizing identity management.
Jan Sürmeli is a postdoctoral researcher at Technische Universität Berlin and a guest researcher at the FZI Forschungszentrum Informatik. Since 2017, he has worked on identity management, privacy-enhancing technologies and the application of distributed ledger technologies together with Prof. Stefan Jähnichen.
He received his doctoral degree in Computer Science from Humboldt-Universität zu Berlin, where his research focused on modeling and analysis of distributed systems, business processes and event-based systems.
Prof. Uwe Röhm, University of Sydney
Serialisable Snapshot Isolation on Multicore Servers
Database systems need to provide efficient read/write access to large, shared data sets. Modern database workloads often contain analytical queries, which makes snapshot databases based on a multi-version storage layer an attractive system design. It is well known that snapshot-based algorithms scale better for read-only transactions than locking-based systems. However, a major pitfall is that the standard snapshot isolation (SI) algorithm allows non-serialisable executions.
This talk revisits the development of snapshot-based concurrency control algorithms and discusses an efficient approach to provide serialisable snapshot isolation inside a database system - with almost the same performance as standard SI. We further take a look at the scalability of SI-based database engines on multicore servers. Our work shows that many implementations of SI do not scale well as the number of CPU cores increases, and the talk discusses approaches to avoid this scalability bottleneck in database systems on modern multicore servers.
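To make the baseline concrete, standard snapshot isolation with the classic first-committer-wins rule can be sketched as a toy multi-version store. This is a minimal illustration, not the speaker's algorithm (the class design and timestamps are assumptions; note it shows plain SI, while serialisable SI additionally tracks read-write dependencies to rule out anomalies like write skew):

```python
class SnapshotDB:
    """Toy multi-version store with snapshot isolation (first-committer-wins)."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ts ascending
        self.clock = 0       # logical commit timestamp

    def begin(self):
        # A transaction reads from the snapshot as of its start timestamp.
        return {"start_ts": self.clock, "writes": {}}

    def read(self, txn, key):
        # Return the latest version committed at or before the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= txn["start_ts"]:
                return value
        return None

    def write(self, txn, key, value):
        txn["writes"][key] = value   # buffered until commit

    def commit(self, txn):
        # First-committer-wins: abort on a concurrent committed write.
        for key in txn["writes"]:
            versions = self.versions.get(key, [])
            if versions and versions[-1][0] > txn["start_ts"]:
                return False  # write-write conflict: abort
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True

db = SnapshotDB()
t1, t2 = db.begin(), db.begin()
db.write(t1, "x", 1)
db.write(t2, "x", 2)
print(db.commit(t1))  # True
print(db.commit(t2))  # False: t1 committed x first
```

Readers never block writers in this scheme, which is why SI suits mixed analytical workloads; the cost is the conflict checks and version maintenance whose multicore scalability the talk examines.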
Uwe Röhm is Associate Professor in database systems at the University of Sydney. He is a computer science graduate of the University of Passau, Germany, and completed his PhD at ETH Zurich in the area of scheduling combined OLTP/OLAP workloads in a cluster of databases. Much of his research has dealt with transaction management and replication, especially how to ensure sufficient freshness in the values read. His work with the database research group at the University of Sydney on snapshot databases has resulted in several awards, including recently the ACM SIGMOD 2018 Test of Time Award for their work on serialisable snapshot isolation and a corresponding implementation that is nowadays integrated in the PostgreSQL database system. His current research interests are cloud data management, database engines on modern hardware, and in-database support of complex data-intensive computations, e.g. for data science. Uwe Röhm has held several visiting academic positions in recent years at Microsoft, KIT, and TU Munich. He is currently a guest professor in the data management group at TU Darmstadt, Germany.
Eleni Tzirita Zacharatou, École polytechnique fédérale de Lausanne (EPFL)
Interactive and Exploratory Spatio-Temporal Data Analytics
The recent explosion in the number and size of spatio-temporal data sets from various sources, such as scientific simulations, urban environments and social sensors, creates new opportunities for data-driven discoveries and at the same time new challenges for analyzing these data. The complexity and cost of evaluating queries over space and time for large volumes of data often limit analyses to well-defined questions. To support interactive exploratory analyses, data management solutions such as query processing algorithms and indexing methods need to provide fast response times.
In this talk, I will first present an approach that evaluates spatial aggregation queries on-the-fly at interactive speeds on commodity hardware by converting queries into sets of drawing operations on a canvas and leveraging the rendering pipeline of the graphics hardware (GPU). I will then describe a compressed time series index that accelerates the discovery of interesting events in time series data by encoding time series values as bitmaps and applying Quadtree-based decomposition. Finally, I will give an overview of techniques that we have developed to summarize spatial data more accurately and to query multiple spatial data sets efficiently.
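The "queries as drawing operations" idea can be approximated with a CPU-side analogy. This is only a simplified sketch under stated assumptions (the real approach runs inside the GPU rendering pipeline; the grid size, function names, and exact cell mapping here are made up for illustration): points are rasterized onto a grid "canvas", and a region's count is read off the cells it covers.

```python
# Toy "canvas" spatial aggregation: rasterize points onto a grid, then
# answer COUNT-in-region queries by summing the covered cells.

GRID = 4  # 4x4 canvas over the unit square

def rasterize(points):
    canvas = [[0] * GRID for _ in range(GRID)]
    for x, y in points:
        # Map coordinates in [0, 1) to a grid cell and "draw" the point.
        canvas[int(y * GRID)][int(x * GRID)] += 1
    return canvas

def count_in_rect(canvas, x0, y0, x1, y1):
    """Approximate point count inside [x0,x1) x [y0,y1) from covered cells."""
    return sum(canvas[r][c]
               for r in range(int(y0 * GRID), int(y1 * GRID))
               for c in range(int(x0 * GRID), int(x1 * GRID)))

pts = [(0.1, 0.1), (0.2, 0.15), (0.8, 0.9)]
canvas = rasterize(pts)
print(count_in_rect(canvas, 0.0, 0.0, 0.5, 0.5))  # 2
```

On a GPU, the rasterize step is exactly what the rendering pipeline does in parallel for millions of primitives, which is where the interactive speeds come from.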
Eleni Tzirita Zacharatou is a last-year PhD student at the Data-Intensive Applications and Systems Laboratory at the École polytechnique fédérale de Lausanne (EPFL), working under the supervision of Prof. Anastasia Ailamaki. Her research interests are centered around the management of spatio-temporal data, with a focus on query processing algorithms and indexing methods for exploratory analysis tasks. In summer 2016, she was a visiting researcher at New York University, working with Prof. Juliana Freire. She received her Diploma (M.Eng.) degree in Electrical and Computer Engineering from the National Technical University of Athens in 2013. Eleni is the recipient of the ACM SIGMOD 2018 best demonstration award.
DFKI, Room Weizenbaum, Alt-Moabit 91 c, Berlin
Migrating Towards Stream Processing and Micro-Services
Dilax Intelcom GmbH is the market leader in providing people-counting solutions for both the retail and public mobility sectors. We not only manufacture the sensors necessary for automatic people counting but also develop software to analyze the generated sensor data. In this talk, I will describe how we revamped our monolithic application and adopted a microservice-based architecture to support multi-tenancy and high service availability. I will also talk about how we introduced changes in the software to enable real-time event processing.
I have been working as a Software Engineer with Dilax Intelcom since 2016 and have nine years of industry experience in total. With Dilax, apart from developing interesting use cases related to passenger-counting data, I am also involved in provisioning and maintaining the software. In my free time, I like to work with my friends on various fun projects using GCP.