TU Berlin

Database Systems and Information Management Group SS19


Talks DIMA Research Seminar

Talks  SS18
19.07.2019, 10:30
DFKI Projektbüro Berlin, Room: Weizenbaum, Alt Moabit 91c, Berlin
Zoi Kaoudi
"Scalable Machine Learning for Everyone"

19.07.2019, 11:15
DFKI Projektbüro Berlin, Room: Weizenbaum, Alt Moabit 91c, Berlin
Jorge Arnulfo Quiané-Ruiz
"In Quest of Democratizing Big Data Processing"

Zoi Kaoudi, Qatar Computing Research Institute (QCRI)


"Scalable Machine Learning for Everyone"


As machine learning (ML) permeates diverse application domains, there is an urgent need to ease the use of ML algorithms and methods, for example, by providing a declarative framework for ML. Ideally, a user specifies an ML task in a high-level, easy-to-use language, and the framework invokes the appropriate algorithms and system configurations to execute it. An important observation for designing such a framework is that many ML tasks can be expressed as mathematical optimization problems that take a specific form. Furthermore, these optimization problems can be solved efficiently using variations of the gradient descent (GD) algorithm. Thus, to decouple a user's specification of an ML task from its execution, a key component is a GD optimizer. In this talk we will present ML4all, an ML system that offers a cost-based GD optimizer to select the best GD plan for a given ML task. We will introduce a set of abstract operators for expressing GD algorithms, which ML4all uses to facilitate the optimization. A big challenge we had to tackle when building ML4all is estimating the number of iterations an algorithm requires to converge. This estimate is necessary for computing the cost of each algorithm. We will describe a novel speculative approach to obtain it. Finally, we will show through experimental evaluation that ML4all not only chooses the best GD plan but also allows for optimizations that achieve speed-ups of orders of magnitude.
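
As a rough illustration of the cost-based idea in the abstract, the sketch below picks the cheapest of several GD variants, assuming that the cost of a plan is its per-iteration cost multiplied by its estimated number of iterations. This is not ML4all's implementation: the `GDPlan` class, the `choose_plan` function, and all numbers are hypothetical and serve only to show the reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GDPlan:
    """A hypothetical GD plan; the fields are illustrative assumptions."""
    name: str
    cost_per_iteration: float    # assumed cost units, e.g. seconds
    estimated_iterations: int    # assumed to come from an iteration estimator

    @property
    def total_cost(self) -> float:
        # Assumed cost model: per-iteration cost times iterations to converge.
        return self.cost_per_iteration * self.estimated_iterations


def choose_plan(plans):
    """Return the plan with the lowest estimated total cost."""
    if not plans:
        raise ValueError("at least one plan is required")
    return min(plans, key=lambda plan: plan.total_cost)


# Made-up numbers: batch GD needs few but expensive iterations,
# stochastic GD needs many cheap ones.
plans = [
    GDPlan("batch", cost_per_iteration=12.0, estimated_iterations=50),
    GDPlan("mini-batch", cost_per_iteration=0.5, estimated_iterations=400),
    GDPlan("stochastic", cost_per_iteration=0.01, estimated_iterations=30000),
]
best = choose_plan(plans)
print(best.name, best.total_cost)
```

The sketch makes visible why the iteration estimate matters: without it, the per-iteration costs alone would always favor the stochastic variant.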

Short bio:
Zoi Kaoudi is a Scientist at the Qatar Computing Research Institute (QCRI). Her research interests lie at the intersection of machine learning systems, data management, and knowledge graphs. She has also worked on cross-platform data processing. Previously she was a scientific associate at the IMIS-Athena Research Center and a postdoctoral researcher at Inria. She received her PhD in Computer Science from the National and Kapodistrian University of Athens in 2011. Recently she was the proceedings chair of EDBT 2019, co-chaired the TKDE poster track co-located with ICDE 2018, and co-organized MLDAS 2019, held in Qatar. She has co-authored articles in both the database and Semantic Web communities and has served as a Program Committee member for several international database conferences.



Jorge Arnulfo Quiané-Ruiz, Qatar Computing Research Institute (QCRI)

"In Quest of Democratizing Big Data Processing"


The big data ecosystem is quite diverse -- one big data platform is unlikely to cater to every need. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different big data platforms in order to be able to perform queries from data science and business intelligence applications.

In this talk, we will introduce cross-platform data processing, a new paradigm that aims at processing queries effectively and efficiently by combining multiple big data platforms. We will present our recent work to support cross-platform data processing. We have built Rheem, the first cross-platform data processing system that decouples applications from the underlying platforms. Rheem splits a query into subqueries and assigns each of them to a specific big data platform to minimize the overall query cost (e.g., runtime or monetary cost). In particular, we will discuss common cases where a query goes beyond the limits of a single big data platform and the benefits of using Rheem in those cases.
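
The assignment of subqueries to platforms can be pictured with a small sketch. It assumes a linear pipeline of subqueries, a known cost for each subquery on each platform, and a fixed penalty whenever consecutive subqueries run on different platforms. None of this is taken from Rheem itself: the `assign_platforms` function, the platform names, and the costs are hypothetical.

```python
def assign_platforms(costs, switch_cost):
    """Pick one platform per subquery so that the total cost is minimal.

    costs: list (in pipeline order) of dicts mapping platform -> cost.
    switch_cost: assumed fixed cost of moving data between platforms.
    Returns (total_cost, [platform for each subquery]).
    """
    if not costs:
        raise ValueError("at least one subquery is required")

    # best[p] = (cost, assignment) of the cheapest plan whose last
    # subquery runs on platform p.
    best = {p: (c, [p]) for p, c in costs[0].items()}

    for step in costs[1:]:
        new_best = {}
        for platform, cost in step.items():
            # Cheapest way to reach this platform from any previous one.
            prev_cost, prev_path = min(
                (
                    (pc + (0 if q == platform else switch_cost), path)
                    for q, (pc, path) in best.items()
                ),
                key=lambda entry: entry[0],
            )
            new_best[platform] = (prev_cost + cost, prev_path + [platform])
        best = new_best

    return min(best.values(), key=lambda entry: entry[0])


# Made-up costs for three subqueries on two hypothetical platforms.
costs = [
    {"spark": 5, "java": 1},
    {"spark": 2, "java": 10},
    {"spark": 3, "java": 2},
]
total, assignment = assign_platforms(costs, switch_cost=2)
print(total, assignment)
```

With these numbers, running the first subquery on one platform and the rest on the other is cheaper than any single-platform plan, which is the kind of case the abstract refers to.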

Despite our current efforts, democratizing data processing to support data science effectively is far from being a reality. A number of open problems must be solved before we can start talking about a democratization of data science. We will conclude this talk with a discussion of these open problems and a call to arms for our database community.

Short Biography:

Jorge Arnulfo Quiané-Ruiz is a Senior Scientist at the Qatar Computing Research Institute (QCRI). His research interests center on building scalable and efficient systems that facilitate data processing for novel applications. His recent work, Rheem, focuses on cross-platform data processing: using multiple data processing platforms to perform application tasks. Before moving to QCRI, he was a postdoctoral researcher at Saarland University and a research engineer at INRIA. He obtained his PhD in computer science at INRIA (from the University of Nantes). His work on inequality joins was selected as one of the best papers of VLDB’15 and published in a special issue of the VLDB Journal. He received an Excellent Presentation Award at VLDB 2014 for his presentation on scalable data profiling.



