DIMA Colloquium Dates
Date/Location | Speaker/Topic |
---|---|
19.07.2019, 10:30, DFKI Projektbüro Berlin, Room: Weizenbaum, Alt Moabit 91c, Berlin | Zoi Kaoudi: "Scalable Machine Learning for Everyone" |
19.07.2019, 11:15, DFKI Projektbüro Berlin, Room: Weizenbaum, Alt Moabit 91c, Berlin | Jorge Arnulfo Quiané-Ruiz: "In Quest of Democratizing Big Data Processing" |
Zoi Kaoudi, Qatar Computing Research Institute (QCRI)
Title:
"Scalable Machine Learning for Everyone"
Abstract:
As machine learning (ML) permeates diverse application
domains, there is an urgent need to ease the use of ML algorithms and
methods, for example, by providing a declarative framework for ML.
Ideally, a user will specify an ML task in a high-level and
easy-to-use language and the framework will invoke the appropriate
algorithms and system configurations to execute it. An important
observation towards designing such a framework is that many ML tasks
can be expressed as mathematical optimization problems, which take a
specific form. Furthermore, these optimization problems can be
efficiently solved using variations of the gradient descent (GD)
algorithm. Thus, to decouple a user specification of an ML task from
its execution, a key component is a GD optimizer. In this talk we will
present ML4all, an ML system that offers a cost-based GD optimizer to
select the best GD plan for a given ML task. We will introduce a set
of abstract operators for expressing GD algorithms that ML4all uses in
order to facilitate the optimization. A big challenge we had to tackle
when building ML4all is estimating the number of iterations an
algorithm requires to converge. This is necessary for computing the
cost of each algorithm. We will describe a novel speculative approach
to achieve this. We will finally show through experimental evaluation
that ML4all not only chooses the best GD plan but also allows for
optimizations that achieve orders of magnitude performance speed-up.
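As a rough illustration of the cost-based idea in this abstract (not ML4all's actual implementation), the sketch below picks among GD variants by multiplying an estimated iteration count by a per-iteration cost. The iteration estimate extrapolates from errors observed during a short trial run, assuming the error decays roughly as a/i. That decay model, the function names `estimate_iterations` and `choose_plan`, and all numbers are assumptions made for the example.

```python
import math


def estimate_iterations(observed_errors, target_tolerance):
    """Extrapolate how many iterations are needed to reach target_tolerance.

    Assumes (for illustration only) that the error after iteration i
    behaves like a / i, and fits a by least squares to the errors
    observed at iterations 1..k of a short trial run.
    """
    k = len(observed_errors)
    numerator = sum(e / i for i, e in enumerate(observed_errors, start=1))
    denominator = sum(1.0 / (i * i) for i in range(1, k + 1))
    a = numerator / denominator
    return math.ceil(a / target_tolerance)


def choose_plan(plans, target_tolerance):
    """Return the cheapest plan name and the estimated cost of every plan.

    plans maps a plan name to (cost_per_iteration, observed_errors).
    """
    costs = {}
    for name, (cost_per_iteration, errors) in plans.items():
        iterations = estimate_iterations(errors, target_tolerance)
        costs[name] = iterations * cost_per_iteration
    best = min(costs, key=costs.get)
    return best, costs


# Hypothetical trial-run measurements: batch GD touches every data point per
# iteration (expensive, few iterations); stochastic GD touches one point
# (cheap, many iterations); mini-batch GD sits in between.
plans = {
    "batch": (1000.0, [1.0 / i for i in range(1, 6)]),
    "mini_batch": (100.0, [5.0 / i for i in range(1, 6)]),
    "sgd": (1.0, [50.0 / i for i in range(1, 6)]),
}
best, costs = choose_plan(plans, target_tolerance=0.03)
print(best, costs)
```

The point of the sketch is only that the ranking of plans depends on both factors: a variant with cheap iterations can still lose if it needs far more of them, which is why the iteration estimate matters for the cost model.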
Short bio:
Zoi Kaoudi is a Scientist at the Qatar
Computing Research Institute (QCRI). Her research interests lie at the
intersection of machine learning systems, data management, and
knowledge graphs. She has also worked on cross-platform data
processing. Previously, she was a scientific associate at the IMIS-Athena
Research Center and a postdoctoral researcher at Inria. She received
her PhD in Computer Science from the National and Kapodistrian
University of Athens in 2011. Recently, she served as proceedings
chair of EDBT 2019, co-chaired the TKDE poster track co-located
with ICDE 2018, and co-organized MLDAS 2019, held in Qatar. She has
co-authored articles in both the database and Semantic Web communities and
has served on the program committees of several international
database conferences.
Jorge Arnulfo Quiané-Ruiz, Qatar Computing Research Institute (QCRI)
Title: In Quest of Democratizing Big Data Processing
Abstract:
The big data ecosystem is quite diverse -- one big data platform is
unlikely to cater to every need. As a result, organizations typically
perform tedious and costly tasks to juggle their code and data across
different big data platforms in order to be able to perform queries
from data science and business intelligence applications.
In this talk, we will introduce cross-platform data processing, a new
paradigm for data processing that aims at effectively and efficiently
processing queries by combining multiple big data platforms. We will
present our recent work to support cross-platform data processing. We
have built Rheem, the first cross-platform data processing
system that decouples applications from the underlying platforms.
Rheem splits a query into subqueries and assigns each of them to a
specific big data platform to minimize the overall query cost
(e.g., runtime or monetary cost). In particular, we will discuss
different common cases where a query goes beyond the limits of a
single big data platform and the benefits of using Rheem.
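To make the idea of assigning subqueries to platforms concrete, here is a minimal sketch, not Rheem's optimizer. It assumes the query is a linear pipeline of stages, that each stage has a known cost on each platform, and that a fixed penalty applies whenever data moves between platforms. The function name `assign_platforms`, the platform labels, and the cost numbers are all hypothetical.

```python
def assign_platforms(stage_costs, move_cost):
    """Pick one platform per pipeline stage so that total cost is minimal.

    stage_costs: list of dicts, one per stage, mapping platform -> cost.
    move_cost: penalty added when two consecutive stages run on
               different platforms (a stand-in for data movement).
    Returns (total_cost, [platform chosen for each stage]).
    """
    # best[p] = (cost, plan) of the cheapest partial plan ending on platform p
    best = {p: (c, [p]) for p, c in stage_costs[0].items()}
    for stage in stage_costs[1:]:
        new_best = {}
        for platform, cost in stage.items():
            # Cheapest way to arrive at this platform from any previous one.
            prev_cost, prev_plan = min(
                (
                    (pc + (0 if q == platform else move_cost), plan)
                    for q, (pc, plan) in best.items()
                ),
                key=lambda t: t[0],
            )
            new_best[platform] = (prev_cost + cost, prev_plan + [platform])
        best = new_best
    return min(best.values(), key=lambda t: t[0])


# Hypothetical three-stage query on two platforms, "A" and "B".
stage_costs = [
    {"A": 1, "B": 5},
    {"A": 10, "B": 2},
    {"A": 10, "B": 2},
]
print(assign_platforms(stage_costs, move_cost=3))
print(assign_platforms(stage_costs, move_cost=100))
```

With a small movement penalty the cheapest plan can mix platforms, while a large penalty pushes the whole query onto a single platform, which is the trade-off a cross-platform optimizer has to weigh.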
Despite our current efforts, democratizing data processing
to support data science effectively is still far from being a reality.
A number of open problems must be solved before we
can start talking about a democratization of data science. We will
conclude this talk with a discussion of these open problems and
a call to arms for our database community.
Short Biography:
Jorge Arnulfo Quiané-Ruiz is a Senior Scientist at the Qatar Computing Research Institute (QCRI). His research interests revolve around building scalable and efficient systems that facilitate data processing for novel applications. His recent work, Rheem, focused on providing cross-platform data processing: using multiple data processing platforms to perform application tasks. Before moving to QCRI, he was a postdoctoral researcher at Saarland University and a research engineer at INRIA. He obtained his PhD in computer science at INRIA (from the University of Nantes). His work on inequality joins was selected as one of the best papers of VLDB’15 and published in a special issue of the VLDB Journal. He received an Excellent Presentation Award at VLDB 2014 for his presentation on scalable data profiling.