TU Berlin

Database Systems and Information Management Group (DIMA): Publications

Benchmarking Distributed Data Processing Systems for Machine Learning Workloads
Citation key DBLP:conf/tpctc/BodenRSM18
Authors Christoph Boden, Tilmann Rabl, Sebastian Schelter, Volker Markl
Year 2018
Journal Performance Evaluation and Benchmarking for the Era of Artificial Intelligence - 10th TPC Technology Conference, TPCTC 2018
Abstract Distributed data processing systems have been widely adopted to robustly scale out computations on massive data sets to many compute nodes in recent years. These systems are also popular choices to scale out the training of machine learning models. However, there is a lack of benchmarks to assess how efficiently data processing systems actually perform at executing machine learning algorithms at scale. For example, the learning algorithms chosen in the corresponding systems papers tend to be those that fit well onto the system’s paradigm rather than state-of-the-art methods. Furthermore, experiments in those papers often neglect important aspects such as addressing all aspects of scalability. In this paper, we share our experience in evaluating novel data processing systems and present a core set of experiments of a benchmark for distributed data processing systems for machine learning workloads, a rationale for their necessity as well as an experimental evaluation.