TU Berlin

Publications

Materialization and Reuse Optimizations for Production Data Science Pipelines, SIGMOD 2022, to appear
Citation key DerakhshanMKRM
Authors Behrouz Derakhshan, Alireza Rezaei Mahdiraji, Zoi Kaoudi, Tilmann Rabl, Volker Markl
Year 2022
Journal SIGMOD Conference
Note to appear
Abstract Many companies and businesses train and deploy machine learning (ML) pipelines to answer prediction queries. In many applications, new training data continuously becomes available. A typical approach to keeping ML models up to date is to retrain the ML pipelines on a schedule, e.g., every day on the last seven days of data. Several use cases, such as A/B testing and ensemble learning, require many pipelines to be deployed in parallel. Existing solutions train and deploy one pipeline at a time, which generates redundant data processing since pipelines usually share similar operators. Our goal is to eliminate redundant data processing in production data science pipelines using materialization and reuse optimizations. We first categorize the generated artifacts of the pipeline operators into three groups: computed statistics, transformed data, and trained models. Then, we optimize the execution of the pipelines by materializing and reusing the generated artifacts. Our solution employs a materialization algorithm that, given a storage budget, materializes the subset of artifacts that minimizes the run time of subsequent executions. Furthermore, we offer a reuse algorithm that generates an optimal execution plan by combining the deployed pipelines into a directed acyclic graph (DAG) and reusing the materialized artifacts when appropriate. Our experiments show that our system can reduce the training time by up to an order of magnitude for different deployment scenarios.
