A System Architecture for Communication-Efficient Distributed Machine Learning
- © DIMA/Renz-Wieland
The Software Campus-backed LAPSE project aims to develop a system architecture that reduces communication costs in distributed machine learning.
Problem. Training machine learning (ML) models on a cluster, in contrast to a single machine, increases both compute power and available memory. However, as a trade-off, it requires communication among cluster nodes, in order to synchronize model parameters. For some ML models, synchronization can dominate the training process and thereby negate the benefits of employing a cluster.
Solution. To reduce communication, researchers have developed algorithms that exploit locality: at any given time, each worker updates only a subset of the model parameters. Typically, workers update different subsets over the course of training. Locality-exploiting algorithms (LEA) exist for multiple types of ML models, and the locality can stem from the training algorithm, the ML model, or the training data.
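The locality pattern described above can be illustrated with a small sketch. It is a hypothetical example and not the LAPSE implementation: it assumes the model is split into as many parameter blocks as there are workers and that blocks rotate among workers in a round-robin fashion. The function name `block_schedule` and its parameters are made up for illustration.

```python
from typing import List


def block_schedule(num_workers: int, num_steps: int) -> List[List[int]]:
    """Return schedule[t][w]: the parameter block worker w updates at step t.

    Assumption for illustration: one block per worker, rotated round-robin.
    """
    return [
        [(w + t) % num_workers for w in range(num_workers)]
        for t in range(num_steps)
    ]


schedule = block_schedule(num_workers=3, num_steps=3)

for t, assignment in enumerate(schedule):
    # Within a step, no two workers touch the same block, so each worker
    # only needs its own subset of the parameters during that step.
    assert len(set(assignment)) == len(assignment)
    pairs = ", ".join(f"worker {w} -> block {b}" for w, b in enumerate(assignment))
    print(f"step {t}: {pairs}")

# Over the course of training, each worker updates different subsets.
blocks_seen_by_worker_0 = {step[0] for step in schedule}
print("blocks updated by worker 0:", sorted(blocks_seen_by_worker_0))
```

In this sketch, parameters only need to move between nodes when the assignment changes from one step to the next, rather than on every single update.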
Lingering Hurdle. Typically, ML developers implement LEA from scratch, which requires knowledge of distributed computing systems, including low-level details. In the LAPSE project, we aim to develop a system that enables both researchers and practitioners to implement LEA without detailed distributed computing knowledge.
Contribution. We aim to contribute a novel, state-of-the-art architecture for distributed ML that meets the needs of parameter servers and is both usable and efficient for LEA. Our intention is to deliver a solution that is applicable to a wide range of ML applications and aids in the development of advanced ML-based solutions for today's societal challenges.
- © Software Campus
LAPSE is funded by the Federal Ministry of Education and Research (BMBF) via Software Campus, and is supported by TRUMPF, an industry partner.
Software Campus
Sponsored by the German Federal Ministry of Education and Research (BMBF), Software Campus (SC) is a development program that aims to prepare tomorrow's senior IT executives. The SC program combines leading-edge scientific research with hands-on management practice. It is directed at outstanding computer science doctoral students who are interested in taking on executive management roles in industry. Awardees lead their own research projects in cooperation with industry partners over a one- to two-year period.
Kickoff of the class of 2017