Talks DIMA Research Seminar
"Scalability! But at what COST?"
Abstract: Many distributed graph processing systems are built with
scalability in mind. The more machines you add, the faster they can
go. But how fast do they actually go? We used measurements from
a recent evaluation of several popular graph processing frameworks
(Gonzalez et al., OSDI 2014) and found that the reported running times
for all systems, for all datasets, for all problems, were slower than
a single thread running on the speaker’s laptop. We claim that
performance evaluation in the current crop of scalable systems is
deeply lacking. Rather than evaluate scalability, we challenge systems
builders to report the COST, or Configuration that Outperforms a
Single Thread. This metric indicates the cross-over point at which a
scalable system’s existence is first justified. Our experience
indicates that the COST of many systems for most problems is
surprisingly high, and in some cases unbounded.
This work is joint with Michael Isard and Derek Murray.
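As a rough illustration of the COST metric described in the abstract, the sketch below finds the smallest configuration whose running time beats a single-threaded baseline. The function name `cost`, the core counts, and all timings are hypothetical and invented for this example; they are not measurements from the talk or from any evaluated system.

```python
from typing import Mapping, Optional


def cost(single_thread_seconds: float,
         system_seconds_by_cores: Mapping[int, float]) -> Optional[int]:
    """Return the smallest core count at which the scalable system
    runs faster than the single-threaded baseline.

    Returns None when no measured configuration beats the baseline,
    which corresponds to an unbounded COST.
    """
    for cores in sorted(system_seconds_by_cores):
        if system_seconds_by_cores[cores] < single_thread_seconds:
            return cores
    return None


# Hypothetical running times (seconds) keyed by core count.
hypothetical_runs = {1: 900.0, 16: 400.0, 64: 250.0, 128: 180.0}

print(cost(300.0, hypothetical_runs))  # 64
print(cost(100.0, hypothetical_runs))  # None (unbounded COST)
```

With a 300-second baseline, the hypothetical system first wins at 64 cores, so its COST is 64 cores. With a 100-second baseline, no measured configuration wins and the COST is unbounded.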
Frank McSherry is an independent researcher interested in
data-parallel computation. Before the lab's dissolution, Frank was a senior
researcher at Microsoft Research SVC, where he led the Naiad dataflow
project and co-invented differential privacy. He is currently deeply
enamored of building performant scalable systems in Rust.