TU Berlin

Abstract

Large-scale data processing has been studied extensively in recent years. MapReduce, a well-known framework, has been widely applied in scientific and business settings. However, its limitations are generally acknowledged: a restrictive set of parallel operations and poor support for implementing complex algorithms. Flink and Spark are two dataflow-based systems that extend MapReduce with richer parallel operations and optimized execution engines. The goal of this thesis is to evaluate these two systems with a comprehensive benchmark suite drawn from various data analytics domains. The suite consists of five workloads: Word Count, TPC-H Query 3, K-means, PageRank, and Connected Components. For each workload, we test both systems with varying datasets, memory sizes, and cluster sizes. The results show that Flink outperforms Spark on relational queries and iterative algorithms. Spark extensions such as MLlib and GraphX can deliver performance comparable to Flink's but are more restrictive.
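
As an illustration of the simplest workload in the suite, Word Count can be expressed as a map step followed by a per-key reduction. The sketch below is plain, single-process Python and is not the Flink or Spark code benchmarked in the thesis; the function names `map_phase`, `reduce_phase`, and `word_count` are hypothetical, and lowercasing tokens is an assumption made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token in a line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(pairs):
    """Group the pairs by word and sum the counts for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    """Run the map step on every line, then reduce all emitted pairs."""
    return reduce_phase(chain.from_iterable(map_phase(line) for line in lines))
```

In a distributed system the map step would run in parallel over partitions of the input and the pairs would be shuffled by key before the reduction; here both steps simply run in sequence.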

Information

Author: Mingliang Qi

Supervisors: Asterios Katsifodimos, Alexander Alexandrov

Copyright TU Berlin 2008