Page Content
Basic Information
Student: YangChao Liu
Advisors: Sebastian Schelter, Prof. Dr. Volker Markl
Degree: Master
Abstract
Centrality measures are used to assess the
structural properties of network entities.
For example, it can be
used to rank nodes for a recommender systems. The recent
advance
in measuring centralities for very large networks has set off a host
of new
computation methods that exploit the sparse nature of real
world networks. Fur-
thermore, an increasing interest in network
analysis resulted in a plethora of graph
centered parallel
programming models which consequently demands a choice between
different computing platforms for computation intensive problems.
Among the computing platforms available, the open source distributed
parallel frameworks Apache Flink (a general purpose distributed data
processing platform) and Apache Giraph (a specially designed platform
for graph processing) are considerably noteworthy for their wide
support of graph algorithms.
This thesis aims to elucidate a systematic comparison of both
platforms by lever-
aging previous work on measuring scalable
centralities in which the conventional
closeness and betweenness
measures are redesigned (as Effective Closeness and LineRank
algorithms respectively) to reinforce embarrassingly huge, parallel
frameworks. More specifically we address the issue of programmability
with a comprehensive study on the modeling of complex algorithms in
these platforms, and the performance of these systems with diverse
experiments. The scalable version of these centrality measures are
implemented in Apache Flink as well as in Apache Giraph and run on
real graph datasets with millions of edges to evaluate their relative
effectiveness. We benchmark the results and present the novel findings
from them. It can be observed that the vertex centric model becomes
restrictive for modeling complex computations as in the LineRank
algorithm whereas Flink is useful and appropriate in such cases by
offering a declarative style programming interface. On the other hand,
Apache Giraph performs better relative to Flink in most cases and
Apache Flink performs equivalently better on large sparse graphs, as
the numbers of computations are reduced by the early detection of
converged nodes.