Student: YangChao Liu
Advisors: Sebastian Schelter, Prof. Dr. Volker Markl
Centrality measures are used to assess the
structural properties of network entities.
For example, it can be used to rank nodes for a recommender systems. The recent
advance in measuring centralities for very large networks has set off a host of new
computation methods that exploit the sparse nature of real world networks. Fur-
thermore, an increasing interest in network analysis resulted in a plethora of graph
centered parallel programming models which consequently demands a choice between different computing platforms for computation intensive problems. Among the computing platforms available, the open source distributed parallel frameworks Apache Flink (a general purpose distributed data processing platform) and Apache Giraph (a specially designed platform for graph processing) are considerably noteworthy for their wide support of graph algorithms.
This thesis aims to elucidate a systematic comparison of both platforms by lever-
aging previous work on measuring scalable centralities in which the conventional
closeness and betweenness measures are redesigned (as Effective Closeness and LineRank algorithms respectively) to reinforce embarrassingly huge, parallel frameworks. More specifically we address the issue of programmability with a comprehensive study on the modeling of complex algorithms in these platforms, and the performance of these systems with diverse experiments. The scalable version of these centrality measures are implemented in Apache Flink as well as in Apache Giraph and run on real graph datasets with millions of edges to evaluate their relative effectiveness. We benchmark the results and present the novel findings from them. It can be observed that the vertex centric model becomes restrictive for modeling complex computations as in the LineRank algorithm whereas Flink is useful and appropriate in such cases by offering a declarative style programming interface. On the other hand, Apache Giraph performs better relative to Flink in most cases and Apache Flink performs equivalently better on large sparse graphs, as the numbers of computations are reduced by the early detection of converged nodes.