English Abstract
Abstract:
Because of the massive number of irregular edges, distributed processing of large-scale graphs faces two main problems. The first is a low degree of parallelism, caused by the poor locality of computations in graph algorithms. The second is the high volume of communication between machines in the partitioned graph. These problems not only increase the network traffic of distributed graph processing systems but also reduce the utilization of computational resources.
To overcome the first problem, a new computational model for distributed graph processing is proposed. The main goal of this model is to maximize the utilization of local processing and information resources. The key idea is that each machine continues its computation independently for as long as possible and exchanges information with other machines only when it is inevitable. To this end, the proposed system introduces a three-layered model that benefits from the advantages of both shared-memory and distributed-memory architectures simultaneously. In each layer, a vertex can communicate with only a subset of the graph vertices: in the first layer it communicates with the vertices whose communication costs are lowest, while the most expensive communications are deferred to the last layer. In addition, the proposed model tries to exploit the advantages of both the shared-state and message-passing communication models simultaneously. By introducing a new kind of intelligence, the proposed system detects useless messages and prevents them from being sent over the network.
To cope with the second problem, a new graph partitioning method is proposed. In addition to load balance and edge-cut, this method takes the properties of the proposed model into account: the graph is partitioned such that each vertex can communicate with a larger fraction of the graph vertices in the lower layers. A minimal sketch of the layered communication idea is given below.
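The following Java sketch illustrates, under stated assumptions, the two ideas described above: neighbours on the same machine are served through a cheap local layer, while cross-machine messages are sent only when the vertex value has actually changed (useless-message filtering). The class and field names (LayeredVertexSketch, compute, lastSentRemote) are hypothetical and do not reflect the actual ExPregel API.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LayeredVertexSketch {

    static class Vertex {
        final int id;
        final int partition;          // machine that owns this vertex
        double value;
        double lastSentRemote = Double.NaN;   // last value sent over the network
        final List<Vertex> neighbours = new ArrayList<>();

        Vertex(int id, int partition, double value) {
            this.id = id; this.partition = partition; this.value = value;
        }

        // One superstep: local neighbours are updated through shared memory,
        // remote neighbours only receive a message if the value changed.
        void compute(Map<Integer, List<Double>> localInbox,
                     Map<Integer, List<Double>> networkInbox) {
            for (Vertex n : neighbours) {
                if (n.partition == partition) {
                    // lowest layer: same machine, no network traffic
                    localInbox.computeIfAbsent(n.id, k -> new ArrayList<>()).add(value);
                } else if (value != lastSentRemote) {
                    // higher layer: cross-machine message, sent only when useful
                    networkInbox.computeIfAbsent(n.id, k -> new ArrayList<>()).add(value);
                }
            }
            lastSentRemote = value;
        }
    }

    public static void main(String[] args) {
        Vertex a = new Vertex(0, 0, 1.0);
        Vertex b = new Vertex(1, 0, 2.0);   // same partition as a: local layer
        Vertex c = new Vertex(2, 1, 3.0);   // different partition: network layer
        a.neighbours.add(b);
        a.neighbours.add(c);

        Map<Integer, List<Double>> local = new HashMap<>();
        Map<Integer, List<Double>> network = new HashMap<>();
        a.compute(local, network);   // superstep 1: remote message is sent
        a.compute(local, network);   // superstep 2: value unchanged, remote message suppressed
        System.out.println("local=" + local + " network=" + network);
    }
}

In this toy run the remote neighbour receives a single message across two supersteps, which is the effect the proposed model aims for on real graphs: unchanged values never cross the network.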
Evaluation of the proposed system on real-world graphs shows that ExPregel reduces network traffic by 80-99% compared to Pregel, by 35-83% compared to GPS, and by 64-99% compared to Blogel. The speedups of ExPregel over Pregel, GPS, and Blogel are 9x, 20x, and 1.6x, respectively.
Keywords: Distributed Processing, Large-Scale Graph, Locality, Parallelism Degree, Computational Model