DiterGraph: Toward I/O-Efficient Incremental Computation over Large Graphs with Billion Edges
Y Du, Z Wang, N Wang, L Xie, Z Wei
2021 7th International Conference on Big Data Computing and …, 2021 • ieeexplore.ieee.org
The growing demand for iterative computation over large-scale graphs has attracted considerable attention. Distributed-disk systems can meet the scalability requirement as graphs grow in size, but computation is expensive because of heavy communication and frequent random data accesses. Alleviating these two limiting factors poses great challenges for graph partitioning, disk-oriented data management, and the iterative mechanism. This paper derives insights from the natural locality of raw graphs and proposes a lightweight partitioning algorithm, GPNL, with the goal of balancing load and accelerating communication. Accordingly, a hybrid index, RC-Index, is proposed to improve I/O efficiency by reducing disk accesses for graph data and message data. We also introduce an across-iteration mechanism (AIM) based on the extended BSP model and design two policies, AIMP and AIMC, to prune the message scale and accelerate message spreading, respectively. Comprehensive experiments against state-of-the-art solutions demonstrate significant performance gains over a broad spectrum of real-world and synthetic graphs at up to billion-edge scale.