
DiGraph: An Efficient Path-based Iterative Directed Graph Processing System on Multiple GPUs

Published: 04 April 2019

Abstract

Many systems have recently been proposed for large-scale iterative graph analytics on a single machine with GPU accelerators. Despite these research efforts, existing solutions for iterative directed graph processing on GPUs suffer from slow convergence and high data access cost, because many vertices are needlessly reprocessed over many rounds, updating their states according to other active vertices regardless of the dependencies between them. In this paper, we propose a novel and efficient iterative directed graph processing system for a machine with multiple GPUs. Compared with existing systems, the unique feature of our system is that it takes advantage of the dependencies between vertices in three novel ways. First, it represents a directed graph as a set of disjoint hot/cold directed paths and takes the path as the basic parallel processing unit, so that vertex states propagate efficiently along the paths on the GPUs, giving faster convergence and higher utilization of the loaded data. Second, it tries to dispatch the paths to GPUs for parallel processing according to the topological order of their dependency graph. Many paths then converge along this order after being processed exactly once, which lowers the reprocessing overhead. Third, a path scheduling strategy on each streaming multiprocessor gives priority to the paths (e.g., the hot paths) that have a greater impact on vertex state propagation, shortening convergence time according to vertex dependency. Experimental results show that our approach speeds up iterative directed graph processing by up to 3.54 times in comparison with state-of-the-art systems.
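
The path-based idea in the abstract can be illustrated with a small CPU-only Python sketch. It is not the DiGraph implementation, and everything in it is an assumption made for illustration: the function names decompose_into_paths and schedule are hypothetical, the paths are taken to be edge-disjoint and are built greedily, a path is assumed to depend on another path when the latter updates a vertex that the former reads, and path length stands in for "hotness" when choosing among ready paths.

```python
import heapq
from collections import defaultdict, deque


def decompose_into_paths(edges):
    """Greedily split directed edges into edge-disjoint paths (illustrative only)."""
    unused = defaultdict(deque)          # vertex -> out-neighbours not yet placed on a path
    for u, v in edges:
        unused[u].append(v)
    paths = []
    for start in list(unused):           # snapshot: the walk below may add keys
        while unused[start]:
            path, on_path, node = [start], {start}, start
            while unused[node]:
                nxt = unused[node].popleft()   # consume one edge node -> nxt
                path.append(nxt)
                if nxt in on_path:             # stop once the walk closes a cycle
                    break
                on_path.add(nxt)
                node = nxt
            paths.append(path)
    return paths


def schedule(paths):
    """Order path indices topologically by an assumed dependency relation.

    Path p depends on path q when q writes a vertex (an edge target in q)
    that p reads (an edge source in p). Among ready paths, longer ones go
    first, a stand-in for prioritising hot paths.
    """
    sources = [set(p[:-1]) for p in paths]
    targets = [set(p[1:]) for p in paths]
    succ = defaultdict(set)
    indeg = [0] * len(paths)
    for q in range(len(paths)):
        for p in range(len(paths)):
            if p != q and targets[q] & sources[p]:
                succ[q].add(p)
                indeg[p] += 1
    ready = [(-len(paths[i]), i) for i in range(len(paths)) if indeg[i] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, q = heapq.heappop(ready)
        order.append(q)
        for p in succ[q]:
            indeg[p] -= 1
            if indeg[p] == 0:
                heapq.heappush(ready, (-len(paths[p]), p))
    # Paths on dependency cycles never become ready; they would need repeated rounds.
    leftover = [i for i in range(len(paths)) if i not in order]
    return order + leftover


edges = [(0, 1), (1, 2), (0, 3), (3, 2), (2, 4)]
paths = decompose_into_paths(edges)
print(paths)            # [[0, 1, 2, 4], [0, 3, 2]]
print(schedule(paths))  # [1, 0]
```

In this example the second path, 0 -> 3 -> 2, updates vertex 2, which the first path reads when it propagates along 2 -> 4. Processing the second path first therefore lets both paths be handled once in this order, which is the effect the abstract attributes to dispatching paths in topological order.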



      Published In

      ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
      April 2019
      1126 pages
      ISBN:9781450362405
      DOI:10.1145/3297858
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. GPU
      2. convergence speed
      3. data access cost
      4. iterative directed graph processing
      5. warp scheduling

      Qualifiers

      • Research-article

      Conference

      ASPLOS '19

      Acceptance Rates

      ASPLOS '19 Paper Acceptance Rate 74 of 351 submissions, 21%;
      Overall Acceptance Rate 535 of 2,713 submissions, 20%

      Cited By

      • (2024) PMGraph: Accelerating Concurrent Graph Queries over Streaming Graphs. ACM Transactions on Architecture and Code Optimization 21:4 (1-25). DOI: 10.1145/3689337. Online publication date: 20-Nov-2024
      • (2024) Evaluating the Soft Error Resilience of Graph Applications on GPGPUs. 2024 IEEE 10th Conference on Big Data Security on Cloud (BigDataSecurity) (84-89). DOI: 10.1109/BigDataSecurity62737.2024.00022. Online publication date: 10-May-2024
      • (2024) Graph Processing Scheme Using GPU With Value-Driven Differential Scheduling. IEEE Access 12 (41590-41600). DOI: 10.1109/ACCESS.2024.3374513. Online publication date: 2024
      • (2024) Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective. Journal of Computer Science and Technology 39:2 (245-266). DOI: 10.1007/s11390-024-4150-0. Online publication date: 1-Mar-2024
      • (2024) A graph pattern mining framework for large graphs on GPU. The VLDB Journal 34:1. DOI: 10.1007/s00778-024-00883-8. Online publication date: 5-Dec-2024
      • (2023) Incremental Connected Component Detection for Graph Streams on GPU. Electronics 12:6 (1465). DOI: 10.3390/electronics12061465. Online publication date: 20-Mar-2023
      • (2023) An efficient hardware accelerator for monotonic graph algorithms on dynamic directed graphs. SCIENTIA SINICA Informationis 53:8 (1575). DOI: 10.1360/SSI-2022-0191. Online publication date: 15-Aug-2023
      • (2023) HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (558-571). DOI: 10.1109/ICDE55515.2023.00049. Online publication date: Apr-2023
      • (2023) Efficient Multi-GPU Graph Processing with Remote Work Stealing. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (191-204). DOI: 10.1109/ICDE55515.2023.00022. Online publication date: Apr-2023
      • (2023) SaGraph: A Similarity-aware Hardware Accelerator for Temporal Graph Processing. 2023 60th ACM/IEEE Design Automation Conference (DAC) (1-6). DOI: 10.1109/DAC56929.2023.10247966. Online publication date: 9-Jul-2023
