DOI: 10.1145/3472456.3472462
research-article

Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing

Published: 05 October 2021

ABSTRACT

The skewed degree distribution of real-world graphs is the main source of poor locality when traversing all edges of the graph, an operation known as Sparse Matrix-Vector (SpMV) multiplication. Conventional graph traversal methods, such as push and pull, traverse all vertices in the same manner. We show that applying a uniform traversal direction to all edges leads to sub-optimal memory locality and hence poor efficiency. This paper argues that different vertices in power-law graphs have different locality characteristics and that the traversal method should be adapted to these characteristics.
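To make the two conventional directions concrete, the following is a minimal sketch of pull and push SpMV on a toy directed graph. It is an illustration only, not code from the paper: the function names (`build_adjacency`, `spmv_pull`, `spmv_push`) and the plain adjacency-list layout are assumptions chosen for readability.

```python
# Illustrative sketch only; names and data layout are hypothetical.

def build_adjacency(num_vertices, edges):
    """Return (out_nbrs, in_nbrs) adjacency lists for a directed edge list."""
    out_nbrs = [[] for _ in range(num_vertices)]
    in_nbrs = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)
    return out_nbrs, in_nbrs


def spmv_pull(in_nbrs, x):
    """Pull: each destination reads the values of its in-neighbours."""
    y = [0.0] * len(x)
    for dst, sources in enumerate(in_nbrs):
        acc = 0.0
        for src in sources:
            acc += x[src]      # reads of x follow the edge order (irregular)
        y[dst] = acc           # one write per destination (sequential)
    return y


def spmv_push(out_nbrs, x):
    """Push: each source adds its value to all of its out-neighbours."""
    y = [0.0] * len(x)
    for src, dests in enumerate(out_nbrs):
        val = x[src]           # one read per source (sequential)
        for dst in dests:
            y[dst] += val      # writes to y follow the edge order (irregular)
    return y


# Toy graph: vertex 3 has three incoming edges.
edges = [(0, 3), (1, 3), (2, 3), (3, 0), (1, 2)]
out_nbrs, in_nbrs = build_adjacency(4, edges)
x = [1.0, 2.0, 3.0, 4.0]
print(spmv_pull(in_nbrs, x))
print(spmv_push(out_nbrs, x))
```

Both functions compute the same result vector; they differ only in whether the irregular accesses are reads of `x` (pull) or updates of `y` (push). A parallel push would also need atomic updates or partitioning, which this sequential sketch leaves out.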

To solve this problem, we propose to inspect the number of destination and source vertices when selecting a cache-compatible traversal direction for each type of vertex. We introduce in-Hub Temporal Locality (iHTL), a structure-aware SpMV that combines push and pull in one graph traversal, but for different vertex types. iHTL exploits temporal locality by traversing incoming edges to in-hubs in the push direction, while processing all other edges in the pull direction.
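The idea of mixing both directions in one traversal can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: in-hubs are selected here by a fixed in-degree threshold (`hub_in_degree`, a hypothetical parameter), whereas the abstract does not specify the selection rule, and the names `split_by_in_hubs` and `spmv_hybrid` are invented for this example.

```python
# Illustrative sketch only; the hub-selection rule and all names are assumptions.

def split_by_in_hubs(num_vertices, edges, hub_in_degree):
    """Split edges into a push part (edges into in-hubs) and a pull part (the rest)."""
    in_degree = [0] * num_vertices
    for _, dst in edges:
        in_degree[dst] += 1
    # Assumption: a vertex is an in-hub if its in-degree reaches the threshold.
    is_hub = [d >= hub_in_degree for d in in_degree]

    push_out = [[] for _ in range(num_vertices)]  # source -> in-hub destinations
    pull_in = [[] for _ in range(num_vertices)]   # non-hub destination -> sources
    for src, dst in edges:
        if is_hub[dst]:
            push_out[src].append(dst)
        else:
            pull_in[dst].append(src)
    return is_hub, push_out, pull_in


def spmv_hybrid(push_out, pull_in, x):
    """Compute y[dst] = sum of x[src] over all edges (src, dst) in two phases."""
    y = [0.0] * len(x)
    # Phase 1 (push): sources are visited in order and update only the few
    # in-hub accumulators, which are reused often and can stay cache-resident.
    for src, hub_dests in enumerate(push_out):
        val = x[src]
        for dst in hub_dests:
            y[dst] += val
    # Phase 2 (pull): every non-hub destination gathers from its in-neighbours.
    for dst, sources in enumerate(pull_in):
        if sources:
            y[dst] = sum(x[src] for src in sources)
    return y


edges = [(0, 3), (1, 3), (2, 3), (3, 0), (1, 2)]
is_hub, push_out, pull_in = split_by_in_hubs(4, edges, hub_in_degree=3)
x = [1.0, 2.0, 3.0, 4.0]
print(is_hub)
print(spmv_hybrid(push_out, pull_in, x))
```

Because every edge lands in exactly one of the two parts, the hybrid result matches a plain pull or push traversal over the whole graph; only the memory access pattern changes.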

The evaluation shows iHTL is 1.5× to 2.4× faster than pull and 4.8× to 9.5× faster than push in state-of-the-art graph processing frameworks such as GraphGrind, GraphIt and Galois. More importantly, iHTL is 1.3× to 1.5× faster than pull traversal of state-of-the-art locality-optimizing reordering algorithms such as SlashBurn, GOrder, and Rabbit-Order.


  • Published in

    ACM Other conferences
    ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
    August 2021
    927 pages
    ISBN:9781450390682
    DOI:10.1145/3472456

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 91 of 313 submissions, 29%
