DOI: 10.1145/3477132.3483575

Random Walks on Huge Graphs at Cache Efficiency

Published: 26 October 2021

ABSTRACT

Data-intensive applications dominated by random accesses to large working sets fail to utilize the computing power of modern processors. Graph random walk, an indispensable workhorse for many important graph processing and learning applications, is one prominent case of such applications. Existing graph random walk systems are currently unable to match the GPU-side node embedding training speed.

This work reveals that existing approaches fail to effectively utilize the modern CPU memory hierarchy, due to the widely held assumption that the inherent randomness in random walks and the skewed nature of graphs render most memory accesses random. We demonstrate that there is actually plenty of spatial and temporal locality to harvest, by careful partitioning, rearranging, and batching of operations. The resulting system, FlashMob, improves both cache and memory bandwidth utilization by making memory accesses more sequential and regular. We also found that a classical combinatorial optimization problem (and its exact pseudo-polynomial solution) can be applied to complex decision making, for accurate yet efficient data/task partitioning. Our comprehensive experiments over diverse graphs show that our system achieves an order of magnitude performance improvement over the fastest existing system. It processes a 58GB real graph at higher per-step speed than the existing system on a 600KB toy graph fitting in the L2 cache.
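The idea of batching walkers so that memory accesses become more sequential can be illustrated with a small sketch. This is not FlashMob's implementation: the CSR layout, the fixed-size contiguous vertex partitions, and the names `build_csr`, `batched_walk`, and `partition_size` are all assumptions made for illustration. At each step, walkers are bucketed by the partition of their current vertex, and each partition's walkers are advanced together, so that edge lookups for one batch stay within one contiguous slice of the adjacency array.

```python
import random


def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, targets) from a directed edge list."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot per source vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def batched_walk(offsets, targets, starts, num_steps, partition_size, rng):
    """Advance all walkers step by step, one vertex partition at a time.

    Hypothetical scheme: partition p owns the contiguous vertex range
    [p * partition_size, (p + 1) * partition_size).
    """
    num_vertices = len(offsets) - 1
    num_partitions = (num_vertices + partition_size - 1) // partition_size
    position = list(starts)
    paths = [[v] for v in starts]
    for _ in range(num_steps):
        # Bucket walker ids by the partition of their current vertex.
        buckets = [[] for _ in range(num_partitions)]
        for walker, v in enumerate(position):
            buckets[v // partition_size].append(walker)
        # Process one partition at a time: every lookup in this inner loop
        # touches the same contiguous slice of offsets/targets.
        for bucket in buckets:
            for walker in bucket:
                v = position[walker]
                lo, hi = offsets[v], offsets[v + 1]
                if lo == hi:
                    continue  # no out-edges: the walker stays where it is
                position[walker] = targets[rng.randrange(lo, hi)]
        for walker, v in enumerate(position):
            paths[walker].append(v)
    return paths


# Usage: a bidirectional ring of 8 vertices, one walker per vertex.
edges = [(v, (v + 1) % 8) for v in range(8)] + [(v, (v - 1) % 8) for v in range(8)]
offsets, targets = build_csr(8, edges)
paths = batched_walk(offsets, targets, starts=list(range(8)), num_steps=5,
                     partition_size=4, rng=random.Random(0))
```

In pure Python the bucketing brings no measurable cache benefit; the sketch only shows the access pattern. A real system would choose partition sizes to fit a given cache level, which is the kind of decision the abstract says is handled by a combinatorial optimization step.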


        • Published in

          SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
          October 2021
          899 pages
          ISBN: 9781450387095
          DOI: 10.1145/3477132

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 131 of 716 submissions, 18%
