DOI: 10.1145/3458817.3476156
research-article

Single-node partitioned-memory for huge graph analytics: cost and performance trade-offs

Published: 13 November 2021

Abstract

Because of their lower cost per gigabyte than DRAM, non-volatile memory NVDIMMs such as Intel Optane are attractive for single-node big-memory systems. We evaluate performance and cost trade-offs when using Optane as volatile memory for huge-graph analytics. We study two scalable graph applications with different work locality, access patterns, and parallelism. We evaluate single and partitioned address spaces (Memory and AppDirect modes, respectively) and compare them with distributed executions on GPU-accelerated and CPU-based supercomputers.
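The difference between the two address-space models can be made concrete. In Memory mode, DRAM acts as a hardware-managed cache in front of Optane and software sees one address space; in AppDirect mode, Optane is exposed separately and the application decides which data lives there. The C sketch below shows one common way to use an AppDirect region as volatile memory: map a scratch file from a DAX-mounted filesystem. It is an illustration under stated assumptions, not the authors' setup; the helper names and the path /mnt/pmem0 are hypothetical, and Linux can alternatively expose AppDirect capacity as a separate NUMA node.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `bytes` of volatile scratch memory backed by a
 * file at `path`. With Optane in AppDirect mode, `path` would be on a
 * DAX-mounted filesystem (e.g. the hypothetical "/mnt/pmem0/scratch"), so
 * loads and stores reach the NVDIMMs directly, in an address range that is
 * separate from DRAM. Returns NULL on failure. */
static void *map_appdirect_region(const char *path, size_t bytes)
{
    assert(path != NULL && bytes > 0); /* mmap rejects zero-length maps */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return NULL;

    /* Size the backing file to the requested capacity. */
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        unlink(path);
        return NULL;
    }

    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping outlives both the descriptor and the directory entry;
     * unlinking means the space is released on munmap and nothing persists. */
    close(fd);
    unlink(path);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release a region from map_appdirect_region. */
static void unmap_appdirect_region(void *p, size_t bytes)
{
    if (p != NULL)
        munmap(p, bytes);
}
```

Because the file is unlinked right after mapping, its storage is returned when the region is unmapped, so the persistent device holds no leftover state; the data structures placed there are treated purely as volatile memory.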
We show that AppDirect can perform and scale better than Memory mode for the largest working sets (12%), even when execution is dominated by irregular access patterns, provided that most accesses are NUMA-local and Optane accesses are frequently reads. Surprisingly, processor-cache performance can differ between Memory and AppDirect modes because of cache-line invalidations; changing the caching policy (via non-temporal hints) can yield a 25% improvement. We observe that single-node graph analytics frequently has a >4–10× cost/performance advantage over distributed-memory executions on supercomputers.
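Non-temporal hints tell the processor that data will not be reused soon, so it need not be allocated in the cache. A minimal C sketch of one such hint, SSE2 streaming stores (`_mm_stream_si32` followed by `_mm_sfence`), is shown next alongside an ordinary store loop. It illustrates the mechanism only and is not the code the authors changed (the abstract does not say where the hints were applied); the helper names are hypothetical, and on targets without SSE2 the sketch falls back to regular stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32 (MOVNTI), _mm_sfence */
#endif

/* Ordinary stores: on x86 each written line is normally fetched into the
 * cache (write-allocate) and may evict lines that are still useful. */
static void fill_cached(int32_t *dst, size_t n, int32_t value)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* Non-temporal (streaming) stores: data goes through write-combining
 * buffers toward memory without being allocated in the cache. Falls back
 * to ordinary stores when SSE2 is unavailable. */
static void fill_streaming(int32_t *dst, size_t n, int32_t value)
{
    assert(dst != NULL || n == 0);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32((int *)&dst[i], (int)value);
    /* Streaming stores are weakly ordered; fence before other threads
     * may read the data. */
    _mm_sfence();
#else
    fill_cached(dst, n, value);
#endif
}
```

Streaming stores suit large, write-once outputs; for data that is read again soon, ordinary cached stores are usually preferable, which is one reason the benefit of such hints is workload-dependent.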

Supplementary Material

MP4 File: Presentation video

Information

Published In

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph analytics
  2. non-volatile memory
  3. performance evaluation

Conference

SC '21
Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 41
  • Downloads (last 6 weeks): 0
Reflects downloads up to 20 Feb 2025.

Cited By

  • (2024) Application and Challenges of In-Memory Computing in Power System Simulation. 2024 3rd International Conference on Energy and Electrical Power Systems (ICEEPS), pp. 864-868. DOI: 10.1109/ICEEPS62542.2024.10693073. Online publication date: 14 Jul 2024.
  • (2023) Workload-Aware Log-Structured Merge Key-Value Store for NVM-SSD Hybrid Storage. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 2207-2219. DOI: 10.1109/ICDE55515.2023.00171. Online publication date: Apr 2023.
  • (2022) Electromagnetic Simulations with 3D FEM and Intel Optane Persistent Memory. 2022 24th International Microwave and Radar Conference (MIKON), pp. 1-5. DOI: 10.23919/MIKON54314.2022.9924749. Online publication date: 12 Sep 2022.
  • (2022) HBMax. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 412-425. DOI: 10.1145/3559009.3569647. Online publication date: 8 Oct 2022.
