Single-node partitioned-memory for huge graph analytics: cost and performance trade-offs

Authors:

Nathan R. Tallent,

Marco Minutoli,

Mahantesh Halappanavar,

Ananth Kalyanaraman

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 55, Pages 1 - 14

https://doi.org/10.1145/3458817.3476156

Published: 13 November 2021

Abstract

Because of cost, non-volatile memory NVDIMMs such as Intel Optane are attractive in single-node big-memory systems. We evaluate performance and cost trade-offs when using Optane as volatile memory for huge-graph analytics. We study two scalable graph applications with different work locality, access patterns, and parallelism. We evaluate single and partitioned address spaces (Memory and AppDirect modes, respectively) and compare with distributed executions on GPU-accelerated and CPU-based supercomputers.
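
As an illustration of what a partitioned address space asks of software: in Memory mode the hardware uses DRAM as a cache in front of Optane, whereas in AppDirect mode the application sees two separate pools and must decide where each data structure lives. The sketch below is one hypothetical greedy placement policy, not the policy used in the paper. The `Region` type, the `place` function, the region names, sizes, write fractions, and the DRAM capacity are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    size_gb: float
    write_fraction: float  # hypothetical estimate: share of accesses that are writes


def place(regions, dram_capacity_gb):
    """Greedy sketch: the most write-intensive regions claim DRAM first.

    Anything that no longer fits in the remaining DRAM goes to Optane.
    """
    placement = {}
    free_gb = dram_capacity_gb
    for region in sorted(regions, key=lambda r: r.write_fraction, reverse=True):
        if region.size_gb <= free_gb:
            placement[region.name] = "DRAM"
            free_gb -= region.size_gb
        else:
            placement[region.name] = "Optane"
    return placement


# Hypothetical working set for a graph application (illustrative numbers only).
regions = [
    Region("graph_edges", 900.0, 0.01),   # large, read-mostly
    Region("vertex_state", 60.0, 0.50),   # updated every iteration
    Region("work_queues", 20.0, 0.70),    # write-heavy scratch data
]
placement = place(regions, dram_capacity_gb=192.0)
print(placement)
```

The design choice here is only that write-heavy data is kept off Optane where possible, since Optane writes are slower than its reads; a real policy would also need to account for NUMA locality.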

We show that AppDirect can perform and scale better than Memory for the largest working sets (12%), even when dominated by irregular access patterns, if most accesses are NUMA-local and Optane accesses are frequently reads. Surprisingly, between Memory and AppDirect, processor-cache performance can change due to line invalidations; updates to the caching policy (via non-temporal hints) can make a 25% improvement. We observe that single-node graph analytics frequently has >4--10× cost/performance advantages over distributed-memory executions on supercomputers.
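
To make the notion of a cost/performance advantage concrete, the following sketch computes one under a stated assumption: the metric is the product of system cost and execution time (lower is better), and the advantage is the ratio of the two products. The abstract does not define the metric, so this definition, the function name, and every number below are hypothetical.

```python
def cost_performance_advantage(cost_a, time_a, cost_b, time_b):
    """Return how many times better system A is than system B.

    Assumes the metric is cost x time, where lower is better,
    so the advantage is (cost_b * time_b) / (cost_a * time_a).
    """
    return (cost_b * time_b) / (cost_a * time_a)


# Hypothetical, normalized inputs: the cluster is 2x faster but 8x more expensive.
single_node = {"cost": 1.0, "time": 3.0}
cluster = {"cost": 8.0, "time": 1.5}

advantage = cost_performance_advantage(
    single_node["cost"], single_node["time"],
    cluster["cost"], cluster["time"],
)
print(f"single-node cost/performance advantage: {advantage:.1f}x")
```

Under this definition a slower system can still come out ahead, provided its cost is low enough to offset the longer run time.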

Supplementary Material

MP4 File (Single-Node Partitioned-Memory for Huge Graph Analytics_ Cost and Performance Trade-Offs.mp4.mp4)

Presentation video


Published In

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2021

1493 pages

ISBN:9781450384421

DOI:10.1145/3458817

General Chair:
Bronis R. de Supinski,
Program Chairs:
Mary Hall,
Todd Gamblin

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SC '21

Sponsor:

SIGHPC

SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 14 - 19, 2021

St. Louis, Missouri

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
