PruX: Communication Pruning of Parallel BFS in the Graph 500 Benchmark

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11334)

Abstract

Parallel Breadth-First Search (BFS) is a representative algorithm in Graph 500, the well-known benchmark for evaluating supercomputers on data-intensive applications. However, the specific storage model of Graph 500 poses a severe challenge to efficient communication when running parallel BFS on large-scale graphs. In this paper, we propose PruX, an effective method for optimizing the communication of parallel BFS in two ways. First, we adopt a scalable structure to record the access information of the vertices on each machine. Second, we prune unnecessary inter-machine communication for previously accessed vertices by checking these records. Evaluation results show that our method performs at least six times better than the original implementation of parallel BFS.
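
The pruning step described above can be illustrated with a small sketch: each machine keeps one bitmap per peer recording which remote vertices it has already communicated, and an edge leading to a previously sent vertex is skipped instead of generating a message. This is only a minimal illustration under assumed data structures and names (sent_record_t, should_send, a 1-D partition); it is not the authors' implementation.

```c
/* Minimal sketch of sender-side communication pruning in a distributed BFS.
 * All names here (sent_record_t, should_send, ...) are hypothetical; this is
 * not the PruX code, only the idea of "check a record, skip the message". */
#include <stdint.h>
#include <stdlib.h>

#define WORD_BITS 64

typedef struct {
    uint64_t **bits;   /* bits[peer][v_local / 64]: vertex already sent?   */
    int        npeers; /* number of remote machines                        */
} sent_record_t;

/* Allocate one bitmap per peer, each covering verts_per_peer vertices. */
static sent_record_t *sent_record_new(int npeers, int64_t verts_per_peer)
{
    sent_record_t *rec = malloc(sizeof *rec);
    rec->npeers = npeers;
    rec->bits = malloc(npeers * sizeof *rec->bits);
    for (int p = 0; p < npeers; p++)
        rec->bits[p] = calloc((verts_per_peer + WORD_BITS - 1) / WORD_BITS,
                              sizeof(uint64_t));
    return rec;
}

/* Returns 1 if the remote vertex still has to be sent to its owner,
 * 0 if the send can be pruned because it was communicated before. */
static int should_send(sent_record_t *rec, int owner, int64_t v_local)
{
    uint64_t mask = (uint64_t)1 << (v_local % WORD_BITS);
    if (rec->bits[owner][v_local / WORD_BITS] & mask)
        return 0;                               /* seen before: prune     */
    rec->bits[owner][v_local / WORD_BITS] |= mask;
    return 1;                                   /* first time: enqueue    */
}
```

In such a sketch, the edge scan of each frontier level would call should_send for every remote endpoint and only enqueue the vertex for the inter-machine exchange when it returns 1.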

Notes

  1. Both PruX and direction optimization improve the parallel BFS algorithm by modifying its execution mode, so we choose direction optimization as the baseline for comparison.

  2. We implement direction optimization only at the algorithm level, without optimizing its storage and computation, which causes it to break down when SCALE is too large.

  3. Because the graph still contains many isolated vertices, direction optimization processes them in the bottom-up BFS phase, which incurs substantial computation cost and degrades performance on large-scale graphs (see the sketch after these notes).
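
To make the cost in note 3 concrete, the sketch below (illustrative only; the CSR arrays row_ptr/col_idx and the other names are assumptions, not the evaluated code) shows a bottom-up step: every still-unvisited vertex is scanned at every level, and an isolated vertex can never find a parent, so it is re-scanned until the traversal finishes.

```c
/* Illustrative bottom-up BFS step over a local CSR partition; all array
 * names are assumptions, not the evaluated implementation. */
#include <stdint.h>

void bottom_up_step(int64_t nlocal,
                    const int64_t *row_ptr, const int64_t *col_idx,
                    int64_t *parent, const uint8_t *in_frontier)
{
    for (int64_t v = 0; v < nlocal; v++) {
        if (parent[v] != -1)                  /* already visited          */
            continue;
        /* An isolated vertex (row_ptr[v] == row_ptr[v + 1]) reaches this
         * point at every level yet never finds a parent, which is the
         * extra work note 3 refers to. */
        for (int64_t e = row_ptr[v]; e < row_ptr[v + 1]; e++) {
            if (in_frontier[col_idx[e]]) {    /* neighbor is in frontier  */
                parent[v] = col_idx[e];
                break;
            }
        }
    }
}
```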

Acknowledgment

This work is sponsored in part by the National Basic Research Program of China (973 Program) under Grant No. 2014CB340303 and by the National Natural Science Foundation of China (NSFC) under Grant No. 61772541.

Author information

Corresponding author

Correspondence to Songzhu Mei.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Jia, M., Zhang, Y., Li, D., Mei, S. (2018). PruX: Communication Pruning of Parallel BFS in the Graph 500 Benchmark. In: Vaidya, J., Li, J. (eds.) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol. 11334. Springer, Cham. https://doi.org/10.1007/978-3-030-05051-1_8

  • DOI: https://doi.org/10.1007/978-3-030-05051-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05050-4

  • Online ISBN: 978-3-030-05051-1

  • eBook Packages: Computer Science, Computer Science (R0)
