PruX: Communication Pruning of Parallel BFS in the Graph 500 Benchmark

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11334)

Abstract

Parallel Breadth-First Search (BFS) is a representative algorithm in Graph 500, the well-known benchmark for evaluating supercomputers on data-intensive applications. However, the specific storage model of Graph 500 poses a severe challenge to efficient communication when running parallel BFS on large-scale graphs. In this paper, we propose PruX, an effective method for optimizing the communication of parallel BFS in two ways. First, we adopt a scalable structure to record the access information of the vertices on each machine. Second, we prune unnecessary inter-machine communication for previously accessed vertices by checking these records. Evaluation results show that our method performs at least six times better than the original implementation of parallel BFS.
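
The pruning step described above can be illustrated with a small sketch: each machine keeps one bitmap per peer recording which remote vertices it has already communicated, and an edge leading to a previously sent vertex is skipped instead of generating a message. This is only a minimal illustration under assumed data structures and names (sent_record_t, should_send, a 1-D partition); it is not the authors' implementation.

```c
/* Minimal sketch of sender-side communication pruning in a distributed BFS.
 * All names here (sent_record_t, should_send, ...) are hypothetical; this is
 * not the PruX code, only the idea of "check a record, skip the message". */
#include <stdint.h>
#include <stdlib.h>

#define WORD_BITS 64

typedef struct {
    uint64_t **bits;   /* bits[peer][v_local / 64]: vertex already sent?   */
    int        npeers; /* number of remote machines                        */
} sent_record_t;

/* Allocate one bitmap per peer, each covering verts_per_peer vertices. */
static sent_record_t *sent_record_new(int npeers, int64_t verts_per_peer)
{
    sent_record_t *rec = malloc(sizeof *rec);
    rec->npeers = npeers;
    rec->bits = malloc(npeers * sizeof *rec->bits);
    for (int p = 0; p < npeers; p++)
        rec->bits[p] = calloc((verts_per_peer + WORD_BITS - 1) / WORD_BITS,
                              sizeof(uint64_t));
    return rec;
}

/* Returns 1 if the remote vertex still has to be sent to its owner,
 * 0 if the send can be pruned because it was communicated before. */
static int should_send(sent_record_t *rec, int owner, int64_t v_local)
{
    uint64_t mask = (uint64_t)1 << (v_local % WORD_BITS);
    if (rec->bits[owner][v_local / WORD_BITS] & mask)
        return 0;                               /* seen before: prune     */
    rec->bits[owner][v_local / WORD_BITS] |= mask;
    return 1;                                   /* first time: enqueue    */
}
```

In such a sketch, the edge scan of each frontier level would call should_send for every remote endpoint and only enqueue the vertex for the inter-machine exchange when it returns 1.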

Notes

  1. Both PruX and direction optimization improve the parallel BFS algorithm by modifying its execution mode, so we choose direction optimization as the baseline for comparison.

  2. We implement direction optimization only at the algorithm level, without optimizing its storage and computation, which causes it to break down when SCALE is too large.

  3. Because the graph still contains many isolated vertices, direction optimization processes them in the bottom-up BFS phase, which incurs substantial computation cost and degrades performance on large-scale graphs (see the sketch after these notes).
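
To make the cost in note 3 concrete, the sketch below (illustrative only; the CSR arrays row_ptr/col_idx and the other names are assumptions, not the evaluated code) shows a bottom-up step: every still-unvisited vertex is scanned at every level, and an isolated vertex can never find a parent, so it is re-scanned until the traversal finishes.

```c
/* Illustrative bottom-up BFS step over a local CSR partition; all array
 * names are assumptions, not the evaluated implementation. */
#include <stdint.h>

void bottom_up_step(int64_t nlocal,
                    const int64_t *row_ptr, const int64_t *col_idx,
                    int64_t *parent, const uint8_t *in_frontier)
{
    for (int64_t v = 0; v < nlocal; v++) {
        if (parent[v] != -1)                  /* already visited          */
            continue;
        /* An isolated vertex (row_ptr[v] == row_ptr[v + 1]) reaches this
         * point at every level yet never finds a parent, which is the
         * extra work note 3 refers to. */
        for (int64_t e = row_ptr[v]; e < row_ptr[v + 1]; e++) {
            if (in_frontier[col_idx[e]]) {    /* neighbor is in frontier  */
                parent[v] = col_idx[e];
                break;
            }
        }
    }
}
```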

Acknowledgment

This work is sponsored in part by the National Basic Research Program of China (973 Program) under Grant No. 2014CB340303 and by the National Natural Science Foundation of China (NSFC) under Grant No. 61772541.

Author information

Corresponding author

Correspondence to Songzhu Mei.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Jia, M., Zhang, Y., Li, D., Mei, S. (2018). PruX: Communication Pruning of Parallel BFS in the Graph 500 Benchmark. In: Vaidya, J., Li, J. (eds.) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol. 11334. Springer, Cham. https://doi.org/10.1007/978-3-030-05051-1_8

  • DOI: https://doi.org/10.1007/978-3-030-05051-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05050-4

  • Online ISBN: 978-3-030-05051-1

  • eBook Packages: Computer Science, Computer Science (R0)
