
Performance Analysis of Spark/GraphX on POWER8 Cluster

  • Conference paper
High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)


Abstract

POWER8, the latest RISC (Reduced Instruction Set Computer) microprocessor of the IBM Power architecture family, was designed to significantly benefit emerging workloads, including Business Analytics, Cloud Computing and High Performance Computing. In this paper, we provide a thorough performance evaluation of a widely used large-scale graph processing framework, Spark/GraphX, on a POWER8 cluster. We use the Spark and Java versions out of the box, without any optimization. We examine performance with several important graph kernels, such as Breadth-First Search, Connected Components, and PageRank, using both large real-world social graphs and synthetic graphs with billions of edges. We study how Spark/GraphX performance relates to several architectural aspects and perform the first Spark/GraphX scalability test with up to 16 POWER8 nodes.
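To make the three kernels named above concrete, here is a minimal single-machine Python sketch of what each one computes on a small edge list. It is only an illustration under stated assumptions: it is not GraphX's implementation (GraphX runs such kernels as distributed iterative computations over partitioned graph data), the function names `adjacency`, `bfs_levels`, `connected_components` and `pagerank` are hypothetical, and PageRank uses the textbook normalized formulation with damping 0.85, which may differ in normalization from GraphX's built-in variant.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def adjacency(edges: List[Edge], undirected: bool = False) -> Dict[int, List[int]]:
    """Build an adjacency list; every endpoint becomes a vertex."""
    adj: Dict[int, List[int]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
        if undirected:
            adj[v].append(u)
    return adj


def bfs_levels(edges: List[Edge], source: int) -> Dict[int, int]:
    """Breadth-First Search: hop distance from source to each reachable vertex."""
    adj = adjacency(edges)
    level = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, []):
            if v not in level:  # first visit fixes the shortest hop count
                level[v] = level[u] + 1
                frontier.append(v)
    return level


def connected_components(edges: List[Edge]) -> Dict[int, int]:
    """Label propagation: each vertex ends up with the smallest id in its component."""
    adj = adjacency(edges, undirected=True)
    label = {v: v for v in adj}
    changed = True
    while changed:  # iterate until no label shrinks any further
        changed = False
        for v in adj:
            best = min([label[v]] + [label[n] for n in adj[v]])
            if best < label[v]:
                label[v] = best
                changed = True
    return label


def pagerank(edges: List[Edge], damping: float = 0.85, iters: int = 20) -> Dict[int, float]:
    """Power-iteration PageRank; dangling-vertex mass is spread uniformly."""
    adj = adjacency(edges)
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in adj}
        dangling = sum(rank[v] for v in adj if not adj[v])
        for u, outs in adj.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        for v in adj:
            new[v] += damping * dangling / n
        rank = new
    return rank
```

Each kernel is iterative: BFS expands one frontier per level, while Connected Components and PageRank repeat a pass over all edges until labels stop changing or a fixed iteration count is reached. These repeated, data-parallel passes are the kind of work a framework such as Spark/GraphX distributes across cluster nodes.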

Notes

  1.

    All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program.

  2.

    A system monitor command used to report on various system loads.
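The lazy evaluation described in Note 1 can be illustrated with a toy model. The sketch below assumes nothing about Spark internals: `LazyDataset` is a hypothetical class written for this illustration, not Spark's RDD API. Its `map` and `filter` only record steps, and the `collect` action replays the recorded chain over the base data.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Op = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Toy model of lazy transformations (hypothetical; not Spark's RDD API)."""

    def __init__(self, source: Iterable[Any], ops: Optional[List[Op]] = None):
        self._source = source
        self._ops: List[Op] = list(ops or [])

    def map(self, f: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: remember the step, compute nothing yet.
        return LazyDataset(self._source, self._ops + [("map", f)])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: remember the step, compute nothing yet.
        return LazyDataset(self._source, self._ops + [("filter", pred)])

    def collect(self) -> List[Any]:
        # Action: only now is the recorded chain applied to the base data.
        result = []
        for x in self._source:
            keep = True
            for kind, f in self._ops:
                if kind == "map":
                    x = f(x)
                elif not f(x):
                    keep = False
                    break
            if keep:
                result.append(x)
        return result
```

For example, `LazyDataset([1, 2, 3, 4]).map(lambda x: x * x).filter(lambda v: v % 2 == 0)` performs no work until `.collect()` is called, which then returns `[4, 16]`. In this sketch every `collect` replays the whole chain from the base data, which is loosely analogous to an uncached RDD being recomputed for each action.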

Author information

Corresponding authors

Correspondence to Xinyu Que, Lars Schneidenbach or Fabio Checconi.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Que, X., Schneidenbach, L., Checconi, F., Costa, C.H.A., Buono, D. (2016). Performance Analysis of Spark/GraphX on POWER8 Cluster. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_19

  • DOI: https://doi.org/10.1007/978-3-319-46079-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science, Computer Science (R0)