A Case Study of OpenMP Applied to Map/Reduce-Style Computations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9342)

Abstract

As data analytics grow in importance, they are quickly becoming one of the dominant application domains that require parallel processing. This paper investigates the applicability of OpenMP, the dominant shared-memory parallel programming model in high-performance computing, to the domain of data analytics. We contrast the performance and programmability of key data analytics benchmarks against Phoenix++, a state-of-the-art shared-memory map/reduce programming system. Our study shows that OpenMP outperforms the Phoenix++ system by a large margin for several benchmarks. In other cases, however, the programming model lacks support for this application domain.
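
To make "map/reduce-style computation in OpenMP" concrete, the following is a minimal sketch and is not code from the paper. It assumes a byte histogram as the workload, chosen only for illustration; the function name `byte_histogram` and the constant `NUM_BINS` are hypothetical. Each thread maps its share of the input into a private table, and the private tables are then reduced into the shared result.

```c
#include <stddef.h>
#include <string.h>

#define NUM_BINS 256  /* one bin per possible byte value (illustrative) */

/* Hypothetical helper: counts how often each byte value occurs in data[0..n).
 * Map step: every input byte is mapped to the key (bin) equal to its value.
 * Reduce step: per-thread partial counts are summed into the shared result. */
void byte_histogram(const unsigned char *data, size_t n,
                    unsigned long bins[NUM_BINS])
{
    memset(bins, 0, NUM_BINS * sizeof bins[0]);

    #pragma omp parallel
    {
        /* Thread-private partial result, so the hot loop needs no locking. */
        unsigned long local[NUM_BINS] = {0};

        /* Map phase: loop iterations are divided among the threads. */
        #pragma omp for nowait
        for (long i = 0; i < (long)n; i++)
            local[data[i]]++;

        /* Reduce phase: threads merge their partial results one at a time. */
        #pragma omp critical
        for (int b = 0; b < NUM_BINS; b++)
            bins[b] += local[b];
    }
}
```

With GCC or Clang the pragmas take effect when the file is compiled with `-fopenmp`; without that flag they are ignored and the function runs serially with the same result. The manual private-table merge is one possible design; a `reduction` clause is the more compact choice when the reduced value is a scalar.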

Acknowledgment

This work is supported by the European Community’s Seventh Framework Programme (FP7/2007–2013) under the ASAP project, grant agreement no. 619706, and by the United Kingdom EPSRC under grant agreement EP/L027402/1.

Corresponding author

Correspondence to Mahwish Arif.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Arif, M., Vandierendonck, H. (2015). A Case Study of OpenMP Applied to Map/Reduce-Style Computations. In: Terboven, C., de Supinski, B., Reble, P., Chapman, B., Müller, M. (eds) OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science, vol 9342. Springer, Cham. https://doi.org/10.1007/978-3-319-24595-9_12

  • DOI: https://doi.org/10.1007/978-3-319-24595-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24594-2

  • Online ISBN: 978-3-319-24595-9

  • eBook Packages: Computer Science, Computer Science (R0)
