A Case Study of OpenMP Applied to Map/Reduce-Style Computations

Arif, Mahwish; Vandierendonck, Hans

doi:10.1007/978-3-319-24595-9_12

Conference paper
First Online: 26 November 2015

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9342)

Abstract

As data analytics grow in importance, they are quickly becoming one of the dominant application domains that require parallel processing. This paper investigates the applicability of OpenMP, the dominant shared-memory parallel programming model in high-performance computing, to the domain of data analytics. We contrast the performance and programmability of key data analytics benchmarks against Phoenix++, a state-of-the-art shared-memory map/reduce programming system. Our study shows that OpenMP outperforms the Phoenix++ system by a large margin for several benchmarks. In other cases, however, the programming model lacks support for this application domain.
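To make the idea of a map/reduce-style computation in OpenMP concrete, the following is a minimal illustrative sketch. It is not taken from the paper or its benchmarks. The function name `byte_histogram` and the choice of a byte histogram as the workload are assumptions made purely for illustration. Each thread fills a private table (the "map" side), and the tables are then merged under a critical section (the "reduce" side).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the paper): counts how often each byte
// value occurs in `data`, written in a map/reduce style with OpenMP.
std::vector<std::uint64_t> byte_histogram(const std::vector<std::uint8_t>& data) {
    std::vector<std::uint64_t> global(256, 0);
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());

    #pragma omp parallel
    {
        // "Map" side: every thread accumulates into its own private table,
        // so no synchronization is needed inside the loop.
        std::vector<std::uint64_t> local(256, 0);

        #pragma omp for nowait
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            ++local[data[static_cast<std::size_t>(i)]];
        }

        // "Reduce" side: merge the private tables into the shared result,
        // one thread at a time.
        #pragma omp critical
        {
            for (std::size_t b = 0; b < global.size(); ++b) {
                global[b] += local[b];
            }
        }
    }
    return global;
}
```

Calling `byte_histogram` on a vector of bytes returns a 256-entry table of counts. The code uses only pragmas, so it also compiles and gives the same result serially when OpenMP is disabled. Merging in a critical section is one simple option; a real implementation might instead use a reduction clause or a tree-shaped merge.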

Acknowledgment

This work is supported by the European Community’s Seventh Framework Programme (FP7/2007–2013) under the ASAP project, grant agreement no. 619706, and by the United Kingdom EPSRC under grant agreement EP/L027402/1.

Author information

Authors and Affiliations

Queen’s University Belfast, Belfast, UK
Mahwish Arif & Hans Vandierendonck

Corresponding author

Correspondence to Mahwish Arif.

Editor information

Editors and Affiliations

RWTH Aachen University, Aachen, Germany
Christian Terboven
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RWTH Aachen University, Aachen, Germany
Pablo Reble
University of Houston, Houston, Texas, USA
Barbara M. Chapman
RWTH Aachen University, Aachen, Germany
Matthias S. Müller

About this paper

Cite this paper

Arif, M., Vandierendonck, H. (2015). A Case Study of OpenMP Applied to Map/Reduce-Style Computations. In: Terboven, C., de Supinski, B., Reble, P., Chapman, B., Müller, M. (eds) OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science, vol 9342. Springer, Cham. https://doi.org/10.1007/978-3-319-24595-9_12

DOI: https://doi.org/10.1007/978-3-319-24595-9_12
Published: 26 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24594-2
Online ISBN: 978-3-319-24595-9
eBook Packages: Computer Science, Computer Science (R0)
