Abstract
MapReduce has become a prevalent model for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance, and load balancing from programmers, MapReduce significantly simplifies programming for large clusters. Because of these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, text retrieval, and statistical machine translation.
In this paper, we study the feasibility of running typical supercomputing applications on the MapReduce framework. We port two applications (Water Spatial and Radix Sort) from the Stanford SPLASH-2 suite to MapReduce. By thoroughly evaluating them on Hadoop, an open-source MapReduce framework for clusters, we analyze their major performance bottlenecks in the MapReduce framework. Based on this analysis, we also offer several suggestions for enhancing the MapReduce framework to suit these applications.
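To make the idea of porting Radix Sort to the map/reduce model concrete, the following is a minimal single-process sketch in Python. It is not the authors' Hadoop implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `radix_sort`) and the parameters `bits_per_pass` and `key_bits` are hypothetical and chosen only for illustration. It assumes non-negative integer keys smaller than 2**key_bits and treats each digit pass as one map/shuffle/reduce round.

```python
from collections import defaultdict


def map_phase(keys, shift, mask):
    """Map: emit a (digit, key) pair for the digit selected by this pass."""
    for key in keys:
        yield (key >> shift) & mask, key


def shuffle(pairs):
    """Group keys by digit, preserving arrival order within each group.

    Assumption: this in-memory stand-in keeps input order, which the
    least-significant-digit scheme relies on for stability. A distributed
    framework may not guarantee that order without extra work.
    """
    groups = defaultdict(list)
    for digit, key in pairs:
        groups[digit].append(key)
    return groups


def reduce_phase(groups):
    """Reduce: concatenate the per-digit buckets in ascending digit order."""
    out = []
    for digit in sorted(groups):
        out.extend(groups[digit])
    return out


def radix_sort(keys, bits_per_pass=4, key_bits=32):
    """Least-significant-digit radix sort as a sequence of map/reduce rounds."""
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, key_bits, bits_per_pass):
        keys = reduce_phase(shuffle(map_phase(keys, shift, mask)))
    return list(keys)
```

Each pass feeds the full output of the previous round into the next one, so in a cluster setting every digit would plausibly correspond to a separate job with its own data movement. The sketch only illustrates that structure and makes no claim about the bottlenecks measured in the paper.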
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, S., Xiao, Z., Chen, H., Chen, R., Zhang, W., Zang, B. (2009). Evaluating SPLASH-2 Applications Using MapReduce. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_35
DOI: https://doi.org/10.1007/978-3-642-03644-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science, Computer Science (R0)