Abstract
Two major trends in computing systems are the growth of high performance computing (HPC), driven in particular by an international exascale initiative, and the growth of big data, with an accompanying cloud infrastructure of dramatic and increasing size and sophistication. In this paper, we study an approach to convergence for software and for applications/algorithms, and show what hardware architectures it suggests. We start by dividing applications into data plus model components and classifying each component (whether from Big Data or Big Compute) in the same way. This leads to 64 properties divided into 4 views: Problem Architecture (Macro pattern); Execution Features (Micro patterns); Data Source and Style; and the Processing (runtime) View. We discuss convergence software built around HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack), show how one can merge Big Data and HPC (Big Simulation) concepts into a single stack, and discuss appropriate hardware.
Acknowledgments
This work was partially supported by NSF CIF21 DIBBS 1443054, NSF OCI 1149432 CAREER, and AFOSR FA9550-13-1-0225 awards. We thank Dennis Gannon for comments on an early draft.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Fox, G., Qiu, J., Jha, S., Ekanayake, S., Kamburugamuve, S. (2016). Big Data, Simulations and HPC Convergence. In: Rabl, T., Nambiar, R., Baru, C., Bhandarkar, M., Poess, M., Pyne, S. (eds) Big Data Benchmarking. WBDB 2015. Lecture Notes in Computer Science, vol 10044. Springer, Cham. https://doi.org/10.1007/978-3-319-49748-8_1
DOI: https://doi.org/10.1007/978-3-319-49748-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49747-1
Online ISBN: 978-3-319-49748-8
eBook Packages: Computer Science, Computer Science (R0)