Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation

Abstract

The dawn of exascale computing and its convergence with big data analytics have spurred considerable research interest. The reasons are straightforward. Traditionally, high performance computing (HPC) systems have been used for scientific applications dominated by compute-intensive tasks. At the same time, the proliferation of big data has led to the design of data-intensive processing paradigms such as the Apache big data stack. Big data generated at a high pace necessitates faster processing mechanisms for obtaining insights in real time, and HPC systems may serve as a panacea for solving big data problems. Although HPC systems are capable of delivering promising results for big data workloads, directly integrating them with existing data-intensive frameworks such as the Apache big data stack is not straightforward because of the associated challenges. This has triggered research on seamlessly integrating the two paradigms through interoperable frameworks, programming models, and system architectures. The aim of this paper is to assess the progress made in the HPC world toward augmenting it with big data analytics support. As an outcome, a taxonomy of the factors to be considered when augmenting HPC systems with big data support is put forth. The paper sheds light on how big data frameworks can be ported to HPC platforms as a preliminary step towards the convergence of the big data and exascale computing ecosystems. The focus is on research issues related to augmenting HPC paradigms with big data frameworks and the corresponding approaches to address those issues. The paper also discusses data-intensive and compute-intensive processing paradigms, benchmark suites and workloads, and future directions in integrating HPC with big data analytics.
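To make the convergence concrete, the sketch below is a toy illustration, not taken from the paper or from any specific system it surveys: a MapReduce-style word count expressed directly in an HPC programming model, using the mpi4py bindings to MPI. It assumes mpi4py and an MPI runtime are installed and that the script is launched with something like mpiexec -n 4 python wordcount_mpi.py (a hypothetical file name); the map phase becomes a scatter of input chunks plus local counting, and the reduce phase becomes a gather and merge at the root rank.

from collections import Counter

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # Toy corpus; on a real HPC cluster the input would typically be read in
    # parallel from a shared file system (e.g. Lustre or GPFS) rather than
    # scattered from a single root rank.
    lines = [
        "big data meets high performance computing",
        "exascale computing meets big data analytics",
        "hpc systems accelerate big data processing",
    ]
    chunks = [lines[i::size] for i in range(size)]  # one chunk per MPI rank
else:
    chunks = None

# "Map" phase: each rank receives its chunk and counts words locally.
local_lines = comm.scatter(chunks, root=0)
local_counts = Counter(word for line in local_lines for word in line.split())

# "Reduce" phase: gather the partial counts and merge them at the root rank.
partial_counts = comm.gather(local_counts, root=0)
if rank == 0:
    total = sum(partial_counts, Counter())
    print(total.most_common(5))

Real integrations must additionally deal with parallel file systems, RDMA-capable interconnects, batch schedulers, and fault tolerance, which is precisely where the challenges discussed in this paper arise.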

Author information

Corresponding author

Correspondence to Ajeet Ram Pathak.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pathak, A.R., Pandey, M. & Rautaray, S.S. Approaches of enhancing interoperations among high performance computing and big data analytics via augmentation. Cluster Comput 23, 953–988 (2020). https://doi.org/10.1007/s10586-019-02960-y
