Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing

Ahmad, Awais; Paul, Anand; Din, Sadia; Rathore, M. Mazhar; Choi, Gyu Sang; Jeon, Gwanggil

doi:10.1007/s10766-017-0498-x

Published: 27 March 2017

Volume 46, pages 508–527 (2018)

International Journal of Parallel Programming

Awais Ahmad¹,
Anand Paul²,
Sadia Din²,
M. Mazhar Rathore²,
Gyu Sang Choi¹ &
Gwanggil Jeon³

Abstract

The growing gap between users and Big Data analytics calls for innovative tools that address the challenges posed by the volume, variety, and velocity of big data, since analyzing such massive volumes of data is computationally inefficient. Moreover, advances in Big Data applications and data science pose additional challenges, and High-Performance Computing solutions have become a key issue that has attracted attention in recent years. However, existing systems are either memoryless or computationally inefficient. There is therefore a need for a system that can efficiently analyze a stream of Big Data within these requirements. This paper presents a system architecture that enhances traditional MapReduce by incorporating a parallel processing algorithm. A complete four-tier architecture is also proposed that efficiently aggregates the data, eliminates unnecessary data, and analyzes the data with the proposed parallel processing algorithm. The proposed architecture supports both read and write operations, which enhances the efficiency of Input/Output operations. To check the efficiency of the proposed algorithms, we implemented the system using Hadoop and MapReduce, with MapReduce supported by a parallel algorithm that efficiently processes huge volumes of data. The system runs MapReduce on top of parallel Hadoop nodes to generate and process graphs in near real time. It is evaluated for efficiency in terms of system throughput and processing time. The results show that the proposed system is more scalable and efficient.
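As a rough illustration of the filter, split, map, and reduce flow that the abstract describes, the following minimal Python sketch may help. It is not the authors' algorithm or architecture: the word-count workload, the filtering rule, and every function name here are hypothetical choices made for the example, and a thread pool stands in for the Hadoop nodes used in the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def filter_records(records):
    """Filtration step: drop empty or whitespace-only records (hypothetical rule)."""
    return [r for r in records if r and r.strip()]


def split_into_chunks(records, n_chunks):
    """Split records into at most n_chunks contiguous chunks of similar size."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: count word occurrences within a single chunk."""
    counts = Counter()
    for record in chunk:
        counts.update(record.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial counts into a new Counter."""
    return left + right


def parallel_word_count(records, n_workers=4):
    """Filter, split, map the chunks concurrently, then reduce the partial results."""
    cleaned = filter_records(records)
    chunks = split_into_chunks(cleaned, n_workers)
    # Threads are used only to keep the sketch self-contained; a real deployment
    # would distribute the map tasks across processes or cluster nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())
```

Calling `parallel_word_count(lines, n_workers=2)` on a list of text lines returns a single `Counter` of word frequencies. Filtering before splitting mirrors the idea of discarding unnecessary data before the parallel stage, so the workers only handle records that contribute to the result.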

Acknowledgements

This work is supported by the BK21 Plus project (SW Human Resource Development Program for Supporting Smart Life) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (21A20131600005), and by an NRF grant funded by the Korean Government (NRF-2015R1D1A1A01058171).

Author information

Authors and Affiliations

Department of Information and Communication Engineering, Yeungnam University, Gyeongbuk, Republic of Korea
Awais Ahmad & Gyu Sang Choi
School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea
Anand Paul, Sadia Din & M. Mazhar Rathore
Department of Embedded Systems Engineering, Incheon National University, Incheon, Korea
Gwanggil Jeon

Corresponding authors

Correspondence to Awais Ahmad or Gwanggil Jeon.

About this article

Cite this article

Ahmad, A., Paul, A., Din, S. et al. Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing. Int J Parallel Prog 46, 508–527 (2018). https://doi.org/10.1007/s10766-017-0498-x

Received: 26 December 2016
Accepted: 11 March 2017
Published: 27 March 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10766-017-0498-x
