Abstract
Hadoop is one of the most widely used software frameworks for distributing data in order to compute on and manage big data. Big data refers to collections of complex, very large datasets, such as real-time data, social media data, data management records, and money laundering records. Its volume is typically measured in terabytes and petabytes. The main security issue in Hadoop applications is unauthorized access. Several existing techniques have been introduced to secure the data, but they suffer from data errors, remain vulnerable to malicious attacks, and take a long time to compute. The authors therefore propose a novel ChaApache framework that secures the Hadoop application against unauthorized users while reducing both data processing time and error rate. The main aim of the developed model is to protect data from unauthorized persons and unauthorized access. The ChaApache framework is implemented in Python; the Hadoop application data comprise 512 bits, and the data are encrypted using four 32-bit values. The proposed model is compared with existing models in terms of computation time, resource usage, data sharing rate, and encryption speed, among other metrics.
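The figures in the abstract (512 bits of data, four 32-bit values) are consistent with the standard ChaCha stream cipher, whose state is sixteen 32-bit words (512 bits) and whose core step, the quarter round, mixes four 32-bit words at a time. The sketch below assumes that ChaApache builds on this standard quarter round, which the abstract does not confirm. The names `rotl32` and `quarter_round` are illustrative and are not taken from the paper.

```python
# Minimal sketch of the standard ChaCha quarter round (as specified in RFC 8439).
# Assumption: ChaApache uses a ChaCha-style core; the abstract does not say so.

MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` bits (0 < shift < 32)."""
    return ((value << shift) & MASK32) | (value >> (32 - shift))


def quarter_round(a, b, c, d):
    """Mix four 32-bit words with add, XOR, and rotate operations."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Quarter-round test vector from RFC 8439, section 2.1.1.
    words = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(w) for w in words])
    # -> ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

In the standard cipher, a 512-bit state is processed by applying this quarter round to the four columns and then the four diagonals of a 4 x 4 grid of 32-bit words. Whether ChaApache follows the same schedule is not stated in the abstract.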
Acknowledgements
None.
Data availability statements
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Ethics declarations
Compliance with ethical standards
1. Disclosure of Potential Conflict of Interest: The authors declare that they have no potential conflict of interest.
2. Statement of Human and Animal Rights.
Ethical approval
All applicable institutional and/or national guidelines for the care and use of animals were followed.
Informed consent
For this type of study formal consent is not required.
About this article
Cite this article
Gattoju, S., Nagalakshmi, V. Design of ChaApache framework for securing Hadoop application in big data. Multimed Tools Appl 82, 15247–15269 (2023). https://doi.org/10.1007/s11042-022-13944-3
DOI: https://doi.org/10.1007/s11042-022-13944-3