Design of ChaApache framework for securing Hadoop application in big data

Multimedia Tools and Applications

Abstract

Hadoop is one of the most widely used software frameworks for distributing data in order to compute on and handle big data. Big data is a collection of complex, enormous datasets holding massive amounts of data, such as real-time data, social media data, data-management records, and money-laundering records, and it is typically measured in terabytes and petabytes. The main security issue of the Hadoop application is unauthorized access. Several existing techniques have been introduced to secure the data, but they suffer from data errors, remain exposed to malicious attacks, and take a long time to compute. The authors therefore propose a novel ChaApache framework whose main aim is to protect the Hadoop application and its data from unauthorized persons and unauthorized access, while also shortening data processing time and reducing the error rate. The ChaApache framework is implemented in Python; the Hadoop application contains 512 bits of data, and the data are encrypted in four 32-bit units. The proposed model is compared with other existing models in terms of computation time, resource usage, data sharing rate, encryption speed, and related metrics.
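The abstract does not name the underlying cipher, so the following is only an illustrative sketch and not the authors' method. It assumes that the 512-bit and four 32-bit figures refer to a ChaCha-style design, in which the cipher state is sixteen 32-bit words (512 bits) and each quarter round mixes four of those words, as specified in RFC 8439. The function names `rotl32` and `quarter_round` are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a: int, b: int, c: int, d: int) -> tuple:
    """ChaCha quarter round (RFC 8439, section 2.1) on four 32-bit words.

    Assumption: this stands in for the "four 32 bits" step mentioned in
    the abstract; the paper's actual encryption routine may differ.
    """
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Test vector from RFC 8439, section 2.1.1.
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(w) for w in out])
    # ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

In a full ChaCha block function this quarter round is applied repeatedly to columns and diagonals of the 4 x 4 word state; how ChaApache arranges or modifies these steps is not stated in the abstract.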

Acknowledgements

None.

Data availability statements

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Author information

Corresponding author

Correspondence to Saritha Gattoju.

Ethics declarations

Compliance with ethical standards

  1. Disclosure of Potential Conflict of Interest:

The authors declare that they have no potential conflict of interest.

  2. Statement of Human and Animal Rights.

Ethical approval

All applicable institutional and/or national guidelines for the care and use of animals were followed.

Informed consent

For this type of study formal consent is not required.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gattoju, S., Nagalakshmi, V. Design of ChaApache framework for securing Hadoop application in big data. Multimed Tools Appl 82, 15247–15269 (2023). https://doi.org/10.1007/s11042-022-13944-3

  • DOI: https://doi.org/10.1007/s11042-022-13944-3
