Performance Optimization of Big Data Applications Using Parameter Tuning of Data Platform Features Through Feature Selection Techniques

Pattanshetti, Tanuja; Attar, Vahida

doi:10.1007/978-981-15-5788-0_26


Conference paper
First Online: 09 September 2020

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1176)

Abstract

The performance of big data applications can be optimized by identifying the system parameters of the underlying data platform that have the greatest impact. This paper identifies the optimal system parameter set of the Hadoop and Spark data platforms by applying different feature selection techniques. The main objective is to reduce job execution time by tuning only the identified parameters; parameters deemed less relevant or redundant are eliminated during the feature selection process. The parameter sets identified by the different feature selection algorithms are compared, and an empirical analysis is carried out. Statistical analysis is used as a cross-validation technique to evaluate the relevance of the identified parameter set and the dependency of platform performance on the system parameters.
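The general idea of ranking platform parameters by their relevance to job execution time can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple filter-style selection based on absolute Pearson correlation, and the parameter names (`executor_memory_gb`, `shuffle_partitions`, `irrelevant_knob`), the synthetic timing model, and all helper functions are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def rank_parameters(runs, times):
    """Rank parameters by |correlation| with execution time, highest first.

    runs  -- list of dicts mapping parameter name to numeric value
    times -- list of measured job execution times, one per run
    """
    scores = {
        name: abs(pearson([run[name] for run in runs], times))
        for name in runs[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(runs, times, k):
    """Keep the k highest-ranked parameters; the rest are treated as irrelevant."""
    return [name for name, _ in rank_parameters(runs, times)[:k]]


def synthetic_runs(n=200, seed=0):
    """Generate hypothetical benchmark runs (assumed model, not measured data)."""
    rng = random.Random(seed)
    runs, times = [], []
    for _ in range(n):
        config = {
            "executor_memory_gb": rng.uniform(1, 16),
            "shuffle_partitions": rng.uniform(50, 400),
            "irrelevant_knob": rng.uniform(0, 1),
        }
        # Assumed relationship: two parameters drive the time, one does not.
        seconds = (
            300
            - 12 * config["executor_memory_gb"]
            - 0.3 * config["shuffle_partitions"]
            + rng.gauss(0, 5)
        )
        runs.append(config)
        times.append(seconds)
    return runs, times


if __name__ == "__main__":
    runs, times = synthetic_runs()
    for name, score in rank_parameters(runs, times):
        print(f"{name}: {score:.2f}")
    print("selected:", select_top_k(runs, times, 2))
```

In such a setup only the selected parameters would then be tuned, while the remaining ones stay at their defaults. A correlation filter captures linear dependence only, so it is a stand-in here for whichever feature selection algorithms are actually compared in the paper.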

Author information

Authors and Affiliations

College of Engineering Pune, Pune, Maharashtra, India
Tanuja Pattanshetti & Vahida Attar

Corresponding author

Correspondence to Tanuja Pattanshetti.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan
Sheng-Lung Peng
School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Informatics, University of Leicester, Leicester, UK
Yu-Dong Zhang

About this paper

Cite this paper

Pattanshetti, T., Attar, V. (2021). Performance Optimization of Big Data Applications Using Parameter Tuning of Data Platform Features Through Feature Selection Techniques. In: Bhateja, V., Peng, SL., Satapathy, S.C., Zhang, YD. (eds) Evolution in Computational Intelligence. Advances in Intelligent Systems and Computing, vol 1176. Springer, Singapore. https://doi.org/10.1007/978-981-15-5788-0_26

DOI: https://doi.org/10.1007/978-981-15-5788-0_26
Published: 09 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5787-3
Online ISBN: 978-981-15-5788-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
