Performance Optimization of Big Data Applications Using Parameter Tuning of Data Platform Features Through Feature Selection Techniques

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1176)

Abstract

Big data application performance can be optimized by identifying the most impactful system parameters of the underlying big data platform. This paper identifies an optimal set of system parameters for the Hadoop and Spark data platforms by applying different feature selection techniques. The main objective is to reduce job execution time by tuning only the identified parameters. Parameters deemed less relevant or redundant are eliminated during the feature selection process. The parameter sets identified by the different feature selection algorithms are compared, and an empirical analysis is carried out. Statistical analysis is used as a cross-validation technique to evaluate the relevance of the identified parameter set and the dependency of platform performance on system parameters.
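As a rough illustration of the idea in the abstract, the sketch below ranks configuration parameters by the absolute Pearson correlation between their values and measured job execution time, then keeps only those above a cutoff. This is a generic filter-style feature selection, not necessarily one of the techniques evaluated in the paper. The parameter names, the 0.5 threshold, and the run data are all hypothetical; the data is synthetic and generated only so the example runs.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    return cov / (sx * sy)


def rank_parameters(runs, times):
    """Score each parameter by |correlation| with execution time, highest first."""
    names = list(runs[0].keys())
    scores = {
        name: abs(pearson([run[name] for run in runs], times)) for name in names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_parameters(runs, times, threshold=0.5):
    """Keep parameters whose score meets the (assumed) threshold."""
    return [name for name, score in rank_parameters(runs, times) if score >= threshold]


# Synthetic benchmark runs: hypothetical knob names, not measured data.
rng = random.Random(0)
runs, times = [], []
for _ in range(200):
    cfg = {
        "executor_memory_gb": rng.uniform(1, 8),
        "shuffle_partitions": rng.uniform(50, 400),
        "irrelevant_knob": rng.uniform(0, 1),
    }
    # Assumed ground truth: memory dominates, partitions matter slightly.
    seconds = (
        300
        - 25 * cfg["executor_memory_gb"]
        + 0.05 * cfg["shuffle_partitions"]
        + rng.gauss(0, 5)
    )
    runs.append(cfg)
    times.append(seconds)

ranking = rank_parameters(runs, times)
selected = select_parameters(runs, times)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
print("selected for tuning:", selected)
```

A correlation filter like this only captures linear, one-parameter-at-a-time effects, so interacting or redundant parameters would need a different selection method.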

Author information

Corresponding author

Correspondence to Tanuja Pattanshetti.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pattanshetti, T., Attar, V. (2021). Performance Optimization of Big Data Applications Using Parameter Tuning of Data Platform Features Through Feature Selection Techniques. In: Bhateja, V., Peng, SL., Satapathy, S.C., Zhang, YD. (eds) Evolution in Computational Intelligence. Advances in Intelligent Systems and Computing, vol 1176. Springer, Singapore. https://doi.org/10.1007/978-981-15-5788-0_26
