Performance Prediction of Spark Based on the Multiple Linear Regression Analysis

  • Conference paper
Parallel Architecture, Algorithm and Programming (PAAP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 729)

Abstract

Evaluating the performance of a cloud platform and identifying the main factors that influence it are crucial tasks. Moreover, the analysis of the related performance indicators can be used to make theoretical predictions about the performance status of the platform. This work focuses on the relationships between the performance indicators of a Spark-based cloud platform and the load performance of the cluster, and on making effective predictions of that load performance. First, we put forward analytic frameworks for Spark performance analysis, for the analysis of specific indicators, and for the prediction model of the cluster load. Second, for the evaluation indicators, we explain the basis for their selection and their concrete meaning. We then use the MLR (Multiple Linear Regression) method to calculate the correlation formula between the performance parameters actually produced while the Spark cluster runs batch applications and the load performance of the cluster, and thereby determine the main factors affecting the load performance. Finally, we predict the load value using the Spark indicator analysis and the load prediction model. The results indicate an accuracy of up to 92.307%. Consequently, the solution presented in this paper predicts the cluster load value efficiently.
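The abstract does not give the regression formula or the indicator set, so the following is only a minimal sketch of how a multiple linear regression of cluster load on performance indicators could be fitted and used for prediction. The three indicator columns, the coefficient values, and the helper names `fit_mlr` and `predict_mlr` are all hypothetical, and the data are synthetic rather than taken from the paper.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares fit of y ~ b0 + X @ b; returns [b0, b1, ..., bk]."""
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict the response for indicator rows X using fitted coefficients."""
    return coef[0] + np.asarray(X) @ coef[1:]

# Synthetic, made-up data: 50 observations of 3 hypothetical indicators
# (for example CPU, memory, and I/O utilisation scaled to [0, 1]).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 3))

# Hypothetical "true" relationship used only to generate a noise-free load.
true_coef = np.array([0.5, 2.0, -1.0, 0.3])
y = true_coef[0] + X @ true_coef[1:]

coef = fit_mlr(X, y)          # estimated intercept and slopes
pred = predict_mlr(coef, X)   # predicted load for each observation
```

In such a fit, the magnitude and sign of each slope suggest how strongly the corresponding indicator is associated with the load, which is the sense in which a regression can point to the main influencing factors.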

Acknowledgments

This work is sponsored by the National Natural Science Foundation of P. R. China (Nos. 61373017, 61572260, 61572261, 61672296, 61602261), the Natural Science Foundation of Jiangsu Province (Nos. BK20140886, BK20140888, BK20160089), the Scientific & Technological Support Project of Jiangsu Province (Nos. BE2015702, BE2016777, BE2016185), the China Postdoctoral Science Foundation (Nos. 2014M551636, 2014M561696), the Jiangsu Planned Projects for Postdoctoral Research Funds (Nos. 1302090B, 1401005B), the Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks Foundation (No. WSNLBZY201508), and the Research Innovation Program for College Graduates of Jiangsu Province (SJZZ16_0148).

Author information

Corresponding author

Correspondence to Peng Li.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Dong, L., Li, P., Xu, H., Luo, B., Mi, Y. (2017). Performance Prediction of Spark Based on the Multiple Linear Regression Analysis. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_7

  • DOI: https://doi.org/10.1007/978-981-10-6442-5_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer Science, Computer Science (R0)
