Computing Resource Prediction for MapReduce Applications Using Decision Tree

  • Conference paper
Web Technologies and Applications (APWeb 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7235))

Abstract

The cloud computing paradigm offers users access to computing resources in a pay-as-you-go manner. However, for both cloud vendors and users, it is challenging to predict how many resources an application needs to run in a cloud at a required level of quality. This research develops a model to predict the computing resource consumption of MapReduce applications in a cloud computing environment. Based on the Classification and Regression Tree (CART) algorithm, the proposed approach derives knowledge of the relationship among application features, quality of service, and the amount of computing resources from a small training set. The experiments show that the prediction accuracy is as high as 80%. This research can potentially benefit both cloud vendors and users by improving resource management and reducing costs.
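The CART idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the two features (input size in GB, deadline in minutes), the resource classes ("small", "medium", "large"), and the seven training rows are all hypothetical, chosen only to show how a tree learned from a small training set maps application features and a quality-of-service target to a resource class. The split criterion here is Gini impurity, the usual choice for CART classification.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; internal nodes are tuples, leaves are majority labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical training set: (input_size_gb, deadline_min) -> resource class.
rows = [(1, 60), (2, 50), (1, 10), (10, 60), (12, 50), (10, 10), (12, 15)]
labels = ["small", "small", "small", "medium", "medium", "large", "large"]

tree = build_tree(rows, labels)
print(tree)                     # (0, 6.0, 'small', (1, 32.5, 'large', 'medium'))
print(predict(tree, (11, 20)))  # large: big input with a tight deadline
```

On this toy data the tree first splits on input size and then, for large inputs, on the deadline. A real predictor would need measured job profiles and a held-out test set to estimate accuracy.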

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Piao, J.T., Yan, J. (2012). Computing Resource Prediction for MapReduce Applications Using Decision Tree. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_51

  • DOI: https://doi.org/10.1007/978-3-642-29253-8_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29252-1

  • Online ISBN: 978-3-642-29253-8

  • eBook Packages: Computer Science, Computer Science (R0)
