
Self-tuning Filers — Overload Prediction and Preventive Tuning Using Pruned Random Forest

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10235)

Abstract

The holy grail for the large, complex storage systems deployed in enterprises today is for these systems to be self-governing. We propose a self-tuning scheme for large storage filers, an area in which very little work has been done to date. Our system uses the performance counters generated by a filer to assess its health in real time, then modifies the workload and/or tunes system parameters to optimize operational metrics. We predict overload in real time with a pruned-random-forest model that is run on every snapshot of counter values. A large number of trees in a random forest has an immediate adverse effect on the time taken to reach a decision, so a large forest is not viable in a real-time scenario. Our solution therefore uses a pruned random forest that performs as well as the original forest. When an overload situation is predicted, a saliency analysis identifies the components of the system that require tuning, which allows us to initiate an 'action' on the bottleneck components. The 'action' we explore in our experiments is 'throttling' the bottleneck component to prevent overload situations.
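The prune-then-vote idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the "trees" below are stand-in threshold rules, and the counter names (`cpu_busy`, `disk_util`) and thresholds are hypothetical. The sketch only shows the general shape of the approach: rank the ensemble members on a validation set, keep the top few, and classify each counter snapshot by majority vote.

```python
# Illustrative sketch (not the paper's implementation): prune an ensemble by
# keeping only its best-performing members, then classify counter snapshots
# in real time by majority vote. Counter names and thresholds are hypothetical.

def make_stump(counter, threshold):
    """A stand-in for one decision tree: predicts overload (1) when the
    given performance counter exceeds its threshold."""
    return lambda snapshot: 1 if snapshot[counter] > threshold else 0

def prune_forest(forest, val_set, keep):
    """Rank trees by validation accuracy and keep only the top `keep`,
    so that real-time voting involves far fewer trees."""
    def accuracy(tree):
        return sum(tree(x) == y for x, y in val_set) / len(val_set)
    return sorted(forest, key=accuracy, reverse=True)[:keep]

def predict_overload(forest, snapshot):
    """Majority vote of the (pruned) forest on one counter snapshot."""
    votes = sum(tree(snapshot) for tree in forest)
    return votes * 2 >= len(forest)

# Toy forest of five "trees" over two hypothetical counters.
forest = [
    make_stump("cpu_busy", 0.9),
    make_stump("cpu_busy", 0.5),
    make_stump("disk_util", 0.8),
    make_stump("disk_util", 0.2),   # noisy tree: fires far too often
    make_stump("cpu_busy", 0.95),
]
val_set = [
    ({"cpu_busy": 0.3, "disk_util": 0.4}, 0),
    ({"cpu_busy": 0.97, "disk_util": 0.9}, 1),
    ({"cpu_busy": 0.6, "disk_util": 0.3}, 0),
]
pruned = prune_forest(forest, val_set, keep=3)
print(predict_overload(pruned, {"cpu_busy": 0.98, "disk_util": 0.85}))
```

In the paper's setting, each "tree" would instead be a full decision tree over many filer counters, and pruning keeps prediction latency low enough to run the model on every counter snapshot.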



Acknowledgments

This research work was partially funded by NetApp Inc. The views and conclusions contained herein are those of the authors only.


Corresponding author

Correspondence to Kumar Dheenadayalan.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dheenadayalan, K., Srinivasaraghavan, G., Muralidhara, V.N. (2017). Self-tuning Filers — Overload Prediction and Preventive Tuning Using Pruned Random Forest. In: Kim, J., Shim, K., Cao, L., Lee, J.G., Lin, X., Moon, Y.S. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_39

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-57529-2_39


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57528-5

  • Online ISBN: 978-3-319-57529-2

  • eBook Packages: Computer Science, Computer Science (R0)
