Abstract
The holy grail for the large, complex storage systems deployed in enterprises today is for these systems to be self-governing. We propose a self-tuning scheme for large storage filers, an area in which very little prior work exists. Our system uses the performance counters generated by a filer to assess its health in real time and to modify the workload and/or tune system parameters so as to optimize operational metrics. We use a pruned random forest to predict overload in real time; the model is run on every snapshot of counter values. Since a large number of trees in a random forest directly increases the time needed to make a decision, a large forest is not viable in a real-time scenario. Our solution therefore uses a pruned random forest that performs as well as the original forest. When an overload situation is predicted, a saliency analysis identifies the components of the system that require tuning, which allows us to initiate an 'action' on the bottleneck components. The 'action' we explore in our experiments is 'throttling' the bottleneck component to prevent overload situations.
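The paper's pruning scheme itself is detailed in the authors' MLDM 2016 work; as a rough illustration of the general idea (not the authors' algorithm), the sketch below trains a random forest on synthetic stand-ins for counter snapshots, keeps only the individually most accurate trees on a validation set, and votes over that smaller ensemble so each real-time decision evaluates far fewer trees. All names (`predict_pruned`, the dataset, the value of `k`) are illustrative assumptions.

```python
# Hypothetical sketch: prune a random forest by retaining only the trees
# that are individually most accurate on held-out data, so a per-snapshot
# overload decision evaluates k trees instead of the full ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for filer counter snapshots labelled
# overloaded (1) vs. healthy (0).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_tr, y_tr)

# Rank the individual trees by validation accuracy and keep the top k.
scores = [tree.score(X_val, y_val) for tree in forest.estimators_]
k = 20
top = np.argsort(scores)[::-1][:k]

def predict_pruned(snapshot):
    """Majority vote of the k retained trees for one counter snapshot."""
    votes = [forest.estimators_[i].predict(snapshot.reshape(1, -1))[0]
             for i in top]
    return int(np.mean(votes) >= 0.5)

# One real-time decision now touches only k of the 200 trained trees.
label = predict_pruned(X_val[0])
```

This is the simplest "ordered aggregation" style of pruning; the paper's method instead learns which trees to retain, but the latency motivation is the same: prediction cost scales with the number of trees consulted per snapshot.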
Acknowledgments
This research work was partially funded by NetApp Inc. The views and conclusions contained herein are those of the authors only.
© 2017 Springer International Publishing AG
Cite this paper
Dheenadayalan, K., Srinivasaraghavan, G., Muralidhara, V.N. (2017). Self-tuning Filers — Overload Prediction and Preventive Tuning Using Pruned Random Forest. In: Kim, J., Shim, K., Cao, L., Lee, J.G., Lin, X., Moon, Y.S. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol. 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_39
Print ISBN: 978-3-319-57528-5
Online ISBN: 978-3-319-57529-2