Abstract
In this paper, we propose a novel approach to analyze and predict failures in Hadoop clusters. We enumerate several key challenges that hinder failure prediction in such systems: heterogeneity of the system, hidden complexity, time limitation and scalability. First, a clustering approach is applied to group similar error sequences, which makes training of the model more effective. Subsequently, Hidden Markov Models (HMMs) are used to predict failures, using the MapReduce programming framework. The effectiveness of the failure prediction algorithm is measured by precision, recall and accuracy metrics. Our algorithm can predict failures 2 days in advance with an accuracy of 91% when 87% of the data is used as the training set. Although the model presented in this paper focuses on Hadoop clusters, it can be generalized to other cloud computing frameworks as well.
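The abstract names the ingredients (HMMs, a failure/no-failure decision, and precision, recall and accuracy) but not their concrete form. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes one discrete-observation HMM trained on error sequences that precede failures and one trained on normal sequences. A sequence is flagged as failure-prone when the failure model assigns it the higher likelihood, computed with the scaled forward algorithm. The class `HMM`, the helpers `predict_failure` and `evaluate`, the toy parameters and the symbol encoding (0 = benign event, 1 = error event) are all hypothetical. The clustering and training (e.g. Baum-Welch) steps are omitted.

```python
import math
from typing import List, Sequence, Tuple


class HMM:
    """Discrete-observation hidden Markov model (hypothetical minimal class)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]) -> None:
        self.start = start  # start[i]: P(first hidden state is i)
        self.trans = trans  # trans[i][j]: P(next state is j | current state i)
        self.emit = emit    # emit[i][k]: P(observing symbol k | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Scaled forward algorithm: returns log P(obs | model); obs must be non-empty."""
        n = len(self.start)
        # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_prob = 0.0
        for t in range(len(obs)):
            if t > 0:
                # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n))
                    * self.emit[j][obs[t]]
                    for j in range(n)
                ]
            # Rescale at every step to avoid underflow on long sequences;
            # the logs of the scale factors sum to log P(obs | model).
            scale = sum(alpha)
            log_prob += math.log(scale)
            alpha = [a / scale for a in alpha]
        return log_prob


def predict_failure(seq: Sequence[int], failure_model: HMM, normal_model: HMM) -> bool:
    """Assumed decision rule: flag the sequence if the failure HMM explains it better."""
    return failure_model.log_likelihood(seq) > normal_model.log_likelihood(seq)


def evaluate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall and accuracy for binary failure predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


# Toy parameters (hypothetical); symbol 0 = benign log event, 1 = error event.
normal = HMM([0.9, 0.1], [[0.95, 0.05], [0.5, 0.5]], [[0.9, 0.1], [0.5, 0.5]])
failing = HMM([0.5, 0.5], [[0.6, 0.4], [0.1, 0.9]], [[0.6, 0.4], [0.1, 0.9]])

sequences = [[1, 1, 1], [0, 0, 0]]
labels = [True, False]
preds = [predict_failure(s, failing, normal) for s in sequences]
print(evaluate(labels, preds))  # -> (1.0, 1.0, 1.0)
```

Because each sequence is scored independently, the scoring step parallelises naturally as a map phase, which is presumably part of why a MapReduce formulation fits; the abstract does not say how the authors distribute training. Reporting precision and recall alongside accuracy matters because failures are typically rare, so accuracy alone can look high even when most failures are missed.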
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Agrawal, B., Wiktorski, T., Rong, C. (2015). Analyzing and Predicting Failure in Hadoop Clusters Using Distributed Hidden Markov Model. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_18
DOI: https://doi.org/10.1007/978-3-319-28430-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28429-3
Online ISBN: 978-3-319-28430-9
eBook Packages: Computer Science, Computer Science (R0)