Abstract
With a variety of providers, large and small, delivering numerous cloud-based services, cloud computing is evolving into an important service delivery infrastructure. One challenge in this evolution is how to provide the fault handling needed to migrate long-running or computationally intensive application services into shared, open cloud infrastructures. To minimize the impact of failures on services and application executions, we present a diagnostic architecture and a diagnosis method for handling service faults, based on the service dependence graph (SDG) model and the service execution log. The architecture decouples diagnosis service components and shares diagnosis resources, so third-party diagnostic components can be incorporated and the scalability of diagnosis methods is improved. By analyzing the dependence relations among activities in the SDG model, our diagnosis method identifies the incorrect activities and explains the root causes of web service composition faults, based on the differences between successful and failed executions of the composite service. Experimental results show that our method is effective in diagnosing faults in web service compositions of various scales.
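The idea of comparing a successful and a failed execution over a dependence graph can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `diagnose`, the dictionary-based graph and log formats, and the order-processing activities are all hypothetical, chosen only to show how differences between two runs might be narrowed down to candidate root causes.

```python
from typing import Dict, List, Set


def diagnose(sdg: Dict[str, Set[str]],
             ok_log: Dict[str, object],
             failed_log: Dict[str, object]) -> List[str]:
    """Return candidate root-cause activities (illustrative only).

    sdg maps each activity to the set of activities it depends on.
    Each log maps an activity name to the output recorded for it.
    """
    # Activities whose recorded output differs between the two runs,
    # including activities that are missing from one of the logs.
    suspicious = {a for a in sdg if ok_log.get(a) != failed_log.get(a)}
    # A suspicious activity is a candidate root cause when none of the
    # activities it depends on is itself suspicious; otherwise its
    # deviation may simply have propagated from upstream.
    return sorted(a for a in suspicious if not (sdg[a] & suspicious))


# Hypothetical composite service: an order-processing workflow.
sdg = {
    "receive": set(),
    "check_stock": {"receive"},
    "charge": {"receive"},
    "ship": {"check_stock", "charge"},
}
ok_log = {"receive": "order-1", "check_stock": "in stock",
          "charge": "paid", "ship": "sent"}
failed_log = {"receive": "order-1", "check_stock": "in stock",
              "charge": "declined", "ship": None}

print(diagnose(sdg, ok_log, failed_log))  # ['charge']
```

Here `ship` also deviates from the successful run, but it depends on `charge`, so only `charge` is reported as a candidate root cause.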
















Cite this article
Jia, Z., Chen, R., Xing, X. et al. SFDCloud: top-k service faults diagnosis in cloud computing. Autom Softw Eng 21, 461–488 (2014). https://doi.org/10.1007/s10515-013-0137-8
DOI: https://doi.org/10.1007/s10515-013-0137-8