
SFDCloud: top-k service faults diagnosis in cloud computing

Automated Software Engineering

Abstract

With a variety of providers, large and small, delivering a number of cloud-based services, cloud computing is evolving into an important service delivery infrastructure. One challenge in this evolution is how to provide the necessary fault handling when migrating long-running or computationally intensive application services into shared, open cloud infrastructures. To minimize the impact of failures on services and application executions, we present a diagnostic architecture and a diagnosis method for handling service faults, both based on the service dependence graph (SDG) model and the service execution log. The architecture decouples diagnosis service components and shares diagnosis resources, which improves the scalability of diagnosis methods by allowing third-party diagnostic components to be incorporated. By analyzing the dependence relations among activities in the SDG model, our diagnosis method identifies the incorrect activities and explains the root causes of web service composition faults, based on the differences between successful and failed executions of the composite service. Experimental results show that our method is effective in diagnosing faults in web service compositions of various scales.
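The general idea of comparing a successful and a failed execution over a dependence graph can be illustrated with a small sketch. This is not the algorithm from the paper, whose details are not given in the abstract. It is a toy heuristic under stated assumptions: the SDG is a plain dictionary mapping each activity to the activities it depends on, each log is a dictionary of recorded outputs per activity, and the function names `upstream` and `top_k_suspects`, the sample activities, and the ranking rule are all hypothetical.

```python
from collections import deque


def upstream(sdg, activity):
    """Return the activity plus every activity it transitively depends on."""
    seen = {activity}
    queue = deque([activity])
    while queue:
        node = queue.popleft()
        for dep in sdg.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def top_k_suspects(sdg, ok_log, failed_log, failed_activity, k=3):
    """Rank suspect activities for a failure (assumed heuristic, not the paper's).

    A suspect is an activity the failed activity depends on whose logged
    output differs between the successful and the failed run. Suspects
    with fewer differing dependencies are ranked first, on the assumption
    that they sit closer to the root cause.
    """
    scores = {}
    for act in upstream(sdg, failed_activity):
        if ok_log.get(act) != failed_log.get(act):
            scores[act] = sum(
                1 for dep in sdg.get(act, ())
                if ok_log.get(dep) != failed_log.get(dep)
            )
    # Sort by score, then by name so the ranking is deterministic.
    return sorted(scores, key=lambda a: (scores[a], a))[:k]


# Hypothetical composite service: each activity lists what it depends on.
sdg = {
    "ship": ["pay", "stock"],
    "pay": ["quote"],
    "stock": ["order"],
    "quote": ["order"],
    "order": [],
}
ok_log = {"order": "o1", "quote": 100, "pay": "paid", "stock": "ok", "ship": "sent"}
failed_log = {"order": "o1", "quote": -5, "pay": "rejected", "stock": "ok", "ship": "error"}

print(top_k_suspects(sdg, ok_log, failed_log, "ship", k=2))
```

In this example the `quote` activity is ranked first because its output differs while its only dependency, `order`, behaves identically in both runs. A real diagnosis method would need richer log semantics than output equality.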


Notes

  1. http://jade.cselt.it/.


Author information

Corresponding author

Correspondence to Rong Chen.


About this article

Cite this article

Jia, Z., Chen, R., Xing, X. et al. SFDCloud: top-k service faults diagnosis in cloud computing. Autom Softw Eng 21, 461–488 (2014). https://doi.org/10.1007/s10515-013-0137-8


  • DOI: https://doi.org/10.1007/s10515-013-0137-8
