Reactive performance monitoring of Cloud computing environments

Cluster Computing

Abstract

This paper presents a cross-layer reactive monitoring approach for Cloud computing environments. Based on complex event processing (CEP), the approach monitors and analyzes performance metrics across Cloud layers to detect and repair performance-related problems. It relies on novel CEP analysis rules and a new action manager framework. The analysis rules are derived from a comprehensive analysis of the interactions between Cloud layers; the results of that analysis are used to reduce the number of monitored parameters, define the analysis rules, and identify the causes of performance-related problems. The action manager framework assigns a set of repair actions to each performance-related problem and checks whether the applied action succeeded. Experimental results indicate that the time needed to fix a performance-related problem is reasonably short and that the CPU overhead of the approach is negligible. They also show that, compared to baseline methods, the approach speeds up repair and reduces the number of triggered alarms.
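The action manager idea described in the abstract (a set of repair actions per problem, with a success check after each applied action) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `ActionManager`, the problem label `high_cpu`, the two repair actions, and the 0.8 load threshold are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

RepairAction = Callable[[], None]   # hypothetical: an action takes no arguments
HealthCheck = Callable[[], bool]    # hypothetical: returns True once the problem is gone


class ActionManager:
    """Illustrative sketch: maps each problem to a list of candidate repair actions."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[RepairAction]] = {}

    def register(self, problem: str, action: RepairAction) -> None:
        # Actions are tried in registration order (an assumption of this sketch).
        self._actions.setdefault(problem, []).append(action)

    def repair(self, problem: str, is_fixed: HealthCheck) -> Optional[str]:
        # Apply each candidate action, then check whether it succeeded.
        for action in self._actions.get(problem, []):
            action()
            if is_fixed():
                return action.__name__
        return None  # no registered action repaired the problem


# Simulated monitored state (hypothetical metric and values).
state = {"cpu_load": 0.95}


def restart_service() -> None:
    pass  # in this simulation the restart does not lower the load


def migrate_vm() -> None:
    state["cpu_load"] = 0.40  # in this simulation the migration lowers the load


manager = ActionManager()
manager.register("high_cpu", restart_service)
manager.register("high_cpu", migrate_vm)

fixed_by = manager.repair("high_cpu", lambda: state["cpu_load"] < 0.8)
print(fixed_by)  # prints "migrate_vm"
```

In this sketch the first action fails its success check, so the manager falls through to the second one; a real deployment would replace the simulated state with live metrics.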

Notes

  1. http://www.ganglia.sourceforge.net.

  2. http://www.linux.die.net/man/1/iostat.

  3. http://www.linux.die.net/man/1/mpstat.

  4. http://esper.codehaus.org/.

  5. A data point represents one measurement of the studied metric.

  6. http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/.

  7. http://www.redcad.org/members/mdhaffar/cep4cma/examples.html.

  8. http://www.redcad.org/members/mdhaffar/cep4cma.

  9. The recovery actions are identified by a domain expert and are not sorted in advance.

  10. http://www.openstack.org/.

  11. http://geronimo.apache.org/GMOxDOC10/day-trader.html.

  12. An I/O performance-related problem is related to a high number of I/O requests to the physical disk.

  13. http://manpages.ubuntu.com/manpages/trusty/en/man1/sysbench.1.html.

Acknowledgments

This work is partly supported by the German Ministry of Education and Research (BMBF) and the German Academic Exchange Service (DAAD).

Author information

Corresponding author

Correspondence to Afef Mdhaffar.

About this article

Cite this article

Mdhaffar, A., Halima, R.B., Jmaiel, M. et al. Reactive performance monitoring of Cloud computing environments. Cluster Comput 20, 2465–2477 (2017). https://doi.org/10.1007/s10586-016-0676-4

  • DOI: https://doi.org/10.1007/s10586-016-0676-4
