Abstract
High-quality monitoring data is vital for managing today's data centers. However, using monitoring tools effectively requires an understanding of monitoring requirements that system administrators often lack. In the absence of a well-defined process for devising a monitoring strategy, administrators fall back on a manual, intuition-based approach. In this paper, we propose replacing this ad hoc approach with a systematic, automated, analytics-based approach to system monitoring. We propose an adaptive monitoring framework in which end-to-end probing is used to adapt at-a-point monitoring tools, that is, to adjust their monitoring levels. We present algorithms to select and analyze probes and to dynamically adapt monitoring policies based on the probe analysis. We demonstrate the effectiveness of the proposed solution using real-world examples as well as simulations.
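To make the core idea concrete, the following minimal Python sketch shows one way end-to-end probe results could drive the monitoring level of individual components. It is an illustration under stated assumptions, not the algorithm of the paper: the `Probe` type, the three level names, the fixed latency thresholds, and the one-step raise/lower rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical monitoring levels, ordered from cheapest to most detailed.
LEVELS = ["low", "medium", "high"]


@dataclass
class Probe:
    name: str
    path: List[str]       # components the probe traverses end to end
    threshold_ms: float   # assumed latency above which the probe counts as degraded


def adapt_levels(
    probes: List[Probe],
    latencies_ms: Dict[str, float],
    current: Dict[str, str],
) -> Dict[str, str]:
    """Raise the level of components on degraded probe paths by one step
    and lower every other component by one step (illustrative rule only)."""
    suspects = set()
    for probe in probes:
        if latencies_ms[probe.name] > probe.threshold_ms:
            suspects.update(probe.path)

    updated = {}
    for component, level in current.items():
        idx = LEVELS.index(level)
        if component in suspects:
            idx = min(idx + 1, len(LEVELS) - 1)
        else:
            idx = max(idx - 1, 0)
        updated[component] = LEVELS[idx]
    return updated


# Example: the "checkout" probe is slow, the "search" probe is healthy.
probes = [
    Probe("checkout", ["web", "app", "db"], threshold_ms=200.0),
    Probe("search", ["web", "cache"], threshold_ms=100.0),
]
current = {"web": "low", "app": "low", "db": "medium", "cache": "medium"}
latencies = {"checkout": 350.0, "search": 40.0}
new_levels = adapt_levels(probes, latencies, current)
```

In this sketch, every component on the slow `checkout` path is monitored more closely on the next cycle, while `cache`, which only appears on a healthy path, is monitored less closely. A real policy would likely localize the problem more precisely before raising levels; the coarse rule here only shows the feedback loop from probes to at-a-point monitoring.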











Cite this article
Jeswani, D., Natu, M. & Ghosh, R.K. Adaptive Monitoring: Application of Probing to Adapt Passive Monitoring. J Netw Syst Manage 23, 950–977 (2015). https://doi.org/10.1007/s10922-014-9330-8
DOI: https://doi.org/10.1007/s10922-014-9330-8