ABSTRACT
Data centers run many services that impact millions of users daily. In practice, the latency of each service varies from one request to another. Existing tools let operators monitor services for performance glitches or service disruptions, but they typically do not help in understanding these variations in latency.
We propose a general framework for understanding the performance of arbitrary black-box services. We consider a stream of requests to a given service, together with the monitored attributes of each request and the latency of serving it. We propose what we call the multi-dimensional f-measure, which, for a given latency interval, helps identify the subset of monitored attributes that explains it. We design algorithms that use this measure not only for a fixed latency interval, but also to explain the entire range of latencies of the service by segmenting it into smaller intervals.
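The abstract does not spell out how the multi-dimensional f-measure is defined. The following is a minimal sketch under one plausible reading, stated here as an assumption: a conjunction of attribute=value conditions is treated as a classifier that predicts whether a request's latency falls inside a fixed interval, and it is scored with the standard F-measure (harmonic mean of precision and recall). The request schema (a dict of attribute names to values) and the names `f_measure`, `score_explanation`, and `best_explanation` are hypothetical, and the brute-force search only covers the fixed-interval case, not the segmentation of the full latency range.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical schema: each request is a mapping from attribute name to value.
Request = Dict[str, str]
Condition = Tuple[str, str]  # (attribute, value)


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F_beta score: weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_explanation(requests: Sequence[Request], latencies: Sequence[float],
                      conditions: Sequence[Condition], lo: float, hi: float,
                      beta: float = 1.0) -> float:
    """Assumed scoring: how well a conjunction of conditions picks out
    exactly the requests whose latency lies in [lo, hi)."""
    in_interval = [lo <= t < hi for t in latencies]
    matches = [all(r.get(a) == v for a, v in conditions) for r in requests]
    true_pos = sum(1 for m, y in zip(matches, in_interval) if m and y)
    predicted = sum(matches)
    actual = sum(in_interval)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return f_measure(precision, recall, beta)


def best_explanation(requests: Sequence[Request], latencies: Sequence[float],
                     lo: float, hi: float, max_conditions: int = 2,
                     beta: float = 1.0) -> Tuple[Tuple[Condition, ...], float]:
    """Exhaustively search conjunctions of up to `max_conditions` observed
    (attribute, value) pairs and return the highest-scoring one.
    This is exponential in `max_conditions`; it is for illustration only."""
    candidates: List[Condition] = sorted({(a, v) for r in requests for a, v in r.items()})
    best: Tuple[Condition, ...] = ()
    best_score = 0.0
    for k in range(1, max_conditions + 1):
        for combo in combinations(candidates, k):
            # Skip conjunctions that constrain one attribute to two values.
            if len({a for a, _ in combo}) < k:
                continue
            score = score_explanation(requests, latencies, combo, lo, hi, beta)
            if score > best_score:
                best, best_score = combo, score
    return best, best_score
```

For example, with four requests tagged by data center and query type, where only the image queries take 100 ms or more, `best_explanation(requests, latencies, 100.0, 200.0)` returns the single condition `("query", "image")` with a score of 1.0, since it matches exactly the slow requests. A practical system would need a pruned or smarter search over attribute subsets, plus a separate procedure for choosing the interval boundaries.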
We perform a detailed experimental study with synthetic data as well as real data from a large search engine. Our experiments show that our methods are robust and automatically identify significant latency intervals together with the request attributes that explain them.
Index Terms
- Understanding latency variations of black box services