Abstract
A burst, i.e., an unusually high frequency of occurrence of an event in a time window, is of interest in many monitoring applications that produce temporal data, because it often indicates abnormal activity. While the problem of detecting bursts in time-series data has been well addressed, the question of which thresholds, on the number of events and on the window size, make a window “unusually bursty” remains relevant. We consider the problem of finding critical values of both thresholds. Since for most applications we have little a priori knowledge of which combination of thresholds is critical, the range of possible values for either threshold can be very large. We formulate the search for the critical combination of thresholds as a two-dimensional search problem and design efficient deterministic and randomized divide-and-conquer heuristics. For the deterministic heuristic, we show that, under some weak assumptions, the computational overhead is logarithmic in the sizes of the ranges. Under identical assumptions, the worst-case expected computational overhead of the randomized heuristic is also logarithmic. Using data obtained from logs of medical equipment, we conduct extensive simulations that reinforce our theoretical results and show that, on average, the randomized heuristic outperforms its deterministic counterpart in practice.
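To make the two thresholds concrete, the following minimal sketch checks whether an event log contains a window of length w with at least k events. It illustrates only the burst definition used above, not the chapter's divide-and-conquer heuristics. The function names, the half-open window convention [t, t + w), and the sample timestamps are all assumptions made for illustration.

```python
def max_events_in_window(timestamps, w):
    """Largest number of events in any half-open window [t, t + w).

    Assumes w > 0. It suffices to consider windows that start at an
    event, so a two-pointer scan over the sorted timestamps is enough.
    """
    ts = sorted(timestamps)
    best = 0
    left = 0
    for right, t in enumerate(ts):
        # Advance the left edge until the window starting there covers t.
        while t - ts[left] >= w:
            left += 1
        best = max(best, right - left + 1)
    return best


def has_burst(timestamps, k, w):
    """True if some window of length w holds at least k events."""
    return max_events_in_window(timestamps, w) >= k


# Hypothetical event log (timestamps in arbitrary time units).
events = [0, 1, 2, 3, 50, 100, 101, 150]
print(has_burst(events, k=4, w=5))   # window [0, 5) holds 4 events
print(has_burst(events, k=5, w=5))   # no window of length 5 holds 5
```

For a fixed w, raising k can only make a log less bursty, and for a fixed k, widening w can only make it more bursty. This kind of monotonicity is what makes a search over the (k, w) grid amenable to divide-and-conquer, although the precise notion of a critical pair is defined in the chapter itself and is not reproduced here.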
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lahiri, B., Akrotirianakis, I., Moerchen, F. (2013). Finding Critical Thresholds for Defining Bursts in Event Logs. In: Hameurlain, A., Küng, J., Wagner, R., Cuzzocrea, A., Dayal, U. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems VIII. Lecture Notes in Computer Science, vol 7790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37574-3_4
DOI: https://doi.org/10.1007/978-3-642-37574-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37573-6
Online ISBN: 978-3-642-37574-3
eBook Packages: Computer Science, Computer Science (R0)