Abstract
We address real-time change-point detection in time series. This problem has recently received considerable attention in data mining because it applies to a wide range of risk management tasks, such as detecting failures of computer devices from performance data and detecting masqueraders or malicious executables from computer access logs. In this paper we propose a new method for real-time change-point detection that employs sequentially discounting normalized maximum likelihood (SDNML) coding. SDNML is a method for sequential data compression, newly developed in this paper. It attains the least code length for the sequence while gradually discounting the effect of past data as time goes on, so that compression adapts to non-stationary data sources. In our method, SDNML is used to learn the mechanism of a time series, and a change-point score at each time point is then measured in terms of the SDNML code length. We empirically demonstrate that our method significantly outperforms existing methods, such as the predictive-coding method and the hypothesis-testing method, in both detection accuracy and computational efficiency on artificial data sets. We further apply our method to a real security problem, malware detection, and empirically demonstrate that it is able to detect unseen security incidents at significantly early stages.
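The idea of scoring each time point by a code length under a model that discounts past data can be illustrated with a small sketch. This is not the SDNML code of the paper: it substitutes a much simpler plug-in Gaussian predictive code whose mean and variance are tracked with exponential discounting. The function names, the discounting rate `r`, the smoothing window, and the synthetic series are all assumptions made for illustration.

```python
import math


def discounted_code_lengths(xs, r=0.05, var_floor=1e-6):
    """Per-sample code length (in nats) under a discounted Gaussian model.

    Simplified stand-in for a discounting code, NOT the paper's SDNML.
    """
    mean, var = xs[0], 1.0  # assumed initial state
    scores = []
    for x in xs:
        # Code length of x under the model learned from earlier samples:
        # the negative log of the Gaussian predictive density.
        scores.append(0.5 * math.log(2 * math.pi * var)
                      + (x - mean) ** 2 / (2 * var))
        # Discounted update: older samples lose weight geometrically.
        err = x - mean
        mean += r * err
        var = max((1 - r) * var + r * err * err, var_floor)
    return scores


def smooth(scores, window=5):
    """Moving average of the most recent `window` scores."""
    out = []
    for t in range(len(scores)):
        lo = max(0, t - window + 1)
        out.append(sum(scores[lo:t + 1]) / (t + 1 - lo))
    return out


# Hypothetical series: a sine wave whose level jumps by 10 at t = 200.
series = [math.sin(0.3 * t) for t in range(200)]
series += [10.0 + math.sin(0.3 * t) for t in range(200, 400)]

change_scores = smooth(discounted_code_lengths(series))
peak = max(range(len(change_scores)), key=change_scores.__getitem__)
print("highest change-point score at t =", peak)
```

Because the model is updated only after each sample has been scored, the score at time t reflects how poorly the past explains the present, and the discounting lets the model settle on the new regime after the jump.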
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Urabe, Y., Yamanishi, K., Tomioka, R., Iwai, H. (2011). Real-Time Change-Point Detection Using Sequentially Discounting Normalized Maximum Likelihood Coding. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_16
DOI: https://doi.org/10.1007/978-3-642-20847-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20846-1
Online ISBN: 978-3-642-20847-8
eBook Packages: Computer Science, Computer Science (R0)