Abstract
Change detection in continuous data streams is valuable in many applications, but high computation overhead prevents many data mining algorithms from being used for online monitoring. We propose a history-guided, low-cost change detection method based on the “s-monitor” approach. This approach monitors the stream with simple models (“s-monitors”) that reflect changes in complicated models. By interleaving frequent s-monitor checks with infrequent checks of the complicated model, we can keep a close eye on the stream without heavy computation overhead.
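The interleaving described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `interleaved_monitor`, the parameters `s_threshold` and `full_every`, and the use of a single scalar s-monitor value per window are all assumptions made for illustration. The cheap s-monitor runs on every window, and the expensive model check runs only when the s-monitor drifts from its baseline or when a fixed number of windows has passed since the last expensive check.

```python
from statistics import mean
from typing import Callable, Iterable, List, Sequence, Tuple

Window = Sequence[float]


def interleaved_monitor(
    windows: Iterable[Window],
    s_monitor: Callable[[Window], float],   # cheap summary (hypothetical interface)
    full_check: Callable[[Window], bool],   # expensive model check, True if the model changed
    s_threshold: float,                     # assumed alarm threshold on s-monitor drift
    full_every: int,                        # assumed period of scheduled expensive checks
) -> List[Tuple[int, str, bool]]:
    """Return (window_index, trigger, model_changed) for every expensive check run."""
    events: List[Tuple[int, str, bool]] = []
    baseline = None
    since_full = 0
    for i, window in enumerate(windows):
        value = s_monitor(window)            # cheap check on every window
        if baseline is None:
            baseline = value                 # first window only sets the baseline
            continue
        since_full += 1
        alarm = abs(value - baseline) > s_threshold
        scheduled = since_full >= full_every
        if alarm or scheduled:
            changed = full_check(window)     # expensive check, run rarely
            since_full = 0
            events.append((i, "s-monitor" if alarm else "scheduled", changed))
            if changed:
                baseline = value             # re-anchor after a confirmed change
    return events


if __name__ == "__main__":
    stream = [[1.0, 1.0, 1.0]] * 4 + [[5.0, 5.0, 5.0]] * 2
    # The lambda stands in for a costly model comparison.
    print(interleaved_monitor(stream, mean, lambda w: max(w) > 3, 1.0, 3))
```

In this sketch the expensive check runs twice over six windows: once on schedule and once because the s-monitor raised an alarm. A larger `full_every` lowers cost further, at the risk of missing changes the s-monitor does not reflect.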
The selection of s-monitors is critical for successful change detection. History can often provide insights to select appropriate s-monitors and monitor the streams. We demonstrate this method using subspace cluster monitoring for log data and frequent item set monitoring for retail data. Our experiments show that this approach can catch more changes in a more timely manner with lower cost than traditional approaches.
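As a rough illustration of how history might guide s-monitor selection for frequent item set monitoring, the sketch below uses one possible heuristic: it picks the items whose historical support stayed closest to the minimum-support threshold, on the assumption that such borderline items are the most likely to enter or leave the frequent item sets. This heuristic, and the names `item_supports`, `select_s_monitors`, `min_support`, and `k`, are assumptions for illustration, not the selection procedure used in the paper.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set

Transaction = Set[Hashable]


def item_supports(window: Sequence[Transaction]) -> Dict[Hashable, float]:
    """Fraction of transactions in the window that contain each item."""
    counts = Counter(item for transaction in window for item in transaction)
    n = len(window)
    return {item: count / n for item, count in counts.items()}


def select_s_monitors(
    history: Sequence[Sequence[Transaction]],  # past windows of transactions
    min_support: float,                        # threshold used by the full miner
    k: int,                                    # number of items to monitor
) -> List[Hashable]:
    """Pick the k items whose support was, on average, nearest to min_support."""
    supports = [item_supports(window) for window in history]
    items = set().union(*supports)

    def margin(item: Hashable) -> float:
        # Mean distance from the threshold across historical windows;
        # an item absent from a window has support 0 there.
        return sum(abs(s.get(item, 0.0) - min_support) for s in supports) / len(supports)

    # Ties are broken by the item's string form to keep the result deterministic.
    return sorted(items, key=lambda item: (margin(item), str(item)))[:k]


if __name__ == "__main__":
    history = [
        [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}],
        [{"a", "b"}, {"a", "c"}, {"a"}, {"c"}],
    ]
    print(select_s_monitors(history, min_support=0.5, k=2))
```

The supports of the selected items can then be tracked cheaply on each new window, with the full frequent item set computation run only when one of them crosses the threshold.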
The same approach can be applied to different models in various applications, such as monitoring live weather data, stock market fluctuations and network traffic streams.
This research has been supported by National Science Foundation Grant CCR-0121643.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, W., Omiecinski, E., Mark, L., Nguyen, M.Q. (2009). History Guided Low-Cost Change Detection in Streams. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_7
DOI: https://doi.org/10.1007/978-3-642-03730-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03729-0
Online ISBN: 978-3-642-03730-6
eBook Packages: Computer Science, Computer Science (R0)