Abstract
In many applications, stream data are too voluminous to be collected in a central fashion and often transmitted on a distributed network. In this paper, we focus on the outlier detection over distributed data streams in real time, firstly, we formalize the problem of outlier detection using the kernel density estimation technique. Then, we adopt the fading strategy to keep pace with the transient and evolving natures of stream data, and mico-cluster technique to conquer the data partition and “one-pass” scan. Furthermore, our extensive experiments with synthetic and real data show that the proposed algorithm is efficient and effective compared with existing outlier detection algorithms, and more suitable for data streams.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Scott, D.W.: Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley-Interscience, Chichester (2001)
Barnett, V., Lewis, T.: Outliers in statistical data, 3rd edn. Wiley, Chichester (2001)
Eskin, E.: Anomaly detection over noisy data using learned probability distributions. In: ICML (2000)
Ng, R.T., Han, J.: Efficient and effective clustering methods for spatial data mining. In: VLDB (1994)
Ester, M., Kriegel, H.P., Xu, X.: A database interface for clustering in large spatial databases. In: KDD (1995)
Zhang, T., Ramakrishnan, R., Livny, M.: Birch: an efficient data clustering method for very large databases. In: SIGMOD (1996)
Sheikholeslami, G., Chatterjee, S., Zhang, A.: Wavecluster: A multi-resolution clustering approach for very large spatial databases. In: VLDB (1998)
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. In: SIGMOD (1998)
Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: VLDB (1998)
Knorr, E.M., Ng, R.T.: Finding intensional knowledge of distance-based outliers. In: VLDB (1999)
Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: SIGMOD (2000)
Bay, S.D., Schwabacher, M.: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: KDD (2003)
Ghoting, A., Parthasarathy, S., Otey, M.E.: Fast mining of distance-based outliers in high-dimensional datasets. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, Springer, Heidelberg (2006)
Wu, M., Jermaine, C.: Outlier detection by sampling with accuracy guarantees. In: KDD (2006)
Tao, Y., Xiao, X., Zhou, S.: Mining distance-based outliers from large databases in any metric space. In: KDD (2006)
Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: Lof: identifying density-based local outliers. In: SIGMOD (2000)
Papadimitriou, S., Kitagawa, H., Gibbons, P.B., Faloutsos, C.: Loci: Fast outlier detection using the local correlation integral. In: ICDE (2003)
Lazarevic, A., Kumar, V.: Feature bagging for outlier detection. In: KDD (2005)
Jin, W., Tung, A.K.H., Han, J., Wang, W.: Ranking outliers using symmetric neighborhood relationship. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, Springer, Heidelberg (2006)
Babcock, B., Olston, C.: Distributed top-k monitoring. In: SIGMOD (2003)
Manjhi, A., Shkapenyuk, V., Dhamdhere, K., Olston, C.: Finding (recently) frequent items in distributed data streams. In: ICDE (2005)
Subramaniam, S., Palpanas, T., Papadopoulos, D., Kalogeraki, V., Gunopulos, D.: Online outlier detection in sensor data using non-parametric models. In: VLDB (2006)
Cormode, G., Muthukrishnan, S., Zhuang, W.: Conquering the divide: Continuous clustering of distributed data streams. In: ICDE (2007)
Hawkins, D.: Identification of Outliers, 1st edn. Springer, Heidelberg (1980)
Gijbels, I., Pope, A., Wand, M.: Automatic forecasting via exponential smoothing: Asymptotic properties (1997)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Su, L., Han, W., Yang, S., Zou, P., Jia, Y. (2007). Continuous Adaptive Outlier Detection on Distributed Data Streams. In: Perrott, R., Chapman, B.M., Subhlok, J., de Mello, R.F., Yang, L.T. (eds) High Performance Computing and Communications. HPCC 2007. Lecture Notes in Computer Science, vol 4782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75444-2_13
Download citation
DOI: https://doi.org/10.1007/978-3-540-75444-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75443-5
Online ISBN: 978-3-540-75444-2
eBook Packages: Computer ScienceComputer Science (R0)