Abstract
Stream data are often transmitted over a distributed network, but in many cases they are too voluminous to be collected at a central location. Instead, computation must be performed in a distributed fashion while guaranteeing high-quality results in real time, even as new data arrive. In this paper, we first formalize the problem of continuous outlier detection over distributed evolving data streams. We then propose two novel outlier measures, together with algorithms that identify outliers in a single pass. Experiments on synthetic and real data show that the proposed methods are both efficient and effective compared with existing outlier detection algorithms.
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Su, L., Han, W., Zou, P., Jia, Y. (2007). Continuous Kernel-Based Outlier Detection over Distributed Data Streams. In: Thulasiraman, P., He, X., Xu, T.L., Denko, M.K., Thulasiram, R.K., Yang, L.T. (eds) Frontiers of High Performance Computing and Networking ISPA 2007 Workshops. ISPA 2007. Lecture Notes in Computer Science, vol 4743. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74767-3_32
DOI: https://doi.org/10.1007/978-3-540-74767-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74766-6
Online ISBN: 978-3-540-74767-3
eBook Packages: Computer Science (R0)