Abstract
In this paper, we study the problem of projected outlier detection in high-dimensional data streams and propose a new technique, called Stream Projected Outlier deTector (SPOT), to identify outliers embedded in subspaces. SPOT constructs a Sparse Subspace Template (SST), a set of subspaces obtained through unsupervised and/or supervised learning, to detect projected outliers effectively. A Multi-Objective Genetic Algorithm (MOGA) is employed as an effective search method for finding outlying subspaces in the training data when constructing SST. During the detection stage, SST carries out online self-evolution to cope with the dynamics of data streams. Experimental results demonstrate the efficiency and effectiveness of SPOT in detecting outliers in high-dimensional data streams.
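To make the notion of a projected outlier concrete, the following Python sketch flags a streaming point when it falls into a sparsely populated grid cell of some subspace drawn from a fixed template. It is a simplified illustration under stated assumptions, not the SPOT algorithm: the class `SubspaceGridDetector`, its methods, the equi-width grid, the `[0, 1]` value range and the `min_count` sparsity threshold are all hypothetical. The template is supplied by hand rather than learned with MOGA, and the sketch does not perform SST's online self-evolution.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Subspace = Tuple[int, ...]  # indices of the dimensions spanning one subspace
Cell = Tuple[int, ...]      # grid-cell coordinates inside a subspace


class SubspaceGridDetector:
    """Hypothetical illustration, not SPOT itself: a point is reported as a
    projected outlier in every template subspace where its grid cell is sparse."""

    def __init__(self, template: List[Subspace], bins: int = 10,
                 min_count: int = 2, lo: float = 0.0, hi: float = 1.0) -> None:
        self.template = template    # fixed, hand-picked stand-in for an SST
        self.bins = bins            # equi-width bins per dimension (assumption)
        self.min_count = min_count  # cells holding fewer points count as sparse
        self.lo, self.hi = lo, hi   # assumed value range of every dimension
        self.counts: Dict[Subspace, Dict[Cell, int]] = {
            s: defaultdict(int) for s in template}

    def _cell(self, x: Sequence[float], s: Subspace) -> Cell:
        width = (self.hi - self.lo) / self.bins
        # Clamp so that values on the range boundary fall into the edge bins.
        return tuple(min(self.bins - 1, max(0, int((x[d] - self.lo) / width)))
                     for d in s)

    def _update(self, x: Sequence[float]) -> None:
        for s in self.template:
            self.counts[s][self._cell(x, s)] += 1

    def fit(self, training: Iterable[Sequence[float]]) -> None:
        """Build the cell counts from training data without flagging anything."""
        for x in training:
            self._update(x)

    def process(self, x: Sequence[float]) -> List[Subspace]:
        """Return the subspaces in which x is outlying, then absorb x into the counts."""
        outlying = [s for s in self.template
                    if self.counts[s][self._cell(x, s)] < self.min_count]
        self._update(x)
        return outlying


# Training points lie on the diagonal x0 == x1; dimension 2 is spread out.
det = SubspaceGridDetector([(0,), (1,), (2,), (0, 1)])
det.fit([[i / 100, i / 100, (7 * i % 100) / 100] for i in range(100)])
print(det.process([0.05, 0.95, 0.5]))  # → [(0, 1)]
print(det.process([0.42, 0.42, 0.3]))  # → []
```

In the example, each coordinate of the first point lies in a well-populated bin when viewed on its own; only its projection onto dimensions 0 and 1 is empty. That is the kind of outlier a check over single dimensions misses, and a check over the full space tends to dilute as dimensionality grows, which motivates searching for outlying subspaces in the first place.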
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, J., Gao, Q., Wang, H., Liu, Q., Xu, K. (2009). Detecting Projected Outliers in High-Dimensional Data Streams. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2009. Lecture Notes in Computer Science, vol 5690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03573-9_53
DOI: https://doi.org/10.1007/978-3-642-03573-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03572-2
Online ISBN: 978-3-642-03573-9
eBook Packages: Computer Science, Computer Science (R0)