Abstract
Outlier detection techniques are widely used in applications such as credit-card fraud detection and the monitoring of criminal activities in electronic commerce. These applications attempt to identify outliers as noise, exceptions, or objects around the border. Existing density-based local outlier detection assigns each object a degree to which it is an outlier in a numerical space. In this paper, we propose a novel mutual-reinforcement-based local outlier detection approach. Instead of detecting local outliers as noise, we attempt to identify local outliers in the center, where they are similar to some clusters of objects on the one hand and unique on the other. Our technique can be used in bank investment, for example, to identify a unique body, similar to many good competitors, in which to invest. We attempt to detect local outliers in categorical and ordinal as well as numerical data. In categorical data, the challenge is that there are many similar but different ways to specify relationships among the data items. Our mutual-reinforcement-based approach is stable under such similar but different user-defined relationships. Our technique can reduce the burden on users of determining the relationships among data items, and can find explanations for why the outliers are found. We conducted extensive experimental studies using real datasets.
Author information
Jeffrey Xu Yu received his B.E., M.E. and Ph.D. in computer science, from the University of Tsukuba, Japan, in 1985, 1987 and 1990, respectively. Jeffrey Xu Yu was a research fellow in the Institute of Information Sciences and Electronics, University of Tsukuba (Apr. 1990–Mar. 1991), and held teaching positions in the Institute of Information Sciences and Electronics, University of Tsukuba (Apr. 1991–July 1992) and in the Department of Computer Science, Australian National University (July 1992–June 2000). Currently he is an Associate Professor in the Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong. His major research interests include data mining, data stream mining/processing, XML query processing and optimization, data warehouse, on-line analytical processing, and design and implementation of database management systems.
Weining Qian is currently an assistant professor of computer science at Fudan University, Shanghai, China. He received his M.S. and Ph.D. degrees in computer science from Fudan University in 2001 and 2004, respectively. He was supported by a Microsoft Research Fellowship while doing the research presented in this paper, and he is supported by the Shanghai Rising Star Program. His research interests include data mining for very large databases, data stream query processing and mining, and peer-to-peer computing.
Hongjun Lu received his B.Sc. from Tsinghua University, China, and his M.Sc. and Ph.D. from the Department of Computer Science, University of Wisconsin–Madison. He worked as an engineer in the Chinese Academy of Space Technology, as a principal research scientist in the Computer Science Center of Honeywell Inc., Minnesota, USA (1985–1987), and as a professor at the School of Computing of the National University of Singapore (1987–2000), and is a full professor at the Hong Kong University of Science and Technology. His research interests are in data/knowledge-base management systems with an emphasis on query processing and optimization, physical database design, and database performance. Hongjun Lu is currently a trustee of the VLDB Endowment, an associate editor of the IEEE Transactions on Knowledge and Data Engineering (TKDE), and a member of the review board of the Journal of Database Management. He served as a member of the ACM SIGMOD Advisory Board in 1998–2002.
Aoying Zhou, born in 1965, is currently a professor of computer science at Fudan University, Shanghai, China. He received his Bachelor's and Master's degrees in Computer Science from Sichuan University in Chengdu, Sichuan, China, in 1985 and 1988, respectively, and a Ph.D. degree from Fudan University in 1993. He has served as a member or chair of the program committees for many international conferences such as VLDB, ER, DASFAA, and WAIM. His papers have been published in ACM SIGMOD, VLDB, ICDE and some international journals. His research interests include data mining and knowledge discovery, XML data management, web query and searching, data stream analysis and processing, and peer-to-peer computing.
Cite this article
Yu, J.X., Qian, W., Lu, H. et al. Finding centric local outliers in categorical/numerical spaces. Knowl Inf Syst 9, 309–338 (2006). https://doi.org/10.1007/s10115-005-0197-6
DOI: https://doi.org/10.1007/s10115-005-0197-6