Finding centric local outliers in categorical/numerical spaces

Yu, Jeffrey Xu; Qian, Weining; Lu, Hongjun; Zhou, Aoying

doi:10.1007/s10115-005-0197-6


Regular Paper
Published: 10 September 2005

Volume 9, pages 309–338 (2006)

Knowledge and Information Systems

Jeffrey Xu Yu¹,
Weining Qian²,
Hongjun Lu³ &
Aoying Zhou²


Abstract

Outlier detection techniques are widely used in applications such as credit-card fraud detection and the monitoring of criminal activities in electronic commerce. These applications attempt to identify outliers as noise, exceptions, or objects around the border. Existing density-based local outlier detection assigns each object a degree to which it is an outlier in a numerical space. In this paper, we propose a novel mutual-reinforcement-based local outlier detection approach. Instead of detecting local outliers as noise, we attempt to identify local outliers in the center, where they are similar to some clusters of objects on the one hand and unique on the other. Our technique can be used in bank investment, for example, to identify a unique body, similar to many good competitors, in which to invest. We attempt to detect local outliers in categorical and ordinal as well as numerical data. In categorical data, the challenge is that there are many similar but different ways to specify relationships among the data items. Our mutual-reinforcement-based approach is stable under similar but different user-defined relationships. Our technique can reduce the burden on users of determining the relationships among data items, and can provide explanations of why the outliers are found. We conducted extensive experimental studies using real datasets.

Author information

Authors and Affiliations

The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Jeffrey Xu Yu
Fudan University, Shanghai, China
Weining Qian & Aoying Zhou
The Hong Kong University of Science and Technology, Hong Kong, China
Hongjun Lu

Corresponding author

Correspondence to Jeffrey Xu Yu.

Additional information

Jeffrey Xu Yu received his B.E., M.E. and Ph.D. in computer science from the University of Tsukuba, Japan, in 1985, 1987 and 1990, respectively. He was a research fellow in the Institute of Information Sciences and Electronics, University of Tsukuba (Apr. 1990–Mar. 1991), and held teaching positions in the Institute of Information Sciences and Electronics, University of Tsukuba (Apr. 1991–July 1992) and in the Department of Computer Science, Australian National University (July 1992–June 2000). Currently he is an Associate Professor in the Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong. His major research interests include data mining, data stream mining/processing, XML query processing and optimization, data warehousing, on-line analytical processing, and the design and implementation of database management systems.

Weining Qian is currently an assistant professor of computer science at Fudan University, Shanghai, China. He received his M.S. and Ph.D. degrees in computer science from Fudan University in 2001 and 2004, respectively. He was supported by a Microsoft Research Fellowship while conducting the research presented in this paper, and he is supported by the Shanghai Rising Star Program. His research interests include data mining for very large databases, data stream query processing and mining, and peer-to-peer computing.

Hongjun Lu received his B.Sc. from Tsinghua University, China, and his M.Sc. and Ph.D. from the Department of Computer Science, University of Wisconsin–Madison. He worked as an engineer at the Chinese Academy of Space Technology, as a principal research scientist in the Computer Science Center of Honeywell Inc., Minnesota, USA (1985–1987), and as a professor at the School of Computing of the National University of Singapore (1987–2000), and is a full professor at the Hong Kong University of Science and Technology. His research interests are in data/knowledge-base management systems with an emphasis on query processing and optimization, physical database design, and database performance. Hongjun Lu is currently a trustee of the VLDB Endowment, an associate editor of the IEEE Transactions on Knowledge and Data Engineering (TKDE), and a member of the review board of the Journal of Database Management. He served as a member of the ACM SIGMOD Advisory Board in 1998–2002.

Aoying Zhou, born in 1965, is currently a professor of computer science at Fudan University, Shanghai, China. He received his bachelor's and master's degrees in computer science from Sichuan University in Chengdu, Sichuan, China, in 1985 and 1988, respectively, and a Ph.D. degree from Fudan University in 1993. He has served as a member or chair of the program committees of many international conferences, such as VLDB, ER, DASFAA, and WAIM. His papers have been published in ACM SIGMOD, VLDB, ICDE and some international journals. His research interests include data mining and knowledge discovery, XML data management, web query and searching, data stream analysis and processing, and peer-to-peer computing.

About this article

Cite this article

Yu, J.X., Qian, W., Lu, H. et al. Finding centric local outliers in categorical/numerical spaces. Knowl Inf Syst 9, 309–338 (2006). https://doi.org/10.1007/s10115-005-0197-6

Received: 20 February 2004
Revised: 04 January 2005
Accepted: 08 January 2005
Published: 10 September 2005
Issue Date: March 2006
DOI: https://doi.org/10.1007/s10115-005-0197-6
