An Improved KNN Based Outlier Detection Algorithm for Large Datasets

Wang, Qian; Zheng, Min

doi:10.1007/978-3-642-17316-5_56

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6440)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Outlier detection has become an active topic in data mining because outliers often carry useful information. In this paper, we propose an improved KNN-based outlier detection algorithm that works through two stages of clustering. The first stage partitions the dataset into several clusters and computes the Kth nearest neighbor within each cluster, which avoids scanning the entire dataset for each calculation. The second stage partitions the clusters obtained in the first stage and prunes a partition as soon as it is determined that it cannot contain outliers, which yields substantial savings in computation. Experimental results on both synthetic and real-life datasets demonstrate that our algorithm is efficient on large datasets.
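
The first stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which clustering method is used, so the sketch assumes the cluster centers are already given and assigns each point to its nearest center. The function names (`kth_nn_distance`, `assign_to_centers`, `top_n_outliers`), the fallback for clusters with too few members, and the choice to rank points by their Kth-nearest-neighbor distance are all assumptions made for illustration. The pruning of the second stage is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kth_nn_distance(p: Point, pool: Sequence[Point], k: int) -> float:
    """Distance from p to its k-th nearest neighbour in pool.

    pool is assumed to contain p itself and at least k other points,
    so index 0 of the sorted distances is p's zero distance to itself.
    """
    dists = sorted(math.dist(p, q) for q in pool)
    return dists[k]


def assign_to_centers(points: Sequence[Point],
                      centers: Sequence[Point]) -> List[List[Point]]:
    """Stand-in for the first clustering stage: nearest-center assignment."""
    clusters: List[List[Point]] = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def top_n_outliers(points: Sequence[Point], centers: Sequence[Point],
                   k: int, n: int) -> List[Point]:
    """Rank points by k-th NN distance computed inside their own cluster."""
    scored = []
    for members in assign_to_centers(points, centers):
        # Assumed fallback: a cluster with fewer than k other members
        # searches the whole dataset instead of just the cluster.
        pool = members if len(members) > k else points
        for p in members:
            scored.append((kth_nn_distance(p, pool, k), p))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [p for _, p in scored[:n]]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
            (14.0, 14.0)]
    print(top_n_outliers(data, [(0.5, 0.5), (10.5, 10.5)], k=2, n=1))
```

Because each neighbor search looks only at one cluster, the scores approximate the exact Kth-nearest-neighbor distances: a point near a cluster boundary may have closer neighbors in an adjacent cluster that the sketch never examines.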

Author information

Authors and Affiliations

School of Computer Science, Chongqing University, Chongqing, China
Qian Wang & Min Zheng

Editor information

Editors and Affiliations

Faculty of Engineering and Information Technology, University of Technology Sydney, 2007, Sydney, NSW, Australia
Longbing Cao
College of Computer Science, Chongqing University, 400030, Chongqing, China
Yong Feng
College of Computer Science, Chongqing University, 400030, Chongqing, China
Jiang Zhong

About this paper

Cite this paper

Wang, Q., Zheng, M. (2010). An Improved KNN Based Outlier Detection Algorithm for Large Datasets. In: Cao, L., Feng, Y., Zhong, J. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17316-5_56

DOI: https://doi.org/10.1007/978-3-642-17316-5_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17315-8
Online ISBN: 978-3-642-17316-5
eBook Packages: Computer Science, Computer Science (R0)