An Improved KNN Based Outlier Detection Algorithm for Large Datasets

  • Conference paper
Advanced Data Mining and Applications (ADMA 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6440)

Abstract

Outlier detection has become an active topic in data mining because outliers often carry useful information. In this paper, we propose an improved KNN-based outlier detection algorithm that works through two-stage clustering. The first clustering stage partitions the dataset into several clusters and computes the Kth nearest neighbor within each cluster, which avoids scanning the entire dataset for each calculation. The second clustering stage partitions the clusters obtained in the first stage and prunes a partition as soon as it is determined that it cannot contain outliers, which yields substantial savings in computation. Experimental results on both synthetic and real-life datasets demonstrate that our algorithm is efficient on large datasets.
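
For orientation, the following is a minimal sketch of the plain Kth-nearest-neighbor outlier score that KNN-based detectors of this kind start from: each point is scored by the distance to its Kth nearest neighbor, and the highest-scoring points are reported as outliers. It is a brute-force baseline that scans the whole dataset for every point, which is exactly the cost the abstract says the two-stage clustering avoids. It does not reproduce the paper's clustering or pruning steps, which the abstract does not specify. The function names `kth_nn_distance` and `top_n_outliers`, the parameter `n`, and the sample points are assumptions made for illustration.

```python
import math


def kth_nn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor (itself excluded)."""
    dists = sorted(
        math.dist(points[i], q) for j, q in enumerate(points) if j != i
    )
    return dists[k - 1]


def top_n_outliers(points, k, n):
    """Indices of the n points with the largest k-th nearest neighbor distance."""
    scores = [(kth_nn_distance(points, i, k), i) for i in range(len(points))]
    # Largest score first; ties are broken by the smaller index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:n]]


# Hypothetical toy data: four points in a tight group and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_n_outliers(points, k=2, n=1))  # prints [4]
```

Because every score requires a pass over all points, this baseline is quadratic in the dataset size; restricting the neighbor search to a cluster and discarding partitions that cannot contain outliers, as the abstract describes, are ways to cut that cost.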

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Q., Zheng, M. (2010). An Improved KNN Based Outlier Detection Algorithm for Large Datasets. In: Cao, L., Feng, Y., Zhong, J. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17316-5_56

  • DOI: https://doi.org/10.1007/978-3-642-17316-5_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17315-8

  • Online ISBN: 978-3-642-17316-5

  • eBook Packages: Computer Science, Computer Science (R0)
