Mining Outliers with Faster Cutoff Update and Space Utilization

Szeto, Chi-Cheong; Hung, Edward

doi:10.1007/978-3-642-01307-2_85

Chi-Cheong Szeto²³ &
Edward Hung²³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5476))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

3125 Accesses
3 Citations

Abstract

It is desirable to find unusual data objects by Ramaswamy et al’s distance-based outlier definition because only a metric distance function between two objects is required. It does not need any neighborhood distance threshold required by many existing algorithms based on the definition of Knorr and Ng. Bay and Schwabacher proposed an efficient algorithm ORCA, which can give near linear time performance, for this task. To further reduce the running time, we propose in this paper two algorithms RC and RS using the following two techniques respectively: (i) faster cutoff update, and (ii) space utilization after pruning. We tested RC, RS and RCS (a hybrid approach combining both RC and RS) on several large and high-dimensional real data sets with millions of objects. The experiments show that the speed of RCS is as fast as 1.4 to 2.3 times that of ORCA, and the improvement of RCS is relatively insensitive to the increase in the data size.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Hawkins, D.M.: Identification of outliers. Chapman and Hall, Boca Raton (1980)
Book MATH Google Scholar
Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: VLDB 1998: Proceedings of the 24th International Conference on Very Large Data Bases, pp. 392–403. Morgan Kaumann Publishers Inc., San Francisco (1998)
Google Scholar
Tao, Y., Xiao, X., Zhou, S.: Mining distance-based outliers from large databases in any metric space. In: KDD 2006: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 394–403. ACM, New York (2006)
Google Scholar
Angiulli, F., Fassetti, F.: Very efficient mining of distance-based outliers. In: CIKM 2007: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 791–800. ACM, New York (2007)
Chapter Google Scholar
Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. SIGMOD Rec. 29(2), 427–438 (2000)
Article Google Scholar
Bay, S.D., Schwabacher, M.: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: KDD 2003: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 29–38. ACM, New York (2003)
Google Scholar
Ghoting, A., Parthasarathy, S., Otey, M.E.: Fast mining of distance-based outliers in high-dimensional datasets. Data Min. Knowl. Discov. 16(3), 349–364 (2008)
Article MathSciNet Google Scholar
Asuncion, A., Newman, D.: UCI machine learning repository (2007)
Google Scholar
Hettich, S., Bay, S.D.: The UCI KDD archive (1999)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computing, Hong Kong Polytechnic University, Hong Kong
Chi-Cheong Szeto & Edward Hung

Authors

Chi-Cheong Szeto
View author publications
You can also search for this author in PubMed Google Scholar
Edward Hung
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5 Tiwanont Road, 12000, Bangkadi, Muang, Pathumthani, Thailand
Thanaruk Theeramunkong
Dept. of Computer Engineering, Faculty of Engineering, Chulalongkorn University, 10330, Bangkok, Thailand
Boonserm Kijsirikul
Faculty of Science & Engineering, York University, 355 Lumbers Building, 4700 Keele Street, M3J 1P3, Toronto, Ontario, Canada
Nick Cercone
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, 923-1292, Ishikawa, Japan
Tu-Bao Ho

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Szeto, CC., Hung, E. (2009). Mining Outliers with Faster Cutoff Update and Space Utilization. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science(), vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_85

Download citation

DOI: https://doi.org/10.1007/978-3-642-01307-2_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics