An Unbiased Distance-Based Outlier Detection Approach for High-Dimensional Data

Nguyen, Hoang Vu; Gopalkrishnan, Vivekanand; Assent, Ira

doi:10.1007/978-3-642-20149-3_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6587)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Traditional outlier detection techniques usually fail to work efficiently on high-dimensional data due to the curse of dimensionality. This work proposes a novel method for subspace outlier detection that specifically deals with multidimensional spaces where feature relevance is a local rather than a global property. Unlike existing approaches, it is not grid-based and is dimensionality-unbiased. Thus, its performance is impervious to grid resolution as well as to the curse of dimensionality. In addition, our approach ranks the outliers, allowing users to select the number of desired outliers and thus mitigating the issue of high false alarm rates. Extensive empirical studies on real datasets show that our approach efficiently and effectively detects outliers, even in high-dimensional spaces.
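
To illustrate the general idea of ranking distance-based outliers within a chosen subspace, here is a minimal sketch. It is not the method proposed in the paper: it uses the plain average distance to the k nearest neighbours as the outlier score, and the function names, the `dims` parameter, and the toy data are all assumptions made for illustration. It also does nothing to make scores comparable across subspaces of different dimensionality, which is the bias the paper sets out to address.

```python
import math


def knn_distance_scores(points, k, dims=None):
    """Score each point by its mean Euclidean distance to its k nearest
    neighbours, measured only in the attributes listed in `dims`.

    `dims` is a hypothetical parameter naming the subspace; None means
    all attributes are used.
    """
    dims = tuple(range(len(points[0]))) if dims is None else tuple(dims)
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, restricted to the subspace.
        dists = sorted(
            math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))
            for j, q in enumerate(points)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


def top_outliers(points, k, n, dims=None):
    """Return the indices of the n highest-scoring points, most outlying first."""
    scores = knn_distance_scores(points, k, dims)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Toy data (made up): the last point is far from the rest in attributes 0 and 1.
data = [
    (1.0, 1.0, 5.0),
    (1.1, 0.9, 1.0),
    (0.9, 1.2, 9.0),
    (1.2, 1.1, 3.0),
    (6.0, 6.0, 4.0),
]
top = top_outliers(data, k=2, n=1, dims=(0, 1))
print(top)
```

Because the result is a ranking rather than a yes/no label, the caller decides how many candidates to inspect by choosing `n`, which is the property the abstract credits with reducing false alarms.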


Author information

Authors and Affiliations

School of Computer Engineering, Nanyang Technological University, Singapore
Hoang Vu Nguyen & Vivekanand Gopalkrishnan
Department of Computer Science, Aarhus University, Denmark
Ira Assent

Editor information

Editors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Jeffrey Xu Yu
Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro (373-1 Guseong-don), Yuseong-gu, 305-701, Daejeon, Korea
Myoung Ho Kim
Institute for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen, Schützenbahn 70, 45117, Essen, Germany
Rainer Unland

About this paper

Cite this paper

Nguyen, H.V., Gopalkrishnan, V., Assent, I. (2011). An Unbiased Distance-Based Outlier Detection Approach for High-Dimensional Data. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20149-3_12

DOI: https://doi.org/10.1007/978-3-642-20149-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20148-6
Online ISBN: 978-3-642-20149-3
eBook Packages: Computer Science, Computer Science (R0)
