Abstract
Traditional outlier detection techniques usually fail to work efficiently on high-dimensional data due to the curse of dimensionality. This work proposes a novel method for subspace outlier detection, that specifically deals with multidimensional spaces where feature relevance is a local rather than a global property. Different from existing approaches, it is not grid-based and dimensionality unbiased. Thus, its performance is impervious to grid resolution as well as the curse of dimensionality. In addition, our approach ranks the outliers, allowing users to select the number of desired outliers, thus mitigating the issue of high false alarm rate. Extensive empirical studies on real datasets show that our approach efficiently and effectively detects outliers, even in high-dimensional spaces.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. In: SIGMOD Conference, pp. 93–104 (2000)
Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: VLDB, pp. 392–403 (1998)
Aggarwal, C.C., Yu, P.S.: An effective and efficient algorithm for high-dimensional outlier detection. VLDB J 14(2), 211–221 (2005)
Aggarwal, C.C., Yu, P.S.: Outlier detection with uncertain data. In: SDM, pp. 483–493 (2008)
Ye, M., Li, X., Orlowska, M.E.: Projected outlier detection in high-dimensional mixed-attributes data set. Expert Syst. Appl. 36(3), 7104–7113 (2009)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: VLDB, pp. 487–499 (1994)
Angiulli, F., Pizzuti, C.: Outlier mining in large high-dimensional data sets. IEEE Trans. Knowl. Data Eng. 17(2), 203–215 (2005)
Müller, E., Assent, I., Steinhausen, U., Seidl, T.: OutRank: ranking outliers in high dimensional data. In: ICDE Workshops, pp. 600–603 (2008)
Nguyen, H.V., Ang, H.H., Gopalkrishnan, V.: Mining outliers with ensemble of heterogeneous detectors on random subspaces. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds.) DASFAA 2010, Part I. LNCS, vol. 5981, pp. 368–383. Springer, Heidelberg (2010)
Bay, S.D., Schwabacher, M.: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In: KDD, pp. 29–38 (2003)
Assent, I., Krieger, R., Müller, E., Seidl, T.: DUSC: Dimensionality unbiased subspace clustering. In: ICDM, pp. 409–414 (2007)
Tao, Y., Xiao, X., Zhou, S.: Mining distance-based outliers from large databases in any metric space. In: KDD, pp. 394–403 (2006)
Ailon, N., Chazelle, B.: Faster dimension reduction. Commun. CACM 53(2), 97–104 (2010)
Kollios, G., Gunopulos, D., Koudas, N., Berchtold, S.: Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans. Knowl. Data Eng. 15(5), 1170–1187 (2003)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, Boca Raton (1986)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, H.V., Gopalkrishnan, V., Assent, I. (2011). An Unbiased Distance-Based Outlier Detection Approach for High-Dimensional Data. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20149-3_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-20149-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20148-6
Online ISBN: 978-3-642-20149-3
eBook Packages: Computer ScienceComputer Science (R0)