Abstract
Data mining methods for outlier detection are usually based on some variant of non-parametric density estimation. Here we argue for the use of local intrinsic dimensionality as a measure of outlierness and demonstrate empirically that it is a meaningful alternative and complement to classic methods.
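To make the idea of scoring points by local intrinsic dimensionality (LID) concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Hill-type maximum-likelihood estimator computed from the distances r_1 <= ... <= r_k to the k nearest neighbors, LID = -((1/k) * sum_i ln(r_i / r_k))^(-1), with Euclidean distance and brute-force neighbor search. The function name lid_mle, the parameter k=20, and the synthetic data are illustrative assumptions, not taken from the chapter.

```python
import numpy as np


def lid_mle(data, k=20):
    """Hill-type maximum-likelihood LID estimate for every row of `data`.

    Assumption: Euclidean distance and brute-force search, suitable only
    for small data sets (memory grows with n * n * d).
    """
    data = np.asarray(data, dtype=float)
    # Pairwise Euclidean distances between all points.
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero self-distance, so skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # Guard against duplicate points, which would give log(0).
    knn = np.maximum(knn, 1e-12)
    # Log of each neighbor distance relative to the k-th neighbor distance.
    mean_log = np.log(knn / knn[:, -1:]).mean(axis=1)
    # Clamp away from zero so that ties do not cause division by zero.
    return -1.0 / np.minimum(mean_log, -1e-12)


# Synthetic illustration: 500 points on a 2-D plane embedded in 3-D,
# plus one point placed far off the plane (index 500).
rng = np.random.default_rng(0)
plane = np.column_stack([rng.random(500), rng.random(500), np.zeros(500)])
points = np.vstack([plane, [[0.5, 0.5, 1.0]]])

scores = lid_mle(points, k=20)
print("median LID on the plane:", np.median(scores[:500]))
print("LID of the off-plane point:", scores[500])
```

In this toy setting the off-plane point sees all of its neighbors at nearly the same distance, so its estimated LID is far larger than that of the points on the plane, which is the intuition behind using the estimate as an outlier score.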
Acknowledgments
M. E. Houle was supported by JSPS Kakenhi Kiban (B) Research Grant 18H03296.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Houle, M.E., Schubert, E., Zimek, A. (2018). On the Correlation Between Local Intrinsic Dimensionality and Outlierness. In: Marchand-Maillet, S., Silva, Y., Chávez, E. (eds) Similarity Search and Applications. SISAP 2018. Lecture Notes in Computer Science, vol 11223. Springer, Cham. https://doi.org/10.1007/978-3-030-02224-2_14
DOI: https://doi.org/10.1007/978-3-030-02224-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02223-5
Online ISBN: 978-3-030-02224-2
eBook Packages: Computer Science, Computer Science (R0)