Abstract
Advances in data acquisition have generated enormous amounts of data capturing business, commercial, technological, and scientific information. Even in very large datasets, however, some occurrences are rare or unusual. In data mining, these rare occurrences are usually referred to as outliers or anomalies. They are infrequent by nature: their share of the data can range from 0.01% to 10%, depending on the type of application. In recent years, outlier detection has become important in many applications and has attracted considerable attention within data mining. This attention has produced several outlier detection algorithms, most of them based on distance or density. Each approach has inherent weaknesses: distance-based methods have problems with local density, and density-based methods have problems with low-density patterns. In this paper, we present a new outlier detection algorithm based on relevant attribute analysis (ODRA) for local outlier detection in high-dimensional datasets. The proposed algorithm has two phases. In the first phase, we present a data reduction method that shrinks the dataset by pruning irrelevant attributes and data points. In the second phase, we propose an outlier detection method based on k-NN kernel density estimation. Experimental results on 15 datasets from the UCI machine learning repository show the effectiveness of the proposed approach and its superiority over state-of-the-art outlier detection methods.
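
To make the second phase more concrete, the following is a minimal sketch of outlier scoring with a k-NN kernel density estimate. It is not the ODRA algorithm itself: the abstract does not give the exact formulation, so the function name `knn_kde_scores`, the Gaussian kernel, the bandwidth choice (distance to the k-th neighbour), and the relative score (mean neighbour density divided by a point's own density) are all assumptions made for illustration. The attribute-pruning phase is not shown because the abstract does not specify how relevance is measured.

```python
import numpy as np


def knn_kde_scores(X, k=5):
    """Hypothetical helper: one outlier score per row of X.

    Higher scores mean the point sits in a sparser region than its
    neighbours. Illustrative only; not the authors' implementation.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Pairwise Euclidean distances (O(n^2) memory, fine for a small demo).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices and distances of the k nearest neighbours of every point.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)

    # Assumed adaptive bandwidth: distance to the k-th neighbour.
    h = np.maximum(nn_dist[:, -1], 1e-12)

    # Gaussian kernel density estimate restricted to the k neighbours.
    kernel = np.exp(-0.5 * (nn_dist / h[:, None]) ** 2)
    density = kernel.sum(axis=1) / (k * (np.sqrt(2.0 * np.pi) * h) ** d)

    # Assumed relative score: mean neighbour density over own density.
    return density[nn_idx].mean(axis=1) / density


if __name__ == "__main__":
    # A 5x5 grid of inliers plus one distant point.
    grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
    data = np.vstack([grid, [[20.0, 20.0]]])
    scores = knn_kde_scores(data, k=5)
    print("index of highest score:", int(np.argmax(scores)))
```

Comparing each point's density with that of its neighbours, rather than using the raw density, is what makes a score of this kind local: a point in a uniformly sparse region is not flagged merely for being sparse.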

Cite this article
Wahid, A., Rao, A. ODRA: an outlier detection algorithm based on relevant attribute analysis method. Cluster Comput 24, 569–585 (2021). https://doi.org/10.1007/s10586-020-03136-9
DOI: https://doi.org/10.1007/s10586-020-03136-9