ODRA: an outlier detection algorithm based on relevant attribute analysis method

Wahid, Abdul; Rao, Annavarapu Chandra Sekhara

doi:10.1007/s10586-020-03136-9

Published: 13 June 2020

Cluster Computing, volume 24, pages 569–585 (2021)

Abstract

Advances in data acquisition have generated enormous amounts of data capturing business, commercial, technological and scientific information. Some occurrences nevertheless remain rare or unusual, however much data is available. In data mining, these rare occurrences are usually called outliers or anomalies. Their share of the data varies from 0.01% to 10%, depending on the application. In recent years, outlier detection has become important in many applications and has attracted considerable attention among data mining techniques. This attention has produced several outlier detection algorithms, most of them based on distance or density. Each approach has inherent weaknesses: distance-based methods have problems with local density, and density-based methods have problems with low-density patterns. In this paper, we present a new outlier detection algorithm based on relevant attribute analysis (ODRA) for local outlier detection in high-dimensional datasets. The proposed algorithm has two phases. In the first phase, we present a data reduction method that shrinks the dataset by pruning irrelevant attributes and data points. In the second phase, we propose an outlier detection method based on k-NN kernel density estimation. Experimental results on 15 datasets from the UCI machine learning repository show the superiority and effectiveness of the proposed approach over state-of-the-art outlier detection methods.
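
To make the idea of k-NN kernel density estimation concrete, the following is a minimal, generic sketch in Python. It is not the ODRA algorithm itself: the abstract does not specify the kernel, the bandwidth rule or the scoring formula, so the Gaussian kernel, the use of the k-th neighbour distance as bandwidth, the ratio-based local score and the function name `knn_kde_scores` are all assumptions made for illustration. The sketch covers only the density-based scoring of the second phase.

```python
import math


def knn_kde_scores(points, k=3):
    """Illustrative local outlier score from a k-NN kernel density estimate.

    Hypothetical helper, not the paper's method: kernel, bandwidth and
    score definition are assumptions chosen for simplicity.
    """
    n = len(points)
    dim = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # For every point, its k nearest other points as (distance, index) pairs.
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(dists[:k])

    # Gaussian kernel density estimated from the k neighbours only, using the
    # k-th neighbour distance as an adaptive bandwidth (an assumption).
    densities = []
    for nbrs in neighbours:
        h = max(nbrs[-1][0], 1e-12)  # guard against duplicate points
        norm = (h * math.sqrt(2 * math.pi)) ** dim
        kernel_sum = sum(math.exp(-0.5 * (d / h) ** 2) for d, _ in nbrs)
        densities.append(kernel_sum / (k * norm))

    # Local score: mean density of the neighbours divided by the point's own
    # density. Values well above 1 mean the point is sparser than its
    # neighbourhood.
    scores = []
    for i, nbrs in enumerate(neighbours):
        mean_nbr_density = sum(densities[j] for _, j in nbrs) / k
        scores.append(mean_nbr_density / max(densities[i], 1e-300))
    return scores


# Usage: five clustered points and one isolated point (made-up toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = knn_kde_scores(points, k=3)
most_outlying = max(range(len(points)), key=scores.__getitem__)
print(most_outlying)  # index of the isolated point (10, 10)
```

Because each point's density is compared with that of its own neighbours rather than with a global threshold, a score of this kind is local in the sense the abstract uses: a point is flagged when it is sparse relative to its surroundings. The brute-force neighbour search is quadratic in the number of points and is kept only for readability.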

Notes

http://www.archive.ics.uci.edu/ml/

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, Jharkhand, India
Abdul Wahid & Annavarapu Chandra Sekhara Rao

Corresponding author

Correspondence to Abdul Wahid.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wahid, A., Rao, A. ODRA: an outlier detection algorithm based on relevant attribute analysis method. Cluster Comput 24, 569–585 (2021). https://doi.org/10.1007/s10586-020-03136-9

Received: 30 January 2020
Revised: 12 May 2020
Accepted: 25 May 2020
Published: 13 June 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s10586-020-03136-9
