Outliers Detection in Multi-label Datasets

Bello, Marilyn; Nápoles, Gonzalo; Morera, Rafael; Vanhoof, Koen; Bello, Rafael

doi:10.1007/978-3-030-60884-2_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12468)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

In many knowledge discovery applications, finding outliers, i.e., objects that behave unexpectedly or have abnormal properties, is more interesting than finding inliers. Outlier detection matters in applications such as intrusion detection, credit card fraud detection, and the detection of criminal activity in e-commerce. Several outlier detection methods have been proposed, many of them based on Rough Set Theory, but so far none is specifically intended for multi-label datasets. In this paper, we propose a method that measures the degree of anomaly of an object in a multi-label dataset; this score quantifies how irregular the object is with respect to the rest of the dataset. We also propose a method for generating anomalies in this type of dataset. The efficacy of the proposed method is demonstrated on the resulting synthetic datasets. The results show that our proposal outperforms other methods from the literature adapted to multi-label problems.
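The abstract does not specify how the anomaly score or the synthetic anomalies are computed, so the following is only an illustrative sketch under stated assumptions and not the authors' method. It assumes numeric features and binary label vectors, scores each object by the mean Hamming distance between its label vector and those of its k nearest neighbours in feature space, and injects anomalies by flipping every label of chosen objects. The function names `label_inconsistency_scores` and `inject_label_anomalies`, the parameter `k`, and the toy data are all hypothetical.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Fraction of positions at which two binary label vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def label_inconsistency_scores(features, labels, k=2):
    """Assumed score in [0, 1]: mean label disagreement with the k nearest
    neighbours in feature space (higher means more anomalous)."""
    n = len(features)
    scores = []
    for i in range(n):
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(features[i], features[j]),
        )[:k]
        scores.append(sum(hamming(labels[i], labels[j]) for j in neighbours) / k)
    return scores


def inject_label_anomalies(labels, indices):
    """Assumed generator: return a copy of the label matrix in which every
    label of the selected objects is flipped."""
    chosen = set(indices)
    return [
        [1 - v for v in row] if i in chosen else list(row)
        for i, row in enumerate(labels)
    ]


# Toy data (hypothetical): five objects that all share the same label set.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
clean_labels = [[1, 0, 0] for _ in features]

# Turn the last object into a synthetic anomaly and score the dataset.
noisy_labels = inject_label_anomalies(clean_labels, [4])
scores = label_inconsistency_scores(features, noisy_labels, k=2)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setting the injected object disagrees with all of its neighbours on every label, so it receives the highest score; real multi-label data would likely call for a heterogeneous feature distance and a less extreme perturbation.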

Author information

Authors and Affiliations

Computer Science Department, Universidad Central de Las Villas, Santa Clara, Cuba
Marilyn Bello, Rafael Morera & Rafael Bello
Faculty of Business Economics, Hasselt University, Hasselt, Belgium
Marilyn Bello, Gonzalo Nápoles & Koen Vanhoof
Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
Gonzalo Nápoles

Corresponding author

Correspondence to Marilyn Bello.

Editor information

Editors and Affiliations

Facultad de Ingeniería, Universidad Panamericana, Mexico City, Mexico
Lourdes Martínez-Villaseñor
Universidad Autónoma Metropolitana, Mexico City, Mexico
Oscar Herrera-Alcántara
Facultad de Ingeniería, Universidad Panamericana, Mexico City, Mexico
Hiram Ponce
Universidad Autónoma del Estado de Hidalgo, Hidalgo, Mexico
Félix A. Castro-Espinoza

About this paper

Cite this paper

Bello, M., Nápoles, G., Morera, R., Vanhoof, K., Bello, R. (2020). Outliers Detection in Multi-label Datasets. In: Martínez-Villaseñor, L., Herrera-Alcántara, O., Ponce, H., Castro-Espinoza, F.A. (eds) Advances in Soft Computing. MICAI 2020. Lecture Notes in Computer Science, vol 12468. Springer, Cham. https://doi.org/10.1007/978-3-030-60884-2_5

DOI: https://doi.org/10.1007/978-3-030-60884-2_5
Published: 07 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60883-5
Online ISBN: 978-3-030-60884-2
eBook Packages: Computer Science, Computer Science (R0)
