A Complexity Measure for Binary Classification Problems Based on Lost Points

Lancho, Carmen; Martín de Diego, Isaac; Cuesta, Marina; Aceña, Víctor; M. Moguerza, Javier

doi:10.1007/978-3-030-91608-4_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13113)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Complexity measures aim to explore and capture the complexity of a data set. In this paper, the Lost points (LP) complexity measure is proposed. It is obtained by applying k-means recursively and hierarchically, and it provides both a data set-level and an instance-level perspective. At the instance level, the LP measure assigns each point a probability value that indicates how dominant its class is in its neighborhood. At the data set level, it estimates the proportion of lost points, that is, points that are expected to be misclassified because they lie in areas where their class is not dominant. The proposed measure yields easily interpretable results that are competitive with state-of-the-art measures. In addition, it provides probabilistic information that is useful for highlighting the decision boundary in classification problems.
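
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the details, so the choices below are assumptions made only for illustration. These are splitting with k = 2, scoring each point by the share of its own class in every cluster it passes through, averaging those shares over the levels, calling a point lost when its score falls below 0.5, and the stopping parameters `min_size` and `max_depth`. All function names are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def two_means(points, rng, iters=20):
    """Plain Lloyd's k-means with k=2; returns a 0/1 cluster label per point."""
    centers = rng.sample(points, 2)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def lp_scores(X, y, min_size=4, max_depth=5, seed=0):
    """Assumed instance-level score: mean own-class share over all levels."""
    rng = random.Random(seed)
    history = [[] for _ in X]  # own-class shares per point, one entry per level

    def recurse(idx, depth):
        share1 = sum(y[i] for i in idx) / len(idx)  # share of class 1 in cluster
        for i in idx:
            history[i].append(share1 if y[i] == 1 else 1 - share1)
        pure = share1 in (0.0, 1.0)
        if pure or len(idx) < min_size or depth >= max_depth:
            return
        labels = two_means([X[i] for i in idx], rng)
        left = [i for i, lab in zip(idx, labels) if lab == 0]
        right = [i for i, lab in zip(idx, labels) if lab == 1]
        if left and right:  # stop if the split is degenerate
            recurse(left, depth + 1)
            recurse(right, depth + 1)

    recurse(list(range(len(X))), 0)
    return [sum(h) / len(h) for h in history]

def lost_point_proportion(scores, threshold=0.5):
    """Assumed data set-level value: fraction of points scoring below threshold."""
    return sum(s < threshold for s in scores) / len(scores)

# Toy data: two well-separated groups, plus one class-1 point (index 6)
# placed inside the class-0 group.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.2), (0.2, 0.8), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.2), (10.2, 10.8)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

scores = lp_scores(X, y, min_size=8)
lost = [i for i, s in enumerate(scores) if s < 0.5]
print(lost, lost_point_proportion(scores))
```

In this toy example, `min_size=8` stops the recursion after the first split, so each score is the average of the global class share and the share within one of the two groups. The class-1 point surrounded by class-0 points receives a low score, which matches the abstract's description of a point lying where its class is not dominant.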

Acknowledgements

This research has been supported by grants from Rey Juan Carlos University (Ref: C1PREDOC2020), Madrid Autonomous Community (Ref: IND2019/TIC-17194) and the Spanish Ministry of Economy and Competitiveness, under the Retos-Investigación program: MODAS-IN (Ref: RTI-2018-094269-B-I00).

Author information

Authors and Affiliations

Data Science Laboratory, Rey Juan Carlos University, C/Tulipán, s/n, 28933, Móstoles, Spain
Carmen Lancho, Isaac Martín de Diego, Marina Cuesta, Víctor Aceña & Javier M. Moguerza
Madox Viajes, C/de Cantabria, 10, 28939, Arroyomolinos, Spain
Víctor Aceña

Corresponding author

Correspondence to Carmen Lancho.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Universidad Politecnica de Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Manchester, Manchester, UK
Richard Allmendinger
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
Southern University of Science and Technology, Shenzhen, China
Ke Tang
Yonsei University, Seoul, Korea (Republic of)
Sung-Bae Cho
University of Minho, Braga, Portugal
Paulo Novais
NOVA University of Lisbon, Lisbon, Portugal
Susana Nascimento

About this paper

Cite this paper

Lancho, C., Martín de Diego, I., Cuesta, M., Aceña, V., M. Moguerza, J. (2021). A Complexity Measure for Binary Classification Problems Based on Lost Points. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_14

DOI: https://doi.org/10.1007/978-3-030-91608-4_14
Published: 23 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91607-7
Online ISBN: 978-3-030-91608-4
eBook Packages: Computer Science, Computer Science (R0)
