Abstract
Complexity measures aim to explore and capture the complexity of a data set. In this paper, the Lost points (LP) complexity measure is proposed. It is obtained by applying k-means in a recursive and hierarchical way, and it provides both a data set and an instance perspective. At the instance level, the LP measure gives each point a probability value that reflects the dominance of its class in its neighborhood. At the data set level, it estimates the proportion of lost points, that is, points expected to be misclassified because they lie in areas where their class is not dominant. The proposed measure yields easily interpretable results that are competitive with state-of-the-art measures. In addition, it provides probabilistic information useful for highlighting the decision boundary in classification problems.
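The abstract only outlines the measure, so the following Python sketch is a simplified illustration of the idea rather than the authors' algorithm. It assumes recursive 2-means splits, a stopping rule (`min_size`, `max_depth`), and a 0.5 dominance threshold; these choices and all function names are hypothetical and were made here for illustration only.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(points, rng, iters=20):
    """Plain Lloyd iterations with k=2; returns a 0/1 label per point."""
    centers = rng.sample(points, 2)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            for p in points
        ]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def lost_points_sketch(X, y, min_size=5, max_depth=6, threshold=0.5, seed=0):
    """Return (per-point dominance scores, fraction of 'lost' points).

    Assumption: a point's score is the share of its own class inside the
    leaf cluster it ends up in; a point is 'lost' if that share is below
    `threshold`. This is an illustrative stand-in for the LP measure.
    """
    rng = random.Random(seed)
    scores = [0.0] * len(X)

    def score_leaf(idx):
        classes = [y[i] for i in idx]
        for i in idx:
            scores[i] = classes.count(y[i]) / len(idx)

    def recurse(idx, depth):
        pure = len({y[i] for i in idx}) == 1
        if pure or len(idx) <= min_size or depth == max_depth:
            score_leaf(idx)
            return
        labels = two_means([X[i] for i in idx], rng)
        left = [i for i, lab in zip(idx, labels) if lab == 0]
        right = [i for i, lab in zip(idx, labels) if lab == 1]
        if not left or not right:  # split failed (e.g. duplicate points)
            score_leaf(idx)
            return
        recurse(left, depth + 1)
        recurse(right, depth + 1)

    recurse(list(range(len(X))), 0)
    lost = sum(1 for s in scores if s < threshold)
    return scores, lost / len(X)
```

In this sketch, two well-separated classes produce scores of 1.0 for every point and no lost points, whereas points of a minority class sitting inside a region dominated by the other class receive low scores and count as lost.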
Acknowledgements
This research has been supported by grants from Rey Juan Carlos University (Ref: C1PREDOC2020), Madrid Autonomous Community (Ref: IND2019/TIC-17194) and the Spanish Ministry of Economy and Competitiveness, under the Retos-Investigación program: MODAS-IN (Ref: RTI-2018-094269-B-I00).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Lancho, C., Martín de Diego, I., Cuesta, M., Aceña, V., Moguerza, J.M. (2021). A Complexity Measure for Binary Classification Problems Based on Lost Points. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_14
DOI: https://doi.org/10.1007/978-3-030-91608-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91607-7
Online ISBN: 978-3-030-91608-4
eBook Packages: Computer Science, Computer Science (R0)