Dissimilarity-Based Learning from Imbalanced Data with Small Disjuncts and Noise

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9117)

Abstract

This paper compares the behavior of three linear classifiers modeled in both the feature space and the dissimilarity space when class imbalance interweaves with small disjuncts and noise. To this end, experiments are carried out on three synthetic databases with different imbalance ratios, levels of noise and complexities of the small disjuncts. The results suggest that small disjuncts can be overcome much better in the dissimilarity space than in the feature space, which means that the learning models will only be affected by imbalance and noise if the samples have first been mapped into the dissimilarity space.
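
The dissimilarity-space mapping referred to above is essentially a change of representation: each sample is described by its distances to a set of prototypes instead of by its original features, and the linear classifier is then trained on that new representation. The following is a minimal sketch of the idea in Python; the Euclidean distance, the use of the whole training set as the prototype set, and logistic regression as a stand-in linear classifier are illustrative assumptions, not the exact classifiers or experimental protocol of the paper.

```python
# Minimal sketch of dissimilarity-based classification (illustrative only):
# every sample is re-represented by its distances to a prototype set R,
# and a linear model is trained on that dissimilarity representation.
import numpy as np
from sklearn.linear_model import LogisticRegression        # stand-in linear classifier
from sklearn.metrics.pairwise import euclidean_distances   # assumed dissimilarity measure


def to_dissimilarity_space(X, R):
    """Map feature vectors X to the dissimilarity space defined by prototypes R."""
    return euclidean_distances(X, R)


# Hypothetical imbalanced toy data (90 majority vs. 10 minority samples).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (90, 5)), rng.normal(2.0, 1.0, (10, 5))])
y_train = np.array([0] * 90 + [1] * 10)
X_test = np.vstack([rng.normal(0.0, 1.0, (30, 5)), rng.normal(2.0, 1.0, (10, 5))])
y_test = np.array([0] * 30 + [1] * 10)

R = X_train                                   # prototype set: here, the whole training set
D_train = to_dissimilarity_space(X_train, R)  # shape (n_train, n_prototypes)
D_test = to_dissimilarity_space(X_test, R)

clf = LogisticRegression(max_iter=1000).fit(D_train, y_train)
print("accuracy in the dissimilarity space:", clf.score(D_test, y_test))
```

In practice, the prototype (representation) set is usually a selected subset of the training data rather than the full set, which keeps the dimensionality of the dissimilarity space manageable.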

Acknowledgment

This work has been partially supported by the Mexican Science and Technology Council (CONACYT-Mexico) through the Postdoctoral Fellowship Program [223351 and 232167], the Spanish Ministry of Economy [TIN2013-46522-P], and the Generalitat Valenciana [PROMETEOII/2014/062].

Author information

Correspondence to J. S. Sánchez.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

García, V., Sánchez, J.S., Ochoa Domínguez, H.J., Cleofas-Sánchez, L. (2015). Dissimilarity-Based Learning from Imbalanced Data with Small Disjuncts and Noise. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol. 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_42

  • DOI: https://doi.org/10.1007/978-3-319-19390-8_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19389-2

  • Online ISBN: 978-3-319-19390-8
