Dissimilarity-Based Learning from Imbalanced Data with Small Disjuncts and Noise

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9117)

Abstract

This paper compares the behavior of three linear classifiers modeled in both the feature space and the dissimilarity space when class imbalance interweaves with small disjuncts and noise. To this end, experiments are carried out on three synthetic databases with different imbalance ratios, levels of noise and complexities of the small disjuncts. The results suggest that small disjuncts can be overcome much better in the dissimilarity space than in the feature space, which means that the learning models will only be affected by imbalance and noise if the samples have first been mapped into the dissimilarity space.
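
The dissimilarity-space mapping referred to above is essentially a change of representation: each sample is described by its distances to a set of prototypes instead of by its original features, and the linear classifier is then trained on that new representation. The following is a minimal sketch of the idea in Python; the Euclidean distance, the use of the whole training set as the prototype set, and logistic regression as a stand-in linear classifier are illustrative assumptions, not the exact classifiers or experimental protocol of the paper.

```python
# Minimal sketch of dissimilarity-based classification (illustrative only):
# every sample is re-represented by its distances to a prototype set R,
# and a linear model is trained on that dissimilarity representation.
import numpy as np
from sklearn.linear_model import LogisticRegression        # stand-in linear classifier
from sklearn.metrics.pairwise import euclidean_distances   # assumed dissimilarity measure


def to_dissimilarity_space(X, R):
    """Map feature vectors X to the dissimilarity space defined by prototypes R."""
    return euclidean_distances(X, R)


# Hypothetical imbalanced toy data (90 majority vs. 10 minority samples).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (90, 5)), rng.normal(2.0, 1.0, (10, 5))])
y_train = np.array([0] * 90 + [1] * 10)
X_test = np.vstack([rng.normal(0.0, 1.0, (30, 5)), rng.normal(2.0, 1.0, (10, 5))])
y_test = np.array([0] * 30 + [1] * 10)

R = X_train                                   # prototype set: here, the whole training set
D_train = to_dissimilarity_space(X_train, R)  # shape (n_train, n_prototypes)
D_test = to_dissimilarity_space(X_test, R)

clf = LogisticRegression(max_iter=1000).fit(D_train, y_train)
print("accuracy in the dissimilarity space:", clf.score(D_test, y_test))
```

In practice, the prototype (representation) set is usually a selected subset of the training data rather than the full set, which keeps the dimensionality of the dissimilarity space manageable.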

Acknowledgment

This work has been partially supported by the Mexican Science and Technology Council (CONACYT-Mexico) through the Postdoctoral Fellowship Program [223351 and 232167], the Spanish Ministry of Economy [TIN2013-46522-P], and the Generalitat Valenciana [PROMETEOII/2014/062].

Author information

Correspondence to J. S. Sánchez.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

García, V., Sánchez, J.S., Ochoa Domínguez, H.J., Cleofas-Sánchez, L. (2015). Dissimilarity-Based Learning from Imbalanced Data with Small Disjuncts and Noise. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol. 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_42

  • DOI: https://doi.org/10.1007/978-3-319-19390-8_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19389-2

  • Online ISBN: 978-3-319-19390-8
