Skip to main content

RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique

  • Conference paper
  • First Online:
Intelligent Information and Database Systems (ACIIDS 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9011))

Included in the following conference series:

Abstract

The problem of imbalanced data, i.e., when the class labels are unequally distributed, is encountered in many real-life application, e.g., credit scoring, medical diagnostics. Various approaches aimed at dealing with the imbalanced data have been proposed. One of the most well known data pre-processing method is the Synthetic Minority Oversampling Technique (SMOTE). However, SMOTE may generate examples which are artificial in the sense that they are impossible to be drawn from the true distribution. Therefore, in this paper, we propose to apply Restricted Boltzmann Machine to learn an intermediate representation which transform the SMOTE examples to the ones approximately drawn from the true distribution. At the end of the paper we perform an experiment using credit scoring dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the classimbalanced problem. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 475–482. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  2. Chawla, N.V., Bowyer, K.W., Hall, L.O.: SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 321–357 (2002)

    MATH  Google Scholar 

  3. Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: SMOTEBoost: improving prediction of the minority class in boosting. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 107–119. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  4. Chen, S., He, H., Garcia, E.: RAMOBoost: Ranked minority oversampling in boosting. IEEE Transactions on Neural Networks 21(10), 1624–1642 (2010)

    Article  Google Scholar 

  5. Ertekin, S., Huang, J., Giles, C.: Active learning for class imbalance problem. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 823–824. ACM (2007)

    Google Scholar 

  6. García, S., Fernández, A., Herrera, F.: Enhancing the effectiveness and interpretability of decision tree and rule induction classifiers with evolutionary training set selection over imbalanced problems. Applied Soft Computing 9(4), 1304–1314 (2009)

    Article  Google Scholar 

  7. He, H., Garcia, E.A.: Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering 21(9), 1263–1284 (2009)

    Article  Google Scholar 

  8. Hido, S., Kashima, H., Takahashi, Y.: Roughly balanced bagging for imbalanced data. Statistical Analysis and Data Mining 2(5–6), 412–426 (2009)

    Article  MathSciNet  Google Scholar 

  9. Hinton, G.: A practical guide to training restricted boltzmann machines. Momentum 9(1), 926 (2010)

    Google Scholar 

  10. Hinton, G.E.: A practical guide to training restricted boltzmann machines. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 599–619. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  11. Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  12. Larochelle, H., Bengio, Y.: Classification using discriminative restricted boltzmann machines. In: Proceedings of the 25th International Conference on Machine Learning, pp. 536–543. ACM (2008)

    Google Scholar 

  13. Maciejewski, T., Stefanowski, J.: Local neighbourhood extension of smote for mining imbalanced data. In: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 104–111. IEEE (2011)

    Google Scholar 

  14. Mani, J., Zhang, I.: KNN approach to unbalanced data distributions: a case study involving information extraction. In: Proceedings of International Conference on Machine Learning, Workshop Learning from Imbalanced Data Sets (2003)

    Google Scholar 

  15. Ramentol, E., Caballero, Y., Bello, R., Herrera, F.: Smote-rsb*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using smote and rough sets theory. Knowledge and Information Systems 33(2), 245–265 (2012)

    Article  Google Scholar 

  16. Sáez, J.A., Luengo, J., Stefanowski, J., Herrera, F.: SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Information Sciences (2014)

    Google Scholar 

  17. Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted boltzmann machines for collaborative filtering. In: Proceedings of the 24th International Conference on Machine Learning, pp. 791–798. ACM (2007)

    Google Scholar 

  18. Seiffert, C., Khoshgoftaar, T., Van Hulse, J., Napolitano, A.: RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 40(1), 185–197 (2010)

    Article  Google Scholar 

  19. Tang, Y., Zhang, Y., Huang, Z.: Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4(3), 365–381 (2007)

    Article  Google Scholar 

  20. Tao, D., Tang, X., Li, X., Wu, X.: Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(7), 1088–1099 (2006)

    Article  Google Scholar 

  21. Tomek, I.: Two Modifications of CNN. IEEE Transactions on Systems, Man and Cybernetics 6(11), 769–772 (1976)

    Article  MATH  MathSciNet  Google Scholar 

  22. Wang, S., Yao, X.: Diversity analysis on imbalanced data sets by using ensemble models. In: IEEE Symposium on Computational Intelligence and Data Mining, pp. 324–331. IEEE (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Maciej Zięba .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zięba, M., Tomczak, J.M., Gonczarek, A. (2015). RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique. In: Nguyen, N., Trawiński, B., Kosala, R. (eds) Intelligent Information and Database Systems. ACIIDS 2015. Lecture Notes in Computer Science(), vol 9011. Springer, Cham. https://doi.org/10.1007/978-3-319-15702-3_37

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-15702-3_37

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15701-6

  • Online ISBN: 978-3-319-15702-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics