DOS-GAN: A Distributed Over-Sampling Method Based on Generative Adversarial Networks for Distributed Class-Imbalance Learning

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12454)

Abstract

Class-imbalance learning is an active research topic in machine learning. In practical distributed learning applications, data arrives continuously, which often produces class-imbalance situations. The imbalance problem in the distributed scenario has a particular character: the imbalanced states of different nodes may be complementary. Exploiting this complementarity through oversampling is a valuable way to correct the imbalance; however, the data-island constraint prevents nodes from sharing raw data. To this end, we propose DOS-GAN, in which multiple nodes take turns using their local data of the same class to train a global GAN model, and then use the GAN's generator to oversample that class without exchanging any original data. Extensive experiments confirm that DOS-GAN outperforms combinations of traditional methods and achieves classification accuracy close to that of data aggregation.
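The round-robin idea in the abstract can be sketched in a toy form: nodes holding private shards of the same minority class take turns updating a shared generative model, so only model parameters travel between nodes, and the jointly trained "generator" is then used to synthesize minority samples. The sketch below is purely illustrative and assumes a trivial Gaussian generator in place of a real GAN; the names `local_update` and `oversample` are hypothetical, not from the paper.

```python
import numpy as np

def local_update(params, local_data, lr=0.5):
    """One node nudges the shared model toward its local class statistics.

    Stand-in for a local GAN training step: raw samples never leave the node;
    only the updated parameters are passed on.
    """
    mean, std = params
    mean = mean + lr * (local_data.mean(axis=0) - mean)
    std = std + lr * (local_data.std(axis=0) - std)
    return mean, std

def oversample(params, n, rng):
    """Use the jointly trained 'generator' to synthesize minority samples."""
    mean, std = params
    return rng.normal(mean, std, size=(n, mean.shape[0]))

rng = np.random.default_rng(0)
# Three nodes, each with a small, private shard of the same minority class.
shards = [rng.normal(5.0, 1.0, size=(20, 2)) for _ in range(3)]

params = (np.zeros(2), np.ones(2))
for _ in range(10):          # global rounds
    for shard in shards:     # nodes take turns; only params are exchanged
        params = local_update(params, shard)

synthetic = oversample(params, 100, rng)
print(synthetic.shape)       # prints (100, 2)
```

The synthetic samples can then be pooled with each node's real data before training a classifier, which is the role the GAN generator plays in DOS-GAN.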


Notes

  1. http://yann.lecun.com/exdb/mnist.
  2. https://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data.
  3. https://archive.ics.uci.edu/ml/datasets/dataset+for+sensorless+drive+diagnosis.
  4. https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite).
  5. https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones.
  6. https://cs.stanford.edu/~acoates/stl10/.


Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable comments.

Author information


Correspondence to Hongtao Guan.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Guan, H., Ma, X., Shen, S. (2020). DOS-GAN: A Distributed Over-Sampling Method Based on Generative Adversarial Networks for Distributed Class-Imbalance Learning. In: Qiu, M. (ed.) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol. 12454. Springer, Cham. https://doi.org/10.1007/978-3-030-60248-2_42
