Abstract
Class-imbalance learning is a hot research topic in machine learning. In practical distributed learning applications, data arrives continuously, which often leads to class imbalance. The imbalance problem in the distributed scenario is distinctive: the imbalanced states of different nodes may be complementary. Exploiting this complementary relationship to oversample and change the imbalanced state is a valuable approach. However, data islands prevent data sharing between nodes in this setting. To this end, we propose DOS-GAN, in which multiple nodes take turns using their local data of one same class to train a global GAN model, and then use this GAN generator to oversample that class without exchanging the original data. Extensive experiments confirm that DOS-GAN outperforms combinations of traditional methods and achieves classification accuracy close to that of data aggregation.
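The round-robin protocol described in the abstract can be sketched as follows. This is an illustrative toy, not the authors' implementation: a simple Gaussian density model (`ToyGenerator`) stands in for the actual GAN generator so the sketch runs without a deep-learning framework, and all function and class names are assumptions. What it demonstrates is the protocol shape: only model parameters travel between nodes, each node trains locally in turn, and the final model synthesizes minority-class samples for oversampling.

```python
import numpy as np

class ToyGenerator:
    """Stand-in for a GAN generator: a running mean/covariance of one class,
    updated incrementally (Welford's algorithm) so nodes never pool raw data."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))  # sum of outer products of deviations

    def local_update(self, x):
        # One "training round" on a node's local minority-class data.
        for row in x:
            self.n += 1
            delta = row - self.mean
            self.mean += delta / self.n
            self.m2 += np.outer(delta, row - self.mean)

    def sample(self, k, rng):
        # Synthesize k new minority-class samples from the learned model.
        cov = self.m2 / max(self.n - 1, 1)
        return rng.multivariate_normal(self.mean, cov, size=k)

def round_robin_train(node_datasets):
    """Pass the model between nodes in turn; only parameters travel, never data."""
    gen = ToyGenerator(node_datasets[0].shape[1])
    for data in node_datasets:  # each node trains locally, then hands the model on
        gen.local_update(data)
    return gen

rng = np.random.default_rng(0)
# Three nodes, each holding a few minority-class samples (raw data stays local).
nodes = [rng.normal(loc=5.0, scale=1.0, size=(20, 2)) for _ in range(3)]
gen = round_robin_train(nodes)
synthetic = gen.sample(100, rng)  # oversample the minority class
print(synthetic.shape)
```

In DOS-GAN itself, `local_update` would be replaced by local GAN training steps (generator and discriminator weights circulating among nodes), but the data-isolation property is the same: each node contributes gradients from its own data, and only the trained generator is shared for oversampling.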
Acknowledgment
The authors would like to thank the anonymous reviewers for their valuable comments.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Guan, H., Ma, X., Shen, S. (2020). DOS-GAN: A Distributed Over-Sampling Method Based on Generative Adversarial Networks for Distributed Class-Imbalance Learning. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science(), vol 12454. Springer, Cham. https://doi.org/10.1007/978-3-030-60248-2_42
DOI: https://doi.org/10.1007/978-3-030-60248-2_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60247-5
Online ISBN: 978-3-030-60248-2
eBook Packages: Mathematics and Statistics (R0)