Improving Small-Scale Dataset Classification Performance Through Weak-Label Samples Generated by InfoGAN

Zhang, Meiyang; Miao, Qiguang; Ge, Daohui; Zhang, Zili

doi:10.1007/978-981-16-0705-9_6


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1320)

Included in the following conference series:

CCF Conference on Big Data


Abstract

Sufficient training data are typically required to train learning models. However, because manually labeling a large number of samples is expensive, the amount of available training data (real data) is usually limited. Generative Adversarial Networks (GANs) perform well at generating artificial samples (generated data), and these generated samples can serve as supplementary data to compensate for the small sample size and insufficient diversity of small datasets. Unfortunately, the generated data usually lack annotation labels. To make better use of the generated data, we propose a learning framework, WGSForest, which trains the classifier jointly on real data and generated data.
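As a minimal sketch of how real and generated data might be pooled for joint training, the Python below builds a single training list. Real samples keep their manual labels with confidence 1.0, while generated samples carry a weak label and a lower starting confidence. It assumes that the "generation rate" reported in the results is the number of generated samples relative to the number of real samples, so a 100% rate on 1,000 real samples adds 1,000 generated ones. The names `Sample`, `num_generated`, `build_joint_set` and the starting confidence `init_conf=0.5` are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    features: List[float]
    label: int          # manual label (real) or weak label (generated)
    confidence: float   # 1.0 for real data; starting confidence for generated data
    generated: bool


def num_generated(n_real: int, generation_rate: float) -> int:
    # Assumed definition: generated samples = generation rate x real samples.
    return int(round(n_real * generation_rate))


def build_joint_set(
    real_x: Sequence[Sequence[float]],
    real_y: Sequence[int],
    gen_x: Sequence[Sequence[float]],
    gen_weak_y: Sequence[int],
    generation_rate: float,
    init_conf: float = 0.5,  # hypothetical starting confidence for weak labels
    seed: int = 0,
) -> List[Sample]:
    n_gen = num_generated(len(real_x), generation_rate)
    if n_gen > len(gen_x):
        raise ValueError("not enough generated samples for this generation rate")
    # Draw the required number of generated samples without replacement.
    picked = random.Random(seed).sample(range(len(gen_x)), n_gen)
    joint = [Sample(list(x), int(y), 1.0, False) for x, y in zip(real_x, real_y)]
    joint += [Sample(list(gen_x[i]), int(gen_weak_y[i]), init_conf, True) for i in picked]
    return joint
```

For example, given 1,000 real samples, at least 1,000 generated candidates, and `generation_rate=1.0`, `build_joint_set` returns 2,000 entries, half of them flagged as generated.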

In the WGSForest model, supplementary data are generated by an improved InfoGAN to increase the amount and diversity of the training data, and InfoGAN also assigns weak labels to these generated samples. We exploit the strength of deep forest on small datasets and use the weakly labeled generated data as a supplement to the real training data to optimize the deep forest. Specifically, the cascade forest in the improved deep forest (SForest) dynamically updates the label of each generated sample to an appropriate confidence, and the real and generated data are then combined to jointly train the subsequent layers of the improved cascade forest. Experimental results show that adding weakly labeled generated data effectively improves the classification performance of deep forest. On an MNIST subset of 1,000 samples, a 100% generation rate yields a 1.17% improvement, and on a CIFAR-10 subset of 1,000 samples, a 100% generation rate yields a 6.2% improvement. Furthermore, the performance achieved on each dataset depends on its specific generation rate.
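The label update in the cascade forest can be pictured with the pure-Python sketch below. It follows the standard deep forest convention of averaging each layer's per-forest class vectors and appending them to the features passed to the next layer. The rule that a generated sample's label becomes the argmax of that averaged vector, with the maximum probability as its new confidence, is an assumption, since the abstract does not state how SForest computes the "proper confidence". `weak_label_from_code` likewise assumes that InfoGAN's categorical code has already been aligned with class indices. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def weak_label_from_code(code: Sequence[float]) -> int:
    # Hypothetical: the active entry of InfoGAN's one-hot categorical code is
    # taken as the weak label (assumes code index == class index).
    return max(range(len(code)), key=lambda k: code[k])


def average_class_vectors(
    forest_probas: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    # forest_probas[f][i][k]: probability of class k for sample i from forest f.
    # Returns the layer's averaged class vector for every sample.
    n_forests = len(forest_probas)
    n_samples = len(forest_probas[0])
    n_classes = len(forest_probas[0][0])
    return [
        [sum(f[i][k] for f in forest_probas) / n_forests for k in range(n_classes)]
        for i in range(n_samples)
    ]


def refresh_generated_labels(
    probas: Sequence[Sequence[float]],
    labels: Sequence[int],
    confidences: Sequence[float],
    is_generated: Sequence[bool],
) -> Tuple[List[int], List[float]]:
    # Real samples keep their manual label and confidence; each generated
    # sample is relabelled with the layer's most probable class, and that
    # class's probability becomes its new confidence (assumed update rule).
    new_labels: List[int] = []
    new_conf: List[float] = []
    for p, y, c, gen in zip(probas, labels, confidences, is_generated):
        if gen:
            k = max(range(len(p)), key=lambda j: p[j])
            new_labels.append(k)
            new_conf.append(p[k])
        else:
            new_labels.append(y)
            new_conf.append(c)
    return new_labels, new_conf


def augment_features(
    features: Sequence[Sequence[float]], probas: Sequence[Sequence[float]]
) -> List[List[float]]:
    # Deep forest cascade: concatenate each sample's original features with
    # the layer's class vector to form the next layer's input.
    return [list(x) + list(p) for x, p in zip(features, probas)]
```

In a full pipeline, the refreshed confidences could be passed as per-sample weights when fitting the forests of the next layer. That, too, is an illustrative choice rather than a detail confirmed by the abstract.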



Author information

Authors and Affiliations

College of Computer and Information Science, Southwest University, Chongqing, 400715, China
Meiyang Zhang & Zili Zhang
School of Computer Science and Technology, Xidian University, Xi’an, 710071, China
Qiguang Miao & Daohui Ge
School of Information Technology, Deakin University, Locked Bag 20000, Geelong, VIC, 3220, Australia
Zili Zhang


Corresponding author

Correspondence to Zili Zhang.

Editor information

Editors and Affiliations

PLA Academy of Military Sciences, Beijing, China
Hong Mei
Southwest University, Chongqing, China
Weiguo Zhang
The University of Edinburgh, Edinburgh, UK
Wenfei Fan
Southwest University, Chongqing, China
Zili Zhang
Nanjing University, Nanjing, China
Yihua Huang
Zhejiang University, Hangzhou, China
Jiajun Bu
Nanjing University, Nanjing, China
Yang Gao
Taiyuan University of Technology, Taiyuan, China
Li Wang



About this paper

Cite this paper

Zhang, M., Miao, Q., Ge, D., Zhang, Z. (2021). Improving Small-Scale Dataset Classification Performance Through Weak-Label Samples Generated by InfoGAN. In: Mei, H., et al. Big Data. BigData 2020. Communications in Computer and Information Science, vol 1320. Springer, Singapore. https://doi.org/10.1007/978-981-16-0705-9_6


DOI: https://doi.org/10.1007/978-981-16-0705-9_6
Published: 01 April 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0704-2
Online ISBN: 978-981-16-0705-9
eBook Packages: Computer Science, Computer Science (R0)


Societies and partnerships

China Computer Federation (CCF)