Abstract
Training a learning model typically requires a sufficient amount of training data. However, because manually labeling a large number of samples is expensive, the amount of available labeled training data (real data) is usually limited. Generative Adversarial Networks (GANs) perform well at generating artificial samples (generated data), which can serve as supplementary data to compensate for a dataset's small sample size and insufficient diversity. Unfortunately, generated data usually carry no annotation labels. To make better use of the generated data, we propose a learning framework, WGSForest, which trains the classifier jointly on real data and generated data.
In the WGSForest model, supplementary data are generated by an improved InfoGAN to increase the amount and diversity of the training data, and the same InfoGAN assigns each generated sample a weak label. We exploit the advantage of deep forest on small datasets and use the weakly labeled generated data as a supplement to the real training data to optimize the deep forest. In detail, the cascade forest in the improved deep forest (SForest) dynamically updates the label of each generated sample to a proper confidence; the real data and generated data are then combined to jointly train the following layers of the improved cascade forest. Experimental results show that adding the weakly labeled generated data effectively improves the classification performance of deep forest. On a 1000-sample MNIST subset, a 100% generation rate yields a 1.17% improvement, and on a 1000-sample CIFAR-10 subset, a 100% generation rate yields a 6.2% improvement. Furthermore, each dataset reaches its best performance at a specific generation rate.
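The data flow described above can be illustrated with a minimal sketch. The abstract does not specify the confidence-update rule or the exact definition of the generation rate, so everything here is an assumption made for illustration: `Sample`, `num_generated`, `update_confidence`, and `joint_training_set` are hypothetical names, the generation rate is assumed to be the ratio of generated to real samples, and the update rule (use the current layer's predicted probability for the weak label) is a stand-in rather than the authors' method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    features: List[float]
    label: int          # true label for real data, weak label for generated data
    confidence: float   # 1.0 for real data; re-estimated per layer for generated data
    generated: bool


def num_generated(n_real: int, generation_rate: float) -> int:
    """Assumed meaning of 'generation rate': generated samples per real sample."""
    return round(n_real * generation_rate)


def update_confidence(sample: Sample, class_probs: Sequence[float]) -> Sample:
    """Hypothetical update rule: trust a weak label as much as the current
    cascade layer's predicted probability for that label.
    Real samples are returned unchanged (confidence stays 1.0)."""
    if not sample.generated:
        return sample
    new_conf = float(class_probs[sample.label])
    return Sample(sample.features, sample.label, new_conf, True)


def joint_training_set(
    real: Sequence[Sample], generated: Sequence[Sample]
) -> Tuple[List[List[float]], List[int], List[float]]:
    """Combine real and generated samples into (features, labels, weights)
    for training the next cascade layer."""
    combined = list(real) + list(generated)
    features = [s.features for s in combined]
    labels = [s.label for s in combined]
    weights = [s.confidence for s in combined]
    return features, labels, weights


# Toy usage: one real sample, one generated sample with weak label 1.
real = [Sample([0.1, 0.9], label=0, confidence=1.0, generated=False)]
gen = [Sample([0.8, 0.2], label=1, confidence=1.0, generated=True)]

# Pretend the current layer predicted these class probabilities for the generated sample.
gen = [update_confidence(gen[0], class_probs=[0.3, 0.7])]

X, y, w = joint_training_set(real, gen)
```

In this sketch the per-sample confidences are returned as weights, so a forest learner that accepts per-sample weights could down-weight generated samples whose weak labels the previous layer found implausible. Whether WGSForest uses the confidence as a weight, a soft label, or something else is not stated in the abstract.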
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, M., Miao, Q., Ge, D., Zhang, Z. (2021). Improving Small-Scale Dataset Classification Performance Through Weak-Label Samples Generated by InfoGAN. In: Mei, H., et al. Big Data. BigData 2020. Communications in Computer and Information Science, vol 1320. Springer, Singapore. https://doi.org/10.1007/978-981-16-0705-9_6
DOI: https://doi.org/10.1007/978-981-16-0705-9_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0704-2
Online ISBN: 978-981-16-0705-9
eBook Packages: Computer Science (R0)