Triple discriminator generative adversarial network for zero-shot image classification

  • Research Paper
  • Published in: Science China Information Sciences

Abstract

One key challenge in zero-shot classification (ZSC) is exploring the knowledge hidden in unseen classes. Generative methods such as generative adversarial networks (GANs) are typically employed to generate the visual information of unseen classes. However, most of these methods exploit global semantic features while neglecting the discriminative differences of local semantic features when synthesizing images, which may lead to sub-optimal results. In fact, local semantic information can provide more discriminative knowledge than global information. To this end, this paper presents a new triple discriminator GAN for ZSC, called TDGAN, which incorporates a text-reconstruction network into a dual discriminator GAN (D2GAN), enabling cross-modal mapping from text descriptions to their visual representations. The text-reconstruction network focuses on key text descriptions and aligns semantic relationships so that the synthetic visual features effectively represent images. Sharma-Mittal entropy is exploited in the loss function to make the distribution of synthetic classes as close as possible to the distribution of real classes. The results of extensive experiments on the Caltech-UCSD Birds-200-2011 and North America Birds datasets demonstrate that the proposed TDGAN method consistently yields competitive performance compared with several state-of-the-art ZSC methods.
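For context, Sharma-Mittal entropy is a two-parameter family that unifies the Rényi, Tsallis, and Shannon entropies. The full text is not reproduced here, so the block below gives the standard textbook definition of the Sharma-Mittal relative entropy on which such a distribution-matching loss is typically built; the paper's exact term may differ.

```latex
% Standard Sharma-Mittal relative entropy between distributions P and Q
% (textbook definition; the paper's exact loss term may differ).
\[
D_{\alpha,\beta}(P \,\|\, Q)
  = \frac{1}{\beta - 1}
    \left[ \left( \sum_{i} p_i^{\alpha}\, q_i^{1-\alpha} \right)^{\frac{1-\beta}{1-\alpha}} - 1 \right],
  \qquad \alpha > 0,\ \alpha \neq 1,\ \beta \neq 1.
\]
% Limiting cases: \beta \to 1 gives the Renyi divergence,
% \beta = \alpha gives the Tsallis divergence, and
% (\alpha, \beta) \to (1, 1) recovers the Kullback-Leibler divergence.
```

At the architectural level, the abstract describes a D2GAN augmented with a text-reconstruction network acting as a third critic. The following is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: all names and sizes (Generator, critic, R, TEXT_DIM, ...) and the mean-squared-error reconstruction term are illustrative; only the two-critic objective follows the published D2GAN formulation of Nguyen et al.

```python
# Minimal sketch of the TDGAN components described in the abstract.
# NOT the authors' code: dimensions, layer sizes, and the MSE
# reconstruction term are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_DIM, NOISE_DIM, VIS_DIM = 512, 100, 2048  # assumed feature sizes

class Generator(nn.Module):
    """Maps a text embedding plus noise to a synthetic visual feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM + NOISE_DIM, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, VIS_DIM))

    def forward(self, text, noise):
        return self.net(torch.cat([text, noise], dim=1))

def critic():
    # Softplus keeps the output positive so log D is well defined (as in D2GAN).
    return nn.Sequential(nn.Linear(VIS_DIM, 1024), nn.LeakyReLU(0.2),
                         nn.Linear(1024, 1), nn.Softplus())

G  = Generator()
D1 = critic()                        # D2GAN critic that rewards real features
D2 = critic()                        # D2GAN critic that rewards synthetic features
R  = nn.Sequential(nn.Linear(VIS_DIM, 1024), nn.LeakyReLU(0.2),
                   nn.Linear(1024, TEXT_DIM))  # text-reconstruction network

text   = torch.randn(8, TEXT_DIM)    # encoded class text descriptions
noise  = torch.randn(8, NOISE_DIM)
x_real = torch.randn(8, VIS_DIM)     # CNN features of real images
x_fake = G(text, noise)

# D2GAN discriminator objective (maximized by D1, D2; negated here as a loss):
#   log D1(real) - D1(fake) - D2(real) + log D2(fake)
d_loss = -(torch.log(D1(x_real)).mean() - D1(x_fake.detach()).mean()
           - D2(x_real).mean() + torch.log(D2(x_fake.detach())).mean())

# Generator: fool both critics; the reconstruction term pulls each synthetic
# feature back toward its source text, aligning local semantics.
recon  = F.mse_loss(R(x_fake), text)
g_loss = -D1(x_fake).mean() + torch.log(D2(x_fake)).mean() + recon
```

Once trained this way, the usual generative-ZSC recipe applies: the generator synthesizes visual features from unseen-class text descriptions, and a conventional classifier is fit on those synthetic features.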



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61771329, 61632018).

Author information

Corresponding author

Correspondence to Zhong Ji.


About this article

Cite this article

Ji, Z., Yan, J., Wang, Q. et al. Triple discriminator generative adversarial network for zero-shot image classification. Sci. China Inf. Sci. 64, 120101 (2021). https://doi.org/10.1007/s11432-020-3032-8
