Abstract
Deep learning approaches have been pivotal in identifying diseases across multiple plant species, yet they often struggle with unseen data. Handling unseen data is a significant challenge because collecting disease samples for every plant species is impractical: the number of potential combinations of plant species and diseases is vast, and capturing all of them in the field is difficult. Recent approaches tackle this issue through a zero-shot compositional setting, extracting visual characteristics of plant species and diseases from the seen compositions in the training dataset and adapting them to unseen ones. This paper introduces a novel approach that incorporates textual data to guide the vision model in learning representations of multiple plants and diseases. To our knowledge, this is the first study to explore the effectiveness of a vision-language model for multi-plant disease identification, given the fine-grained and challenging nature of disease textures. We experimentally show that our proposed FF-CLIP model outperforms recent state-of-the-art models by 26.54% and 33.38% in Top-1 accuracy on unseen compositions, setting a solid baseline for zero-shot plant disease identification with a vision-language model. We release our code at https://github.com/abelchai/FF-CLIP-Can-Language-Improve-Visual-Features-For-Distinguishing-Unseen-Plant-Diseases.
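The zero-shot compositional setting described above can be sketched as follows. This is a minimal, generic illustration, not the FF-CLIP method itself: it assumes a CLIP-style setup in which species-disease compositions are turned into text prompts and an image is classified by cosine similarity against those prompt embeddings. The `embed` function here is a hypothetical hash-based stand-in for a real text/image encoder, used only to keep the sketch self-contained and runnable.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for a real encoder (e.g. a CLIP text/image tower).

    Returns a deterministic unit vector per input string, so identical
    descriptions map to identical embeddings within one process.
    """
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

species = ["tomato", "apple", "grape"]
diseases = ["healthy", "leaf blight", "powdery mildew"]

# Compose a text prompt for every (species, disease) pair — including
# combinations never observed at training time, which is what makes
# the setting "zero-shot compositional".
prompts = {(s, d): f"a photo of a {s} leaf with {d}"
           for s in species for d in diseases}
text_feats = {k: embed(p) for k, p in prompts.items()}

def classify(image_feat: np.ndarray):
    # Pick the composition whose text embedding has the highest cosine
    # similarity to the image embedding (all vectors are unit-norm).
    return max(text_feats, key=lambda k: float(image_feat @ text_feats[k]))

# Toy query: an image embedding that happens to match one prompt exactly.
query = embed("a photo of a grape leaf with powdery mildew")
print(classify(query))  # → ('grape', 'powdery mildew')
```

In a real system the text tower would additionally be fed learned or hand-written descriptions of disease symptoms, so that language guides the visual features toward the fine-grained textures that distinguish diseases.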
J. Z. Liaw and A. Y. H. Chai—These authors contributed equally to this work.
Acknowledgments
We appreciate the comments and advice from Hervé Goëau and Fei Siang Tay on our study and drafts. This research is supported by the FRGS MoHE Grant (Ref: FRGS/1/2021/ICT02/SWIN/03/2) from the Ministry of Higher Education Malaysia and the Swinburne Sarawak Research Grant (Ref: RIF SSRG-Tay Fei Siang (30/12/24)). We gratefully acknowledge the support of NEUON AI, which provided the GPU workstation used for this research.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liaw, J.Z., Chai, A.Y.H., Lee, S.H., Bonnet, P., Joly, A. (2025). Can Language Improve Visual Features For Distinguishing Unseen Plant Diseases?. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15330. Springer, Cham. https://doi.org/10.1007/978-3-031-78113-1_20
Print ISBN: 978-3-031-78112-4
Online ISBN: 978-3-031-78113-1