Can Language Improve Visual Features For Distinguishing Unseen Plant Diseases?

Conference paper, published in Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15330)

Abstract

Deep learning approaches have been pivotal in identifying multi-plant diseases, yet they often struggle with unseen data. Handling unseen data is a significant challenge because collecting disease samples for every plant species is impractical: the number of potential plant-disease combinations is vast, making it difficult to capture all of them in the field. Recent approaches tackle this issue through a zero-shot compositional setting, in which the visual characteristics of plant species and diseases are learned from the seen data in the training set and adapted to unseen data. This paper introduces a novel approach that incorporates textual data to guide the vision model in learning representations of multiple plants and diseases. To our knowledge, this is the first study to explore the effectiveness of a vision-language model for multi-plant disease identification, given the fine-grained and challenging nature of disease textures. We experimentally show that our proposed FF-CLIP model outperforms recent state-of-the-art models by 26.54% and 33.38% in Top-1 accuracy on unseen compositions, setting a solid baseline for zero-shot plant disease identification with a vision-language model. We release our code at https://github.com/abelchai/FF-CLIP-Can-Language-Improve-Visual-Features-For-Distinguishing-Unseen-Plant-Diseases.
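
For readers unfamiliar with the zero-shot compositional setting, the sketch below illustrates the CLIP-style paradigm that such approaches build on: each plant-disease pair, including pairs unseen during training, is rendered as a text prompt, and a test image is assigned to the pair whose text embedding it matches most closely. This is a minimal illustration using the public CLIP package, not the FF-CLIP model described in the paper; the species list, disease list, prompt template, and image path are assumed placeholders.

```python
# Minimal sketch of zero-shot compositional inference with a CLIP-style model.
# NOT the FF-CLIP architecture from the paper; it only illustrates the general
# paradigm. Plant/disease lists, prompt template, and image path are placeholders.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Every plant-disease pair, including compositions never seen during training,
# is expressed as a text prompt.
plants = ["tomato", "apple", "grape"]           # placeholder species
diseases = ["healthy", "early blight", "rust"]  # placeholder conditions
pairs = [(p, d) for p in plants for d in diseases]
prompts = [f"a photo of a {p} leaf with {d}" for p, d in pairs]

image = preprocess(Image.open("leaf.jpg")).unsqueeze(0).to(device)  # placeholder image
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    # The image is assigned to the composition whose text embedding it matches best.
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

plant, disease = pairs[probs.argmax(dim=-1).item()]
print(f"predicted composition: {plant} / {disease}")
```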

J. Z. Liaw and A. Y. H. Chai—These authors contributed equally to this work.

Notes

  1. https://wiki.bugwood.org/

Acknowledgments

We appreciate the comments and advice from Hervé Goëau and Fei Siang Tay on our study and drafts. This research is supported by the FRGS MoHE Grant (Ref: FRGS/1/2021/ICT02/SWIN/03/2) from the Ministry of Higher Education Malaysia and the Swinburne Sarawak Research Grant (Ref: RIF SSRG-Tay Fei Siang(30/12/24)). We gratefully acknowledge the support of NEUON AI, which provided the GPU workstation used for this research.

Author information

Corresponding author

Correspondence to Jerad Zherui Liaw.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 207 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liaw, J.Z., Chai, A.Y.H., Lee, S.H., Bonnet, P., Joly, A. (2025). Can Language Improve Visual Features For Distinguishing Unseen Plant Diseases? In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15330. Springer, Cham. https://doi.org/10.1007/978-3-031-78113-1_20

  • DOI: https://doi.org/10.1007/978-3-031-78113-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78112-4

  • Online ISBN: 978-3-031-78113-1

  • eBook Packages: Computer Science, Computer Science (R0)
