Abstract
Deep learning approaches have been pivotal in identifying diseases across multiple plant species, yet they often struggle with unseen data. Handling unseen data is a significant challenge because collecting disease samples for every plant species is impractical: the number of potential combinations of plant species and diseases is vast, and capturing all of them in the field is difficult. Recent approaches tackle this issue through a zero-shot compositional setting, extracting visual characteristics of plant species and diseases from the seen compositions in the training dataset and adapting them to unseen ones. This paper introduces a novel approach that incorporates textual data to guide the vision model in learning representations of multiple plants and diseases. To our knowledge, this is the first study to explore the effectiveness of a vision-language model for multi-plant disease identification, given the fine-grained and challenging nature of disease textures. We experimentally show that our proposed FF-CLIP model outperforms recent state-of-the-art models by 26.54% and 33.38% in Top-1 accuracy on unseen compositions, setting a solid baseline for zero-shot plant disease identification with a vision-language model. We release our code at https://github.com/abelchai/FF-CLIP-Can-Language-Improve-Visual-Features-For-Distinguishing-Unseen-Plant-Diseases.
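The zero-shot compositional setting described above can be sketched as follows. This is a minimal, generic illustration, not the FF-CLIP method itself: it assumes a CLIP-style setup in which species-disease compositions are turned into text prompts and an image is classified by cosine similarity against those prompt embeddings. The `embed` function here is a hypothetical hash-based stand-in for a real text/image encoder, used only to keep the sketch self-contained and runnable.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for a real encoder (e.g. a CLIP text/image tower).

    Returns a deterministic unit vector per input string, so identical
    descriptions map to identical embeddings within one process.
    """
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

species = ["tomato", "apple", "grape"]
diseases = ["healthy", "leaf blight", "powdery mildew"]

# Compose a text prompt for every (species, disease) pair — including
# combinations never observed at training time, which is what makes
# the setting "zero-shot compositional".
prompts = {(s, d): f"a photo of a {s} leaf with {d}"
           for s in species for d in diseases}
text_feats = {k: embed(p) for k, p in prompts.items()}

def classify(image_feat: np.ndarray):
    # Pick the composition whose text embedding has the highest cosine
    # similarity to the image embedding (all vectors are unit-norm).
    return max(text_feats, key=lambda k: float(image_feat @ text_feats[k]))

# Toy query: an image embedding that happens to match one prompt exactly.
query = embed("a photo of a grape leaf with powdery mildew")
print(classify(query))  # → ('grape', 'powdery mildew')
```

In a real system the text tower would additionally be fed learned or hand-written descriptions of disease symptoms, so that language guides the visual features toward the fine-grained textures that distinguish diseases.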
J. Z. Liaw and A. Y. H. Chai—These authors contributed equally to this work.
Acknowledgments
We appreciate the comments and advice from Hervé Goëau and Fei Siang Tay on our study and drafts. This research is supported by the FRGS MoHE Grant (Ref: FRGS/1/2021/ICT02/SWIN/03/2) from the Ministry of Higher Education Malaysia and the Swinburne Sarawak Research Grant (Ref: RIF SSRG-Tay Fei Siang (30/12/24)). We gratefully acknowledge the support of NEUON AI, which provided the GPU workstation used for this research.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liaw, J.Z., Chai, A.Y.H., Lee, S.H., Bonnet, P., Joly, A. (2025). Can Language Improve Visual Features For Distinguishing Unseen Plant Diseases?. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15330. Springer, Cham. https://doi.org/10.1007/978-3-031-78113-1_20
Print ISBN: 978-3-031-78112-4
Online ISBN: 978-3-031-78113-1