1 Introduction

The rapid growth of the world’s population has increased the demand on agricultural sectors to supply a diverse range of agricultural products. Boosting agricultural output is therefore essential to meet the production challenges posed by a rapidly rising global population (Attri et al 2023; Saranya et al 2023). However, this task is made harder by environmental problems such as water shortages, harsh environments, and biodiversity loss (Attri et al 2023). Discovering durable new methods that exploit cutting-edge technologies is therefore crucial for the agricultural industry (Fróna et al 2019; Le Mouël et al 2018).

Nowadays, deep learning (DL), a cutting-edge technology, is extensively used in fields such as image segmentation (Gour et al 2019), biomedical image classification (Gour and Jain 2020), voice and signal classification (Abdel-Hamid et al 2014), and object detection (Redmon et al 2016). Likewise, DL is being used in several areas of agriculture, such as disease identification (Jiang et al 2021) and crop-weed detection (Bosilj et al 2020; Espejo-Garcia et al 2020). Among DL techniques, Convolutional Neural Networks (CNNs) are a class of models that excel at tasks involving visual data such as images and videos; they use convolutional layers to automatically extract spatial features such as edges, textures, and patterns. Owing to this proven success, CNNs are considered among the best-performing methods for visual recognition tasks.

However, CNNs only generalise well and deliver strong results when trained on an adequately large dataset (Abbas et al 2021; Liu and Wang 2021); their performance drops sharply when there is not enough labelled data for training (Espejo-Garcia et al 2020, 2021; Abbas et al 2021). Yet most datasets in the various sectors of agriculture do not contain enough labelled data covering diverse conditions (Bosilj et al 2020; Espejo-Garcia et al 2021), because data collection demands a substantial labelling effort by experts to prevent misclassification and other errors (Dyrmann et al 2016; Minervini et al 2017). Although techniques such as data augmentation (Liu and Zhang 2022) and image generation (Trabucco et al 2023) have been employed to address this shortage, they come with limitations. Data augmentation merely modifies existing images, potentially leading to overfitting when models repeatedly see augmented versions of the same image. Image generation, on the other hand, often relies on Generative Adversarial Networks (GANs) (Goodfellow et al 2020), which may produce unrealistic, low-quality images. Beyond the need for large datasets, training a CNN from scratch is also time-consuming (Suh et al 2018), a problem that neither data augmentation nor GANs can solve. Furthermore, agricultural data poses numerous additional challenges such as intra-class variation, inter-class similarities (Zhang et al 2020), illumination, occlusion, and colour (Espejo-Garcia et al 2021). Hence, training CNNs from scratch in agriculture remains a major scientific challenge (Kaya et al 2019).
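As a brief illustration of the first limitation, the sketch below shows a typical image augmentation pipeline; it is a minimal example whose dataset path and parameter values are assumptions rather than settings from any reviewed study.

```python
# Illustrative torchvision augmentation pipeline (path and values are assumptions).
from torchvision import datasets, transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),                      # random crop + rescale
    transforms.RandomHorizontalFlip(),                      # mirror with p = 0.5
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # mild photometric shift
    transforms.ToTensor(),
])

# Every epoch re-samples transformed views of the *same* underlying images,
# adding variety but no genuinely new scenes, plants, or field conditions.
train_set = datasets.ImageFolder("data/crop_images/train", transform=train_transforms)
```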

Consequently, there is a need for an alternative technique to train CNN models with a limited amount of labelled data. To fill this gap, Transfer Learning (TL), a specialised DL technique in which a pre-trained CNN model is applied to a new but related task, has been widely adopted (Gu et al 2021; Salimans et al 2016). According to Nigam et al (2023), TL should be favoured over training CNNs from scratch, as it requires less time and training data. Similar conclusions about building models with TL are reported by many other researchers (Suh et al 2018; Lee et al 2016; Mohanty et al 2016; Gadiraju and Vatsavai 2023). Because of these advantages, TL has become, as in many other fields, a highly active research topic across various agricultural sectors (Espejo-Garcia et al 2021, 2020; Olsen et al 2019; Too et al 2019; Jiang et al 2021).

Despite this activity and TL’s significant contributions to agriculture, a comprehensive study of TL in agriculture is lacking. Reviews of TL exist in other areas, such as software defect prediction (Nam and Kim 2015), sentiment classification (Wang and Mahadevan 2011), activity recognition (Cook et al 2013), smart buildings (Pinto et al 2022), electroencephalogram signal analysis (Wan et al 2021), and electricity demand response (Peirelinck et al 2022). Although some survey papers on agriculture have been published recently, to the best of our knowledge none focus broadly on TL. For instance, Kamilaris and Prenafeta-Boldú (2018) presented a survey of 40 papers on DL in agriculture, Attri et al (2023) published a study of 129 papers on DL covering five areas of agriculture, and Barbedo (2023) reviewed DL techniques combined with proximal hyperspectral images in agriculture. A comparative study of the Internet of Things (IoT) and DL was presented by Saranya et al (2023), Wang et al (2021) surveyed the use of DL for analysing hyperspectral images in agriculture, and Hossen et al (2023) provided a systematic review of AI in agriculture.

Hence, there is an essential need for a review of TL in the agricultural domain that provides a comprehensive understanding of the field. Such a review would offer valuable insights for future advancements, guiding both researchers and practitioners in effectively utilising TL to improve agricultural productivity. Recognising this need, we present this review.

However, given the vastness of the agricultural field, we refine our scope to plant-centric areas for a concentrated examination. Consequently, our study covers seven key fields: plant species recognition, plant disease recognition, crop-weed detection, seedling detection, pest recognition, plant nutrient deficiency detection, and plant growth prediction. This focused approach enables an in-depth understanding of TL’s role in advancing each of these key applications.

To provide valuable insights, we first formulate the overall research objective (Table 1) using the Goal-Question-Metric formulation (Caldiera and Rombach 1994). We then outline the research questions derived from this overall goal, together with the primary objective of each. The research questions and their primary objectives are described below:

Table 1 Goal of the research
  • RQ1: What CNN models are utilised? This question aims to identify the specific CNN models employed in the selected articles, providing an overview of the architectures most commonly used across studies.

  • RQ2: What are the sources of datasets? This question seeks to identify the origins of the datasets used in the selected articles, distinguishing between publicly available datasets and those collected by the authors. Additionally, it examines whether self-collected datasets are made accessible to the research community.

  • RQ3: What types of input images are used? The objective of this question is to examine the types of images used in the experiments, identifying whether RGB, hyperspectral, or other image formats are employed.

  • RQ4: What platforms are used for the implementation? The aim of this question is to identify the platforms or frameworks utilised in the studies for implementation, particularly noting whether TensorFlow, PyTorch, MATLAB, or other platforms are employed.

  • RQ5: What types of TL are utilised? This question investigates the specific types of TL approaches applied in the studies, categorising them as model-based, instance-based, relational-based, or feature-based.

To address these research questions, we first describe the background of the review, focusing on TL and its applications in agriculture. Next, we search for and select the relevant articles following the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Page et al 2021) methodology. We then describe the selected articles and critically analyse their contents to answer the research questions. Our findings show that the majority of the studies use different versions of the VGG (Simonyan and Zisserman 2014), ResNet (He et al 2016), and AlexNet (Krizhevsky et al 2012) pre-trained models. These studies primarily rely on self-collected datasets, although some use publicly available benchmark datasets. The data are mostly RGB images. TensorFlow and PyTorch are the predominant implementation frameworks, and model-based TL is the most commonly employed approach. In addition, we identify key challenges and provide recommendations for future research directions. The key contributions of the review are:

  1. An in-depth background study on TL, and its applications in agriculture.

  2. A comprehensive examination of the CNN models utilised, datasets, input image types, implementation platforms, and TL approaches in agriculture, ensuring alignment with the established research questions.

  3. Identification of the challenges faced by the existing studies, and providing suggestions for future research directions.

The remainder of the paper is organised as follows: In Sect. 2, we provide the background study on TL and its applications in agriculture. In Sect. 3, we describe the methodology used for the paper selection. Section 4 elaborates on the selected studies. In Sect. 5, we discuss answers to the research questions, highlight challenges, and offer recommendations for future research directions. In Sect. 6, we address potential threats to the validity of this review. Finally, Sect. 7 offers a concluding summary of the study.

2 Background study

In this section, we present a background study on TL and agriculture. First, we provide a comprehensive overview of TL, covering its definitions, key differences from traditional machine learning (ML), evolution, advances, and categories. We then illustrate the need for TL in agriculture and discuss the various agricultural applications that utilise TL.

2.1 Definitions of transfer learning

TL can be defined from various perspectives. According to Pan and Yang (2009), it can be defined in terms of three concepts: “Task”, “Domain”, and “Task and Domain”. The three definitions are given below; to facilitate understanding, the notation used to describe them is shown in Table 2.

  • Definition 1 domain: A domain comprises two elements: a feature space \(\chi\) and a marginal probability distribution \(P(X)\), where \(X = \{x_1, \ldots , x_n \} \in \chi\). For instance, if the learning task is plant growth prediction, modelled as a regression problem, then \(\chi\) is the space of influencing variables (such as climate, soil, and fertiliser), while \(x_i\) denotes the \(i\)-th influencing variable of a particular learning sample \(X\).

  • Definition 2 task: A task comprises two elements, a label space \(Y\) and an objective predictive function \(f(\cdot)\), represented as \(\tau = \{ Y, f(\cdot) \}\), which is learned from training data consisting of pairs \(\{ x_i, y_i \}\), where \(x_i \in X\) and \(y_i \in Y\). The function \(f(\cdot)\) is used to estimate the conditional probability \(P(Y|X)\) and to predict the label of a new instance \(x\).

  • Definition 3 task and domain: Given a source domain \(D_S\) with learning task \(\tau_S\) and a target domain \(D_T\) with learning task \(\tau_T\), TL aims to enhance the learning of the target predictive function \(f_T(\cdot)\) in \(D_T\) by utilising the knowledge learned from \(D_S\) and \(\tau_S\), where \(D_S \ne D_T\) or \(\tau_S \ne \tau_T\).

Table 2 Notations used to describe transfer learning

2.2 Key differences, evolution, and advances

Below, we explore the key differences between TL and traditional ML techniques, outline the evolution of TL, emphasising significant milestones over the years, and discuss recent advances, focusing on the rise of deep TL (DTL) methods and their growing impact across various fields.

The differences between traditional ML and TL are presented in Fig. 1. As illustrated in Fig. 1a, traditional ML trains each task separately from scratch, treating the source and target tasks as equally important, independent starting points for learning. In contrast, TL utilises knowledge obtained from one or more source tasks (Fig. 1b) to enhance performance on the target task, so that each new task does not have to be learned from the beginning. Training therefore requires significantly less time and effort than traditional ML. Additionally, TL alleviates the need for extensive labelled data, making it a valuable approach in many research fields, particularly agriculture. Given these significant advantages, TL has advanced rapidly across multiple domains.

Fig. 1
figure 1

Difference between traditional machine learning and transfer learning

To provide further insight into the evolution of TL, we include a roadmap in Fig. 2 that illustrates its progression from the pre-DL era to advanced TL. As shown in Fig. 2, the period from the 1960s to the 2000s is known as the pre-DL era, when fundamental ML algorithms such as decision trees (DT) and support vector machines (SVM) were developed (Mitchell 1997). The early 2000s marked the rise of DL, with innovations like CNNs and RNNs driving significant advances, notably demonstrated by AlexNet’s performance in 2012 (Krizhevsky et al 2012). The 2010s witnessed the widespread adoption of TL, highlighted by the development of the VGG and ResNet architectures (Simonyan and Zisserman 2014; He et al 2016). By the late 2010s and early 2020s, TL had become integral to DL practice (Devlin et al 2019; Brown et al 2020). Over the same period, DTL, a specialised approach to TL, emerged that leverages deep neural networks to capture complex, hierarchical feature representations from data (Jiang et al 2021; Nigam et al 2023; Gadiraju and Vatsavai 2023). To extract such features, it relies on networks with multiple layers of neurons trained on large datasets such as ImageNet (Deng et al 2009).

Recently, DTL has also demonstrated notable advances in agricultural applications. For instance, Nigam et al (2023) and Chen et al (2020) showcase the effectiveness of DTL over traditional methods for disease recognition. DTL has also proven effective in plant species classification (Jiang et al 2021), weed identification (Chen et al 2022), and growth prediction (Gadiraju and Vatsavai 2023). Furthermore, DTL addresses challenges in specialised imaging contexts, as illustrated by Feng et al (2021) and Hu et al (2023), who improved disease detection and data processing in hyperspectral and polarimetric imaging, respectively. Further insights into DTL can be found in Yu et al (2022) and Tan et al (2018).

Fig. 2
figure 2

Road-map of transfer learning evolution from deep learning

2.3 Types of transfer learning

TL can be categorised based on the combinations of source and target tasks, domains, and the solutions adopted. According to Pan et al (2010), there are four categories of TL: (i) model-based, (ii) instance-based, (iii) relational-based, and (iv) feature-based. A brief description of each category is provided below:

  • Model-based: Model-based TL discovers parameters in a source domain and reuses them in a target domain; a model is created and shared parameters are transferred between models. Knowledge transfer is achieved either through a regularisation term that prevents overfitting (Yang et al 2007; Simonyan and Zisserman 2014) or by sharing components or parameters with the target-domain model (Bonilla et al 2007; Fei-Fei et al 2006). Model-based TL has three sub-categories: (a) fine-tuning, (b) self-training, and (c) transformer-based mechanisms (Yu et al 2022). Fine-tuning, the most commonly used, adapts source-domain parameters to obtain good performance in the target domain (Rozantsev et al 2018; Guo et al 2019; Li and Zhang 2021): the weights trained on the source task are reused, while the last convolutional and fully connected layers are left unfrozen so that only these layers’ parameters are updated during training (a minimal code sketch of this pattern is given after this list). Self-training overcomes the weaknesses of fine-tuning by increasing data annotation and enhancement (Xie et al 2020; Zoph et al 2020). Finally, transformer-based models employ an attention mechanism for image recognition (Chen et al 2021; Yang et al 2023; Xu et al 2021).

  • Instance-based: Instance-based TL re-weights labelled source data for the target domain and performs the transfer by assigning different weights to different instances. A common approach is to use the ratio of target to source instances as sample weights (Dai et al 2007; Yao and Doretto 2010). Kernel mean matching is another approach, which matches source- and target-domain instances in a reproducing kernel Hilbert space (Huang et al 2006).

  • Relational-based: Relational-based TL maps knowledge between the source and target domains and accomplishes the transfer by modelling the logical relationships between them. The assumption is that common patterns exist in these logical relationships, so rules learned in the source domain can be transferred to the target domain (Gu et al 2021).

  • Feature-based: Feature-based TL selects a good set of features to reduce the difference between the source and target domains, and performs the transfer by constructing shared feature representations across domains. One approach is statistical feature transformation (Pan et al 2010; Wang et al 2018), which minimises the distribution differences between the source and target domains using statistical measures. Another notable approach is geometric feature transformation (Gong et al 2012; Sun et al 2016), which implicitly aligns the feature spaces of the source and target domains by transforming features.
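To make the fine-tuning sub-category concrete, the following minimal sketch freezes the transferred ImageNet weights of a ResNet-18 and updates only the last convolutional block and a new classification head. The number of target classes and the learning rate are illustrative assumptions, and the weights argument follows recent torchvision versions; this is a generic sketch, not the setup of any specific reviewed study.

```python
# Minimal model-based TL (fine-tuning) sketch; class count and hyperparameters
# are illustrative assumptions, not values from any reviewed study.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze all parameters transferred from the source (ImageNet) task...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze the last convolutional block and replace the classifier head,
# so only these layers are updated on the small target (agricultural) dataset.
for param in model.layer4.parameters():
    param.requires_grad = True
model.fc = nn.Linear(model.fc.in_features, 10)  # new layer is trainable by default

optimiser = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```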

A summary of the four categories of TL is provided in Table 3. For further in-depth information, interested readers can refer to Pan and Yang (2009) and Wang et al (2021).

Table 3 Summary of different types of transfer learning

2.4 Advancing agricultural applications with transfer learning

Smart farming plays a crucial role in addressing the challenges of agricultural production, including food security, productivity, environmental safety and sustainability (Gebbers and Adamchuk 2010). As the global population continues to grow, increasing food production substantially is essential (FAO 2009). This must be achieved while maintaining food quality, preserving the natural ecosystem, and implementing sustainable farming practices.

To meet these demands, understanding the complex, multivariate, and unpredictable nature of agricultural ecosystems is essential, achieved through continuous monitoring, measurement, and analysis. This requires analysing large-scale agricultural data and integrating advanced technologies (Kamilaris et al 2017). For example, in this regard, remote sensing technology offers valuable insights for precision agriculture using satellites and unmanned aerial vehicles to collect data from vast geographical areas (Bastiaanssen et al 2000).

Despite advancements in agricultural technology, significant challenges remain, particularly regarding data scarcity. Agricultural data, unlike data in many other fields, is often limited, and obtaining high-quality, labelled data is costly, time-consuming, and requires specialised knowledge. Additionally, environmental variability, such as differences in weather, soil, and crop species, makes it difficult for models trained on one dataset to generalise across conditions. Limited data also come with the added complexity of intra-class variance and inter-class similarity, posing further challenges.

Given these realities, there is a pressing need for approaches that can maximise learning from limited data. TL addresses this issue by allowing models to leverage pre-trained knowledge from related domains, dramatically reducing the dependence on large agricultural datasets. This approach enhances both the adaptability and effectiveness of technological solutions in agriculture, ultimately supporting the increased production required to meet the demands of a growing global population.

Thanks to these advantages, various fields within agriculture, such as plant species recognition, plant disease recognition, crop-weed detection, seedling detection, pest recognition, plant nutrient deficiency detection and plant growth prediction, are increasingly utilising TL. Below we explore these fields in detail.

To enhance clarity, we first present a general block diagram (Fig. 3) outlining the TL approach across these fields, facilitating a clearer understanding before exploring the specific studies. As illustrated in the figure, a TL-based workflow in agriculture consists of two primary domains (source and target). The source domain refers to the dataset or problem from which knowledge is transferred, typically characterised by large labelled data with well-defined features, for instance ImageNet (Deng et al 2009). In contrast, the target domain is specific to agriculture and typically consists of smaller datasets. To cope with the limited available agricultural data, the model leverages the knowledge learned from the source domain; before training, the target-domain dataset undergoes data preparation and preprocessing. It is worth mentioning that not every method incorporates all the steps depicted in Fig. 3; nevertheless, this general overview of TL-based applications in agriculture facilitates a comprehensive understanding of the typical processes involved.

Fig. 3
figure 3

General workflow of transfer learning in agriculture. In this approach, a pre-trained CNN model is first trained on a large, general-purpose image dataset, commonly ImageNet (Deng et al 2009). This model is then adapted to the target domain, which consists of agricultural data
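As a complement to Fig. 3, the following minimal sketch illustrates the target-domain data preparation step: images are resized and normalised with the ImageNet statistics used to pre-train the source model and then split into training and validation sets. The directory layout, split ratio, and batch size are illustrative assumptions, not values from any reviewed study.

```python
# Illustrative target-domain data preparation (paths and sizes are assumptions).
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),                       # match the source model's input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics of the
                         std=[0.229, 0.224, 0.225]),  # source (pre-training) domain
])

target_data = datasets.ImageFolder("data/agri_dataset", transform=preprocess)
n_train = int(0.8 * len(target_data))
train_set, val_set = torch.utils.data.random_split(
    target_data, [n_train, len(target_data) - n_train])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32)
```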

Plant species recognition: Plants are vital for biodiversity conservation and protection (Kaya et al 2019). However, many plant species have already vanished due to natural disasters and pollution, others are threatened with extinction, and many more are yet to be discovered (Kaya et al 2019). It is therefore essential to protect plants from losing their existence. Effective plant protection depends on automated plant species recognition systems (Wagle et al 2021), which identify and classify plants based on morphological characteristics such as leaves, stems, and flowers. Such recognition is essential for monitoring endangered species and maintaining ecological balance (Wäldchen et al 2018), and it is also useful for botanists, agricultural producers, and food engineers.

Plant disease recognition: Plant disease recognition is the process of identifying and diagnosing diseases in plants through visual analysis. Detecting plant diseases is important because they have a devastating impact on food production, causing significant reductions in both quantity and quality (Chen et al 2020; Coulibaly et al 2019). Traditional detection methods, however, are often labour-intensive, time-consuming, and prone to error. Recently, the rise of image processing and TL has transformed automatic disease management, enabling accurate detection and diagnosis of plant diseases through analysis of leaf colour, texture, and shape, which are key indicators of disease. Automation in plant disease recognition is useful not only for the general population but also for experienced ecologists and botanists.

Crop-weed detection: Crop-weed detection is the process of identifying and distinguishing crops from weeds. This differentiation is crucial for efficient weed management, as it allows targeted herbicide application and robotic weeding, which reduce chemical usage and enhance crop health. Such targeted, minimal herbicide use ultimately promotes sustainable farming practices (Espejo-Garcia et al 2020).

Seedling detection: Plant seedlings are young plants that have recently germinated from seed, typically in their early growth stages with initial leaves. Providing early care is crucial for their survival and healthy growth. Accurate detection and classification of seedlings are essential for optimising crop management and ensuring healthy growth, allowing precise monitoring and timely interventions. This area has gained significant interest from researchers due to its importance in nurturing seedlings into mature plants.

Pest recognition: Plant pests refer to mites and insects that harm products and crops (Dawei et al 2019). Many pest varieties, with fast reproduction, wide distribution, and large populations, cause substantial losses to crops (Dawei et al 2019; Thenmozhi and Reddy 2019). Correctly identifying and classifying pests manually is a difficult task for farmers, who therefore need an accurate automatic system.

Plant nutrient deficiency detection: Plant nutrient deficiency detection is the process of identifying and diagnosing plants that lack essential nutrients required for healthy growth. Plants display distinct visual symptoms when nutrients such as nitrogen, phosphorus, potassium, calcium or trace elements are insufficient; these symptoms reflect the plant’s inability to carry out vital functions due to the missing nutrients. Early detection of nutrient deficiency is critical, as it allows timely interventions to restore nutrient balance (Shadrach et al 2023). Automatic plant nutrient deficiency diagnosis can assist in detecting deficiencies early and can play a major role in preventing significant agricultural losses and boosting yield while protecting the environment through proper fertiliser usage (Espejo-Garcia et al 2022). Therefore, research on automatic plant nutrient deficiency diagnosis has become a hot topic for researchers (Yan et al 2022).

Plant growth prediction: Plant growth prediction is the process of forecasting how plants will develop over time, focusing on aspects such as height, biomass, leaf area, or overall yield. This involves using models and data for environmental factors like soil quality, water availability, light, and temperature. Such predictions enable farmers and agronomists to make proactive decisions on irrigation and fertilisation, ultimately improving crop productivity and sustainability in agriculture (Roy and Bhaduri 2022).

3 Methodology

Our review follows the guidelines set by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Page et al 2021) approach. This includes defining a comprehensive article search strategy, establishing inclusion and exclusion criteria, and outlining the article selection and data extraction processes. Details of these processes, adhering to PRISMA standards, are presented below.

3.1 Articles search strategy

We use an automated search and a snowballing search. These strategies were initiated by two of the authors in June 2024 and reviewed by our two supervisors to ensure that no studies were overlooked. The two search strategies are explained below.

  • Automated search: An automated search is conducted across seven databases (IEEE Xplore Digital Library, Wiley, ScienceDirect, MDPI, Springer, Frontiers and ACM Digital Library) using a Boolean search string. The primary search string used is “transfer learning” AND (“plant species recognition” OR “plant disease recognition” OR “plant seedling detection” OR “plant pest recognition” OR “plant nutrient deficiency detection” OR “crop-weed detection” OR “plant growth prediction”). These specific search terms are selected to ensure comprehensive coverage of the relevant studies by concentrating on the targeted areas within agriculture.

  • Snowballing search: To address the possibility of relevant studies being missed by automated search, we incorporated backward and forward snowballing (Wohlin 2014) techniques, which involve reviewing a paper’s references (backward snowballing) and the papers that cite it (forward snowballing).

3.2 Inclusion and exclusion

The following inclusion and exclusion criteria are applied:

  • Inclusion criteria: I1. The title or abstract explicitly states that the article is related to TL in Agriculture. I2. The paper is in the selected areas of agriculture. I3. The paper is peer-reviewed. I4. The paper is a regular or full paper. I5. The full text of the paper is available.

  • Exclusion criteria: E1. Articles not written in English. E2. Papers that are not peer-reviewed, such as technical reports, short articles, posters, and preprints. E3. The paper does not contain details of the TL approach. E4. The paper is not in the selected fields of agriculture.

3.3 Article selection and data extraction

The article inclusion process using the PRISMA method is illustrated in Fig. 4. As shown, the automated search across seven databases yielded 835 articles. After title and abstract screening, we excluded 772 articles, leaving 63 for initial analysis. The snowballing search identified a further 24 articles, of which 13 were duplicates, resulting in 11 new inclusions. In total, 74 articles were selected for detailed review and analysis. Among the 74 selected articles, 5, 28, 16, 12, 3, 5, and 1 were obtained from the IEEE Xplore Digital Library, ScienceDirect, MDPI, Springer, Wiley, Frontiers, and ACM, respectively.

Fig. 4
figure 4

PRISMA flow diagram for the article selection process

Table 4 Data extraction form

4 Existing applications

In this section, we review the studies that utilised TL in agriculture, particularly within plant species recognition, plant disease recognition, crop-weed detection, seedling detection, pest recognition, plant nutrient deficiency detection, and plant growth prediction.

4.1 Plant species recognition

In this subsection, we discuss articles that apply TL for plant species recognition, providing a summary of these studies in Table 5.

Table 5 Summary of recent research on plant species recognition

Reyes et al (2015) introduced a CNN model trained on 1.8 million images and applied fine-tuning to adapt its learned recognition abilities from general domains to the specific plant identification task. Later, using the same dataset, Ghazi et al (2017) introduced a method utilising VGG16 and achieved an accuracy of 78.44%. Joshi et al (2021) proposed an approach based on a fine-tuned ResNet50 that achieved noticeable performance on three publicly available datasets (Flavia, LeafSpan, and MK-D1).

Remote sensing-based crop classification using the TL strategy of pre-trained networks was suggested by Gadiraju and Vatsavai (2023), who highlighted several challenges of using DL from scratch and discussed various TL methodologies to overcome those challenges. A framework for fine-grained phenotype classification based on bilinear CNN and TL was proposed by Yuan et al (2022). The framework comprises three steps: preprocessing, extracting features using symmetric VGG16, and merging extracted features for classification. During experiments, the method achieved accuracy and recall rates of 98.15% and 98%, respectively. Kaya et al (2019) designed and implemented five different classification models, including the end-to-end CNN model, by applying four strategies of TL on four datasets. The experimental results show that TL can provide significant advantages for automated plant classification.

Wagle et al (2021) proposed plant detection and classification utilising newly developed compact CNNs and AlexNet with TL for classifying nine species of the PlantVillage dataset. The authors developed three models, named N1, N2, and N3, and ran experiments on these models alongside AlexNet. Compared to AlexNet, N1, N2, and N3 required 34.58%, 18.25%, and 20.23% less training time, respectively, and in terms of size, N1 (14.8 MB) and N3 (85.29 MB) were 93.67% and 85.29% more compact, respectively. Koklu et al (2022) applied a fine-tuned MobileNetV2 and an SVM on selected features to classify five classes of grapevine leaves, experimenting with three TL implementations: first, a fine-tuned MobileNetV2 was used directly for classification; second, a pre-trained MobileNetV2 was used to extract features, with an SVM classifying the extracted features; and third, features extracted with MobileNetV2 were selected with a Chi-square approach and then fed to an SVM for classification. The third approach showed the best performance. Using TL, Letsoin et al (2022) recognised sago palm, one of Indonesia’s priority commodities, from UAV RGB images. The particular focus of this paper was to apply TL using three pre-trained models (ResNet50, AlexNet and SqueezeNet) on a dataset collected from nine different groups of sago palm trees based on physical features such as fruits, trunks, leaves and flowers. Based on the experimental results, ResNet50 was superior to the others.
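Several of the studies above combine a pre-trained CNN used as a fixed feature extractor with a classical classifier such as an SVM (for example, the second configuration of Koklu et al 2022). The sketch below illustrates that general pattern only; it is not the exact pipeline of any reviewed paper, and the data loader is assumed to yield image batches preprocessed as in the earlier sketch.

```python
# Generic feature-extraction TL pattern (pre-trained CNN + SVM); an illustrative
# sketch, not the exact pipeline of Koklu et al (2022) or any other study.
import torch
from torchvision import models
from sklearn.svm import SVC

backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
backbone.classifier = torch.nn.Identity()   # keep the pooled 1280-d feature vector
backbone.eval()

@torch.no_grad()
def extract_features(loader):
    feats, labels = [], []
    for images, y in loader:                # loader yields preprocessed image batches
        feats.append(backbone(images))
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

X_train, y_train = extract_features(train_loader)   # train_loader assumed defined
svm = SVC(kernel="rbf")                              # SVM trained on the deep features
svm.fit(X_train, y_train)
```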

Ramos-Ospina et al (2023) studied Computer Vision and TL to classify phosphorus deficiency levels in maize leaves, achieving 96% accuracy with the DenseNet201 model. Their methodology involved creating a database of leaf images and evaluating various models. Chaity and Aardt (2024) utilised CNN and SVM for species classification in complex forest environments. The research revealed that classification accuracy ranged from 50% to 84%, offering insights into data collection for improved performance.

Yang et al (2022) introduced PlantNet, a fine-grained plant recognition network using TL and bilinear CNNs; its validity was confirmed on a public dataset named Arabidopsis, achieving 97.25% accuracy. Li et al (2024) introduced a TL-based model to enhance plant leaf classification by reducing image deformation. Zhang et al (2022) suggested a TL-based approach to improve classification performance on small leaf datasets by utilising hybrid TL (model-based and instance-based) to modify ResNet50. Sawarkar et al (2024) explored bamboo plant classification through DTL combined with a majority multiclass voting algorithm to enhance accuracy.

Overall, recent advancements in plant species recognition through TL reveal impressive performance, with models achieving up to 99.73% accuracy, as demonstrated by Kaya et al (2019). The effectiveness of different TL approaches, including instance-based and model-based strategies, highlights the successes achieved in this field. These developments are closely aligned with the overarching trends discussed in Sect. 5, which emphasise the importance of combining TL with technologies such as remote sensing and the IoT. To maximise the potential of TL, future research must address computational efficiency and scalability challenges. This progress not only improves plant species classification but also broadens its applications in the field.

4.2 Plant disease recognition

This subsection discusses the selected articles on plant disease detection and Table 6 summarises an overview of recent TL research on plant disease recognition.

Table 6 Summary of research on plant disease recognition

Mohanty et al (2016) used GoogleNet on the publicly available PlantVillage dataset, achieving 99.35% accuracy. Liu et al (2017) presented an AlexNet-based CNN model for identifying apple leaf diseases, with fewer parameters than standard AlexNet, and achieved an accuracy of 97.62%. Wang et al (2017) introduced a method using VGG16 and achieved an accuracy of 90.4% on the public PlantVillage dataset. Fuentes et al (2017) proposed an algorithm for disease detection using ResNet50 and evaluated it on their own collected dataset.

A comparative study using TL with a significant number of pre-trained networks was proposed by Daphal and Koli (2023), which can automatically extract the unique foliar disease features and classify them into five classes, achieving an accuracy of 84% with MobileNetV2 and 86.93% with an ensemble model. Coulibaly et al (2019) introduced a TL-based system for identifying mildew in pearl millet, overcoming the challenges of data storage and achieving 95% accuracy. Özden (2021) utilised MobileNetV3 for apple disease classification, achieving an accuracy of 91% with RGB images by fine-tuning the model for the classification task. TL for recognising versatile plant diseases with a small amount of data was proposed by Xu et al (2022), who trained their model on the PlantCLEF2022 dataset using self-supervised learning and a vision transformer (ViT) model. In addition, dual TL was utilised to save cost given the large-scale dataset; the approach was validated on 12 plant disease-related datasets and achieved an accuracy of 86.29%.

Simhadri and Kondaveeti (2023) examined the application of TL using 15 pre-trained CNN models for automatic rice disease detection; the results revealed that the InceptionV3 model was the most effective, achieving an accuracy of 99.64%. In Krishnamoorthy et al (2021), the InceptionResNetV2 model was utilised with TL for predicting rice leaf disease; the parameters of the proposed model were optimised for classification, achieving an accuracy of 95.67%. Attallah (2023) proposed tomato leaf disease identification using a compact CNN, TL to extract deep features, and hybrid feature selection to produce a comprehensive feature set. Six classifiers were used for tomato leaf disease classification, and the results showed that SVM and KNN attained the highest accuracies of 99.90% and 99.92%, respectively, using 24 and 22 features. Chen et al (2020) studied TL for the classification of plant leaf diseases using pre-trained models (VGGNet and the Inception module) and then transferred the model to their own dataset by initialising weights. The proposed approach attained a validation accuracy greater than 91.83% on a public dataset.

Dogra et al (2023) presented a CNN and VGG19 model based on TL for precisely classifying and identifying rice leaf brown spot disease, obtaining an accuracy of 93% and a precision of 92.4%. Zhao et al (2022) proposed the identification of leaf mould, cucumber mildew, and tomato powdery mildew against both complex and simple backgrounds, achieving rapid and accurate vegetable disease identification; their model reached a precision of 97.24%. Fan et al (2022) proposed plant leaf disease identification based on feature fusion and TL, with extensive experiments on public datasets to validate the efficiency of the method. The highest accuracy among the various experimental settings was achieved by combining image augmentation, deep features, histogram of oriented gradients (HOG) features, and discriminative features, reaching a classification accuracy of 99.79%.

Abbas et al (2021) developed an approach for tomato disease detection using a conditional GAN for synthetic image generation and trained DenseNet121 on both real and synthetic images utilising TL to classify ten types of tomato leaf diseases. The method was trained and tested on the public PlantVillage dataset and attained accuracies of 97.11%, 98.65% and 99.51% for 10, 7, and 5 classes, respectively. Feng et al (2023) designed a method for online peanut leaf disease recognition based on deep TL and a data balance algorithm. The data balance algorithm addressed class-distribution problems, while TL was used to construct a peanut leaf disease classification model with improved generalisation capability based on a lightweight CNN: the original network output layer was removed, the fully connected layer was modified, pooling and normalisation layers were re-added, and regularisation strategies were introduced. Gu et al (2021) proposed a method for plant disease diagnosis and pest identification using deep feature-based TL; VGG19, VGG16 and ResNet models were used, and the highest accuracy was attained with ResNet50. A method for soybean leaf disease recognition was proposed by Yu et al (2023), achieving an accuracy of 99.53%; the model size was 42.70 MB, and the average recognition time was 0.047184 s.

An automated soybean leaf disease classifier based on TL was suggested by Bevers et al (2022). The models were trained on 9500 field images collected from various parts of the USA, covering eight distinct deficiency and disease classes; the best results (96.8% accuracy) were obtained with the DenseNet-201 model. Nigam et al (2023) presented a model for identifying wheat crop diseases using EfficientNetB4, which achieves an accuracy of 99.53% with RGB images by fine-tuning the model. Feng et al (2021) explored TL for detecting rice leaf disease using hyperspectral imaging, with fine-tuning performed to obtain the best accuracy; their study demonstrated that hyperspectral images with TL can enable cost-effective disease detection. Jiang et al (2021) presented an algorithm for recognising rice and wheat leaf diseases with a modified VGG16 based on multi-task learning. They created and subsequently improved a dataset of two types of wheat diseases and three types of rice leaf diseases comprising 40 images per disease. The accuracy achieved was 98.75% for wheat diseases and 97.22% for rice diseases.

Shafik et al (2024) introduced two plant disease detection models, PDDNet-AE and PDDNet-Live, which integrate nine pre-trained CNNs for efficient disease classification. Testing on the PlantVillage dataset shows that PDDNet-Live achieved an accuracy of 97.79%, demonstrating significant improvements in plant disease detection and classification. Identification of rice disease was tackled using a SENet (squeeze-and-excitation network) with TL from the PlantVillage database for improved feature extraction (Yuan et al 2024); the framework achieved an accuracy of 95.73%, demonstrating its effectiveness for precise disease classification. Detection and classification of rice leaf disease were presented using VGG16 and CNNs, employing Gaussian filtering for preprocessing and clustering for segmentation, and the model achieves an accuracy of 99.9% (Elakya and Manoranjitham 2024). The performance of VGG16 was compared with other methods such as InceptionV3, MobileNetV2, and ResNet-15, demonstrating its superior effectiveness in classifying rice leaf disease (Elakya and Manoranjitham 2024).

Radočaj et al (2024) presented a module named IncMB, which enhances CNNs with TL for tomato leaf disease detection, achieving 97.78% accuracy with a pre-trained InceptionV3 network. Srivastava and Meena (2024) proposed a TL-based framework using five CNN models: VGG16, Xception, InceptionV3, MobileNetV2, and DenseNet121. Ahmad et al (2021) introduced an approach to facilitate faster convergence, reduce overfitting, and prevent negative transfer when transferring knowledge. Lanjewar et al (2024) proposed a potato leaf disease detection approach that modifies the VGG19, NasNetMobile, and DenseNet169 models to reduce trainable parameters and model size.

In summary, recent progress in plant disease recognition using TL has yielded high accuracy. For example, InceptionV3 achieved 99.64% accuracy for rice disease detection (Simhadri and Kondaveeti 2023), and DenseNet121 reached 99.51% for tomato disease classification (Abbas et al 2021). These results illustrate TL’s effectiveness in disease detection tasks. Studies have utilised both model-based and feature-based TL strategies: Coulibaly et al (2019) achieved 95% accuracy for mildew detection, while Fan et al (2022) reached 99.79% accuracy through feature fusion, demonstrating that combining TL with advanced feature extraction significantly enhances diagnostic performance. Future research should focus on optimising models for broader use and on exploring technologies such as hyperspectral imaging and data augmentation to boost performance further. While TL methods are highly effective, ongoing refinements and innovations are needed to overcome the remaining limitations.

4.3 Crop-weed detection

In this subsection, research on crop-weed detection using TL is discussed, with a summary of the proposed methods provided in Table 7.

Table 7 Summary of recent research on crop-weed detection

McCool et al (2017) introduced a method for crop-weed detection by optimising the speed and accuracy of InceptionV3 and achieved an accuracy of 93.9%. Dang et al (2023) introduced an algorithm for crop-weed detection using YOLOV4 and tested it on a self-collected dataset that they made publicly available to the research community. Farooq et al (2022) proposed cost-effective weed detection using YOLOV4-tiny, a fast, lightweight model suitable for resource-constrained devices. Bukhari et al (2024) utilised cross-domain TL for weed segmentation and mapping in precision farming by integrating ground and UAV imagery.

Chen et al (2022) presented a new, comprehensive benchmark crop-weed dataset collected from cotton fields in multiple southern US states. The authors evaluated 35 state-of-the-art pre-trained TL models on the collected dataset and established a comprehensive benchmark for precise weed identification; an F1-score of 98.83% was achieved with ResNet101. Wang et al (2023b) proposed a fine-grained detection method for classifying weeds and maize seedlings based on two-stage TL and the Swin Transformer, which was used to learn the discriminative features that differentiate crops from weeds. Espejo-Garcia et al (2021) presented an approach for weed identification using TL and synthetic images generated by GANs. The method was evaluated on a black nightshade dataset and tomato pictures, and the highest performance (99.07% and 93.23% accuracy on the test set and on a noisy version of the same test set, respectively) was obtained with the Xception network and GAN-generated synthetic images.

Classification of volunteer potato and sugar beet under field conditions utilising TL was proposed by Suh et al (2018), who evaluated TL with three different implementation scenarios of AlexNet and compared the performance with five other pre-trained models, including InceptionV3, ResNet101, GoogleNet, VGG19, and ResNet50. Abdalla et al (2019) assessed different TL approaches using a VGG16-based encoder network and compared the performance with a VGG19-based encoder network on ground-level images of oilseed rape with high weed density, with and without data augmentation. The highest accuracy (96%) was found with the combination of a VGG16-based encoder and an SVM classifier, and the processing time per image was less than 0.05 s, making it feasible for real-time use. For semantic crop-weed segmentation, Bosilj et al (2020) suggested an approach using TL by examining datasets containing different types of crops and weeds and compared the performance of TL with that of DL trained from scratch. The comparison shows that TL can achieve better results and reduce training time by up to 80%; they reported that their approach achieved a Cohen’s coefficient score of 98%. Espejo-Garcia et al (2020) proposed a crop-weed identification method combining fine-tuned pre-trained models and traditional ML classifiers; the results showed that the combination of a fine-tuned DenseNet and an SVM achieved the highest F1-score (99.29%).

Overall, recent progress in TL for crop-weed detection has demonstrated impressive results. Notable achievements include a 98.83% F1-score with ResNet101 (Chen et al 2022), 99.18% accuracy with the Swin Transformer (Wang et al 2023b), and 99.07% accuracy with Xception and GANs (Espejo-Garcia et al 2021). Given these high-performance benchmarks, future research should focus on enhancing the scalability of these advanced models and on integrating emerging technologies such as GANs to further improve detection accuracy across diverse data under real-world field conditions.

4.4 Seedling detection

In this subsection, we describe the articles on seedling detection; a summary of these studies is provided in Table 8.

Table 8 Summary of recent research on seedling detection studies

Hassan et al (2021) proposed an algorithm to classify seedlings using TL and achieved the best result with VGG19. Gupta et al (2020) used five pre-trained CNN models (ResNet50, Xception, MobileNetV2, VGG16 and VGG19) to classify crops and weeds, with ResNet50 achieving the best result. Tseng et al (2022) addressed tiny rice seedling detection in cluttered environments using TL and ML on UAV images. Two DL models (EfficientDet and Faster R-CNN) and one ML approach (HOG-SVM) were employed for small rice seedling detection. The experiments showed that the combination of SVM and HOG descriptors provides the best F1-scores of 99.9% and 99.6% on the training and testing sets, respectively.

A TL-based plug seedling classification system was proposed by Xiao et al (2019), in which four CNN models (VGG16, ResNet, InceptionV3, and AlexNet) were evaluated. Among the four models, VGG16 showed the best accuracy at 95.5%, whereas AlexNet required the shortest time. Jiang et al (2019) developed a method to count cotton seedlings from 75 videos collected across multiple years and locations using a Faster R-CNN detection network, evaluating several factors such as TL efficiency, training sample size, and generalisability. Faster R-CNN with Inception ResNetV2 achieved an F1-score of 96.9%, demonstrating its effectiveness for seedling detection. Anuar et al (2022) explored several models to determine the best one for detecting paddy seedlings in aerial imagery and also investigated fine-tuning of the TL models. Their findings revealed that a one-stage object detector named EfficientDetD1, an EfficientNet-based model, could detect paddy seedlings with an F1-score of 77% and a precision of 83%.

Gao et al (2022) developed a YOLOV4-based method for detecting maize seedlings, using GhostNet and k-means clustering to obtain high accuracy with low complexity, and achieved an accuracy of 97.03%; their model was fast, lightweight, and effective for maize seedling management. Islam et al (2024) used an improved R-CNN with ResNet101 to detect and segment lettuce seedlings from images. The model, trained with 1000 annotated images, achieved a 93% F1-score, demonstrating high performance in seedling detection and monitoring.
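The detection-oriented studies above typically start from a detector pre-trained on a large generic dataset (such as COCO) and replace its prediction head for the target classes. The sketch below shows this generic recipe with torchvision's Faster R-CNN; the single seedling class and the weights argument are assumptions for illustration, and it does not reproduce the exact setup of any study discussed here.

```python
# Generic sketch of adapting a COCO-pre-trained Faster R-CNN to seedling detection
# (one object class + background); illustrative only, not any study's exact setup.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the COCO box predictor with one sized for (background + seedling).
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# Training then proceeds as usual: the backbone keeps its transferred weights,
# while the new predictor head is learned from annotated seedling images.
```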

Overall, seedling detection has shown considerable progress. For instance, Tseng et al (2022) achieved a 99.9% F1-score using SVM with HOG descriptors for tiny rice seedlings, while Xiao et al (2019) reported 95.5% accuracy with VGG16 for plug seedling classification. Faster R-CNN with Inception ResNetV2 achieved a 96.9% F1-score for cotton seedlings (Jiang et al 2019), while EfficientDetD1 reached a 77% F1-score for paddy seedlings (Anuar et al 2022). To further enhance seedling detection, future research should focus on optimising these models for varied seedling environments and growth stages while also integrating advanced techniques such as multi-modal data fusion (Gao et al 2022) to improve overall detection accuracy and robustness.

4.5 Pest recognition

In this subsection, we describe the articles on plant pest recognition, which are summarised in Table 9.

Table 9 Summary of recent research on plant pest detection

Dawei et al (2019) proposed a diagnostic system using TL for pest recognition and detection that was trained and tested on ten different types of pests and achieved an accuracy of 93.84%. The results of the TL-based approach were superior to those of human experts and traditional ML. Furthermore, to verify the generality of the proposed model, it was used to recognise two types of weeds, namely procumbent speedwell and Sisymbrium sophia, and achieved 98.92% accuracy.

Thenmozhi and Reddy (2019) applied TL with fine-tuned pre-trained models to classify insect species on publicly available datasets. The suggested approach was evaluated and compared with pre-trained models such as GoogleNet, VGGNet, ResNet and AlexNet for insect classification.

A modified lightweight TL model was proposed by Wang et al (2023a), utilising TL feature-extractor networks such as InceptionV3, VGG16 and MobileNetV3, together with a single-shot multi-box detector for classification. To further reduce inference time, the prediction convolution kernel was miniaturised and a residual block with a 1 x 1 kernel was added.

A TL methodology based on VGG16 was implemented by Torky et al (2023) for recognising the sound of red palm weevils. The method was trained and validated on a public dataset consisting of 731 infected and 1754 healthy samples and achieved 95.5% accuracy, 93% recall, 95% precision, and a 94% F1-score.

Costa et al (2023) demonstrated that TL improved pest detection accuracy and reduced training time with limited training samples. They suggested exploring meta-learning for better performance in scenarios with scarce data and highlighted the use of YOLOV5 for practical pest monitoring in smart farming.

TL greatly improved boll weevil pest classification, achieving accuracies of 90.70% and 96.28% in scenarios with limited instances and limited features, respectively. This was achieved through advanced methods for measuring task similarity, adapting relevant features and parameters, and integrating climate variables such as rainfall, humidity, and temperature (Toscano-Miranda et al 2024).

In pest recognition, TL approaches show varied performance; for instance, Thenmozhi and Reddy (2019) achieved 97.47% accuracy, while YOLO and XGBoost also demonstrated strong performance in specific scenarios (Costa et al 2023; Toscano-Miranda et al 2024). Future research should focus on refining these existing methods for diverse pest types and conditions and on incorporating advanced techniques such as meta-learning and few-shot learning to improve accuracy with limited data (Costa et al 2023).

4.6 Plant nutrient deficiency detection

Below, we describe the studies on plant nutrient deficiency detection; Table 10 provides an overview of TL research in this area.

Table 10 Summary of recent research on plant nutrient deficiency detection

Espejo-Garcia et al (2022) studied nutrient deficiency detection using TL. Two publicly available datasets containing images from real-world conditions were used to verify the performance of the proposed method; image classification with a fine-tuned EfficientNetB4 obtained the highest accuracies of 98.52% and 98.65% on the two datasets, respectively.

Shadrach et al (2023) proposed an approach applying Ring Toss Game Optimisation with deep TL-based nutrient deficiency classification, known as RTGODTL-NDC, which involves preprocessing, segmentation, feature extraction, hyperparameter tuning, and classification. The SqueezeNet model was used to extract features, and the RTGO algorithm was then used to classify nutrient deficiencies, achieving 97.16% accuracy and 98.28% specificity.

Xu et al (2020) explored the performance of TL for nutrient deficiency diagnosis in rice by fine-tuning four models, namely NasNetLarge, ResNet50, InceptionV3, and DenseNet121. In their experiments, all the models achieved an accuracy of over 90%, with DenseNet121 performing best at a test accuracy of 97.44%.

Yan et al (2022) proposed a model named UMNet for plant nutrient deficiency diagnosis that uses TL with a fine-tuned MobileNetV3Large. The proposed method was compared with other state-of-the-art models and achieved the highest accuracy (97.8%), outperforming even complex models such as VGG and Inception.

The above information highlights the effectiveness of various TL-based models. For instance, EfficientNetB4 achieved an accuracy of 98.65% (Espejo-Garcia et al 2022), and DenseNet121 reached 98.62% (Xu et al 2020), demonstrating high performance in identifying nutrient deficiencies. These results indicate that models leveraging TL and fine-tuning can achieve superior accuracy compared to traditional methods. Future research should continue to refine these models, potentially focusing on integrating hybrid approaches to further enhance detection capabilities.

4.7 Plant growth prediction

This subsection discusses works on plant growth prediction, and a summary of the discussion is shown in Table 11.

Table 11 Summary of recent research on plant growth prediction

Fukada et al (2023) proposed an algorithm to automatically predict the growth status of tomatoes using RGB images with the application of TL. The YOLOV5 model was employed and achieved a precision score of 72%.

Roy and Bhaduri (2022) presented a real-time object detection method to detect growth stages of leaves using the DenseYOLOV4 framework by modifying YOLOV4 by introducing DenseNet. The proposed model achieved an mAP of 96.20% and an F1-score of 93.61%.

Hari and Singh (2022) proposed an algorithm for predicting cucumber plant growth. Six different pre-trained models were evaluated for the proposed approach, and among them the VGG16 model attained the maximum test accuracy of 97.98%.

Choi et al (2021) presented a classification approach for crop leaf growth conditions using hyperspectral images. Significant information about the crop plants was learned from the hyperspectral images, and preprocessing was applied to exploit this information. The experiment achieved an accuracy of 90.9%.

Recent research on plant growth prediction shows that DenseYOLOV4 achieved a high mAP of 96.20% (Roy and Bhaduri 2022), and the VGG16 model reached 97.98% accuracy for cucumber growth (Hari and Singh 2022), whereas YOLOV5 achieved a precision of only 72% (Fukada et al 2023). Future work should focus on optimising these models and integrating them with other advanced techniques to enhance prediction capabilities and facilitate practical applications.

5 Results and discussion

In this section, we analyse the selected studies, assessing their relevance to each research question. Additionally, we discuss the challenges identified and offer recommendations for future research directions.

5.1 Pre-trained CNN models (RQ1)

Based on the comprehensive study, we find that a broad range of pre-trained CNN models has been applied across the reviewed studies. These include AlexNet, VGG, GoogleNet, InceptionV3, InceptionResNet, ResNet, DenseNet, MobileNet, EfficientNet, EfficientDet, Faster R-CNN, and YOLO. Most studies evaluated multiple models independently to identify the best-performing approach. The findings show that VGG, AlexNet, and ResNet are the most frequently used models, while DenseNet and MobileNet are also frequently chosen for their balance between accuracy and efficiency. A few studies, for instance Thenmozhi and Reddy (2019) and Yan et al (2022), introduced their own models, while others, such as Roy and Bhaduri (2022), Zhao et al (2022) and Chen et al (2020), utilised hybrid models that combine multiple architectures. The YOLO family is the usual choice for object detection tasks such as seedling detection and crop-weed detection.
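To illustrate the workflow that most of these studies follow with such pre-trained models, the sketch below loads an ImageNet-pre-trained ResNet50, replaces its classification head, and freezes the convolutional backbone. It is a minimal illustration only, assuming PyTorch with a recent torchvision (0.13 or later); the number of classes, learning rate, and freezing strategy are placeholders rather than settings taken from any reviewed study.

    import torch
    import torch.nn as nn
    from torchvision import models

    num_classes = 5  # placeholder, e.g. the number of crop or disease categories

    # Load ImageNet-pre-trained weights and replace the classification head.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    # Freeze the convolutional backbone so that only the new head is trained,
    # a common choice when the target dataset is small.
    for name, param in model.named_parameters():
        if not name.startswith("fc"):
            param.requires_grad = False

    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3
    )
    criterion = nn.CrossEntropyLoss()

Gradually unfreezing deeper layers after a few epochs is a commonly reported variant when more labelled data are available.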

5.2 Datasets (RQ2)

Among the papers reviewed, the majority opted to create and use self-collected datasets, while a portion used publicly available datasets. Notably, some studies (for example, Wang et al (2023b), Bosilj et al (2020), and Wagle et al (2021)) conducted experiments with a combination of public and self-collected datasets. Additionally, a few studies (Gadiraju and Vatsavai 2023; Özden 2021; Simhadri and Kondaveeti 2023) created new datasets for their experiments by combining more than one public dataset. Despite the vast scope and significance of the agricultural sector, there remains a lack of sufficient publicly available datasets. To support researchers in navigating the available resources, we provide the details of the publicly available datasets identified in our review in Table 12. In addition to these datasets, researchers can find publicly available datasets in the works of Kamilaris and Prenafeta-Boldú (2018), Liu and Wang (2021), Hasan et al (2021), and Anuar et al (2022).

Table 12 Datasets used for various applications

5.3 Input type (RQ3)

Based on our findings, we identified three distinct types of input images in the reviewed studies: RGB, hyperspectral, and grayscale. The predominant choice, adopted by more than three-quarters of the papers, is RGB images. Only a few studies, such as Feng et al (2021) and Choi et al (2021), used hyperspectral images, likely due to the challenges associated with collecting hyperspectral data. Similarly, the use of grayscale images is limited, with studies such as Xiao et al (2019) and Thenmozhi and Reddy (2019) using this type. The use of video as input is even less common, with only one study (Anuar et al 2022) employing it. A list of references for each type of input is provided in Table 13.

Table 13 List of Input data types and corresponding paper references

5.4 Platforms (RQ4)

In our review, the analysis of platforms highlights four main preferences (TensorFlow, PyTorch, MATLAB, and Caffe) across studies. Among these, TensorFlow emerged as the leading choice, likely due to its flexibility and extensive library support. PyTorch is another prominent platform, closely behind TensorFlow, with its ease of use, extensive libraries, and growing community support contributing to its widespread usage among researchers. MATLAB also appears frequently, used by studies such as Abdalla et al (2019), Suh et al (2018) and Wagle et al (2021). The Caffe platform is used in one study (Liu et al 2017). Notably, one work (Chen et al 2020) reported the simultaneous use of two platforms (MATLAB and TensorFlow). A comprehensive summary of the platforms used in the papers assessed in this study is provided in Table 14.

Table 14 List of platforms and corresponding paper references

5.5 Transfer learning types (RQ5)

The list of references for the different types of TL used across studies is provided in Table 15. Among the papers analysed, model-based TL is the most widely used. A notable portion of studies employed hybrid TL, combining more than one TL method: within this category, five papers combined instance-based and feature-based methods, and another two combined instance-based and model-based TL. Additionally, feature-based TL alone was used in five papers, constituting nearly 6% of the total, the instance-based method was adopted in four papers, and relational-based TL was used in one paper. These results suggest that future research can adopt a model-based TL approach, as it has proven effective in many studies.

Table 15 List of transfer learning (TL) types and corresponding paper references
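To make the distinction between these categories more concrete, the sketch below shows one common realisation of instance-based TL, in which source-domain samples are re-weighted by their estimated similarity to the target domain before a task classifier is trained. It is an illustrative example under simplifying assumptions (synthetic feature matrices, scikit-learn estimators) and does not reproduce the method of any specific reviewed study.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_source = rng.normal(0.0, 1.0, size=(200, 64))    # placeholder source-domain features
    y_source = rng.integers(0, 2, size=200)            # placeholder source labels
    X_target = rng.normal(0.5, 1.0, size=(100, 64))    # placeholder unlabelled target features

    # 1. Train a domain classifier to separate source (0) from target (1) samples.
    X_dom = np.vstack([X_source, X_target])
    y_dom = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    dom_clf = LogisticRegression(max_iter=1000).fit(X_dom, y_dom)

    # 2. Importance weight of each source sample: p(target | x) / p(source | x).
    p_target = dom_clf.predict_proba(X_source)[:, 1]
    weights = p_target / np.clip(1.0 - p_target, 1e-6, None)

    # 3. Train the task classifier on source data, emphasising target-like samples.
    task_clf = SVC(kernel="rbf").fit(X_source, y_source, sample_weight=weights)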

5.6 Challenges of existing research

Lack of cross-dataset experiments: Most prior works have been evaluated on a single dataset, for example Espejo-Garcia et al (2022), Mohanty et al (2016) and Ferentinos (2018). Very few proposed works, such as Jiang et al (2019) and Al Sahili and Awad (2022), were tested with a cross-dataset; notably, Yoon et al (2023) tested their model with 12 different datasets. According to a survey by Kamilaris and Prenafeta-Boldú (2018), only 8 of 40 papers used cross-datasets to check how their models performed on unseen data. Hence, as stated in numerous papers, many existing algorithms proposed in different areas of agriculture do not transfer well to the real world. For instance, Xu et al (2020) reported a substantial drop in accuracy, with the trained model’s performance plummeting from an initially satisfactory level to 31.4%. Likewise, Xu et al (2022) reported that their method achieved an accuracy of 97.4% on the PlantVillage dataset while scoring only 63.8% on the IVADLTomato dataset. Many other papers, such as Espejo-Garcia et al (2022) and Shadrach et al (2023), also note this issue. Therefore, a model trained on one dataset can be highly accurate yet not robust enough for reliable use in agricultural fields. This is because the data distribution in agriculture varies with growth stage, season, country, species, image collection process, and so on.
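A cross-dataset experiment of the kind advocated above can be set up with little extra effort: fine-tune on one dataset and report accuracy on a second dataset that is never used for training. The sketch below, assuming PyTorch/torchvision and two hypothetical image folders (dataset_A, dataset_B) that share the same class names, illustrates the evaluation step.

    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # Placeholder folders; the two datasets are assumed to share the same class names.
    tf = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    test_in_domain = datasets.ImageFolder("dataset_A/test", transform=tf)  # same source as training
    test_cross = datasets.ImageFolder("dataset_B/all", transform=tf)       # never seen in training

    def evaluate(model, dataset, batch_size=32):
        """Top-1 accuracy of `model` on `dataset`, with no training on it."""
        loader = DataLoader(dataset, batch_size=batch_size)
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in loader:
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        return correct / total

    # After fine-tuning `model` only on dataset_A's training split:
    # print(evaluate(model, test_in_domain))  # in-domain accuracy
    # print(evaluate(model, test_cross))      # cross-dataset accuracy, often much lower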

Negative transfer: The pre-trained models commonly used for TL in agriculture are not specific to this domain; instead, they have been trained on generic computer vision datasets such as ImageNet. This poses a significant challenge because convolutional models transfer features from low level to high level, potentially leading to negative transfer, as highlighted by Al Sahili and Awad (2022), Ahmad et al (2021) and Jiang et al (2024). Rather than enhancing performance, transfer then reduces effectiveness or accuracy. This happens when the source and target domains are notably dissimilar or the source knowledge conflicts with the target task.

Ignoring background: It is very important to account for the appearance of the background in agricultural TL applications. However, most of the existing literature ignores this crucial issue because the images were captured under controlled scenarios (Barbedo 2018). Only a limited number of studies addressed real-world datasets; most experiments were conducted on lab-based images. Moreover, the training data in most cases did not fully capture how weather, soil and other agricultural practices may affect robustness (Shadrach et al 2023). Consequently, under complex backgrounds most existing algorithms fail to effectively segment the target from its background, which leads to unreliable results, as discussed in Chen et al (2020). Therefore, TL models should be evaluated in field studies to ensure their generalisability and effectiveness in real-world agricultural applications.

Initialising weights irrelevant to the target: Pre-trained models often differ in their weight configurations; for example, ResNet50 has many more layers than ResNet18 (Torky et al 2023). Furthermore, different types of target datasets or tasks require the initialisation of distinct weights. Initialising weights that are irrelevant to the target task can reduce the performance of the proposed model, as noted in Afonso et al (2019) and Yi et al (2020).

Fine-grained identification: The classification of tiny objects (Tseng et al 2022) or fine-grained species, diseases, pests, and so on in complex backgrounds is a formidable challenge (Wang et al 2017). These difficulties arise from several factors. Firstly, visual characteristics within the same class can vary considerably; secondly, different classes can bear striking similarities, further complicating the task. The numerous detailed categorisations of biological sub-classes and subspecies across fine-grained tasks, together with nuanced morphological similarities, lead to fine-grained classification problems.

Imbalanced dataset: A significant disparity exists between how pre-trained model weights were trained and how they are applied in the existing works. The weights of pre-trained models were typically trained on a balanced data distribution, whereas most of the datasets in the existing works exhibit class imbalance, which can lead to over-fitting that primarily affects small classes and restricts the generalisation performance of the models (Buda et al 2018; Yan and Han 2018). For example, Al Sahili and Awad (2022) reported that the accuracy of TL with ResNet50 dropped by 30% due to over-fitting caused by an imbalanced dataset. Several papers, such as Al Sahili and Awad (2022), Feng et al (2023) and Espejo-Garcia et al (2022), have highlighted the dataset imbalance issue.
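Two simple and widely used mitigations are class-weighted losses and minority oversampling. The sketch below, assuming a torchvision ImageFolder-style training set at a placeholder path, illustrates both options in PyTorch; the weighting scheme (inverse class frequency) is one common choice, not the setting of any particular reviewed study.

    import torch
    import torch.nn as nn
    from collections import Counter
    from torch.utils.data import DataLoader, WeightedRandomSampler
    from torchvision import datasets, transforms

    # Placeholder training set; ImageFolder stores (path, class_index) pairs in .samples.
    train_set = datasets.ImageFolder("crop_images/train", transform=transforms.ToTensor())
    targets = [label for _, label in train_set.samples]
    counts = Counter(targets)
    num_classes = len(counts)

    # Option 1: weight the loss inversely to class frequency.
    class_weights = torch.tensor([1.0 / counts[c] for c in range(num_classes)], dtype=torch.float)
    criterion = nn.CrossEntropyLoss(weight=class_weights)

    # Option 2: oversample minority classes at the batch level.
    sample_weights = [1.0 / counts[label] for label in targets]
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(targets), replacement=True)
    train_loader = DataLoader(train_set, batch_size=32, sampler=sampler)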

Model interpretation: TL models are often regarded as black boxes, and this lack of transparency can make it challenging for researchers to understand why and how they make predictions. Effective TL often requires domain expertise to determine what to transfer (Liu et al 2021), how to set parameters (Liu and Wang 2021), how to adjust model properties, and which layers to freeze. However, researchers often omit technical implementation details or lack full awareness of these implementation choices.

Domain shift: Agricultural datasets often vary significantly due to differences in environmental conditions, crop types, and imaging techniques. This variability can lead to reduced performance of pre-trained models when they are applied to new agricultural tasks. For instance, a model trained on images of a crop under specific conditions may not perform well on images of another crop grown under different conditions (Singhal et al 2023).

5.7 Recommendations for future research

Improving generalisation: The visual characteristics of plants may change over time and across growth stages due to varying temperatures and humidity. To enhance model generalisation, it is advisable to include a diverse set of training images; a wide variety of photographs should therefore be taken to account for these possibilities. Furthermore, an optimal, real-world-applicable model should be built by collecting images from diverse regions, including samples from other countries and images captured at different times of the year (Attallah 2023). Future research efforts should focus on accurate detection of the region of interest and extend to deep instance segmentation in complex natural environments, as shown in Bai et al (2022). Furthermore, databases covering various agricultural domains under real-world conditions are still in their infancy (Liu and Wang 2021). Researchers should utilise input acquisition platforms such as aerial photography, auto-capture portable tools, and IoT tools to broaden the coverage of farmland situations and make up for the limitations of previous research. This would not only ensure the accuracy of the datasets but also enhance the comprehensiveness and generalisability of the algorithms, thus fostering advancement in the field.

Using a pipeline could improve performance: As stated by Jiang et al (2021), forming a classification pipeline by extracting deep features with several pre-trained models, combining the extracted features of the different models, selecting appropriate features through hybrid selection, and finally applying traditional ML could enhance accuracy and reduce computational cost. A similar suggestion was provided by other researchers, such as Fan et al (2022), who reported that combining TL-based feature extraction with the fusion of deep-learned features and traditional features, such as HOG, could reduce training parameters and minimise processing time.
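A minimal sketch of such a pipeline is given below: two frozen pre-trained backbones provide deep features, the features are concatenated, a simple filter-based selector reduces their dimensionality, and a traditional ML classifier (an SVM) is trained on the result. The arrangement, the choice of backbones, and the selector are illustrative assumptions, not the exact configuration used by Jiang et al (2021) or Fan et al (2022).

    import torch
    import torch.nn as nn
    from torchvision import models
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.svm import SVC

    # Pre-trained backbones used purely as frozen feature extractors.
    resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    resnet.fc = nn.Identity()             # yields 512-dimensional features
    mobilenet = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.IMAGENET1K_V1)
    mobilenet.classifier = nn.Identity()  # yields 576-dimensional features

    def extract(model, images):
        model.eval()
        with torch.no_grad():
            return model(images)

    # Placeholder batch standing in for normalised training images and their labels.
    images = torch.randn(16, 3, 224, 224)
    labels = (torch.arange(16) % 3).numpy()

    # Fuse the deep features of both backbones, select the most discriminative ones,
    # and train a traditional ML classifier on the reduced representation.
    fused = torch.cat([extract(resnet, images), extract(mobilenet, images)], dim=1).numpy()
    selector = SelectKBest(f_classif, k=256)
    reduced = selector.fit_transform(fused, labels)
    clf = SVC(kernel="rbf").fit(reduced, labels)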

Incorporating generative adversarial networks: Most of the datasets available in agriculture do not comprise enough imagery from diverse environments, which is required for producing highly accurate models; models trained on such limited data may overfit and perform poorly in natural environments. Many researchers have suggested various data augmentation techniques, such as perspective and affine transformations. However, when the training images are inadequate, augmentation techniques alone cannot improve the outcome (Abbas et al 2021). In this case, using TL together with a GAN can produce convincing synthetic images. GANs have proven effective for small datasets (Röglin et al 2022; Alauthman et al 2023; Hiruta et al 2022) and for imbalanced datasets (Sauber-Cole and Khoshgoftaar 2022; Sharma et al 2022). Further suggestions for dealing with imbalanced datasets are provided in the literature (Al Sahili and Awad 2022; Espejo-Garcia et al 2022).
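As a rough illustration of this direction, the sketch below defines a minimal DCGAN-style generator and discriminator for 64x64 crop images together with one adversarial training step; the synthetic images (rescaled from [-1, 1] to [0, 1]) can then be mixed into the CNN training set. The architecture sizes, hyperparameters, and training loop are illustrative assumptions and are not taken from any of the cited GAN studies.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, z_dim=100):
            super().__init__()
            self.net = nn.Sequential(
                nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),  # 4x4
                nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),    # 8x8
                nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),      # 16x16
                nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),       # 32x32
                nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                                # 64x64
            )

        def forward(self, z):
            return self.net(z.view(z.size(0), -1, 1, 1))

    class Discriminator(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),                           # 32x32
                nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),    # 16x16
                nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),   # 8x8
                nn.Conv2d(256, 1, 8),                                                         # 1x1 logit
            )

        def forward(self, x):
            return self.net(x).view(-1)

    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    bce = nn.BCEWithLogitsLoss()

    def train_step(real_images):
        """One discriminator and one generator update on a batch of real images in [-1, 1]."""
        batch = real_images.size(0)
        z = torch.randn(batch, 100)

        # Discriminator: push real images towards label 1 and generated images towards 0.
        fake = G(z).detach()
        loss_d = bce(D(real_images), torch.ones(batch)) + bce(D(fake), torch.zeros(batch))
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

        # Generator: try to make the discriminator label generated images as real.
        loss_g = bce(D(G(z)), torch.ones(batch))
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
        return loss_d.item(), loss_g.item()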

Designing methods for real-time applications: Most existing applications of TL in agriculture use heavy pre-trained models such as VGG16, InceptionV3, InceptionResNetV2, and ResNet, which are typically not suitable for deployment on mobile platforms. However, applications that are not readily deployable for farmers may not be practical, as access to laboratory resources is not always available to them. Hence, future researchers should prioritise designs applicable in real time. For this purpose, models similar to MobileNet, UMNet, ShuffleNet, EfficientNet, SqueezeNet, NasNetMobile, and Xception, which are lightweight in terms of storage requirements and training time, could be a better fit for real-time use (Feng et al 2023).
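As a brief sketch of this recommendation, the code below fine-tunes a lightweight MobileNetV3-Small backbone and exports it with TorchScript for on-device inference. The number of classes, input size, and file name are placeholders, and the export route shown (TorchScript tracing) is only one of several possible deployment options.

    import torch
    import torch.nn as nn
    from torchvision import models

    num_classes = 4  # placeholder
    model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.IMAGENET1K_V1)
    model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)

    # ... fine-tune on the target dataset as usual ...

    # Export a traced TorchScript module that lightweight/mobile runtimes can load.
    model.eval()
    example = torch.randn(1, 3, 224, 224)
    scripted = torch.jit.trace(model, example)
    scripted.save("mobilenet_v3_crop_classifier.pt")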

Integrating with classification and segmentation approaches: Combining DL segmentation and detection methods, such as Fast R-CNN, Faster R-CNN, YOLO, U-Net, and regional CNN (R-CNN), to detect and distinguish objects with similar patterns, and thereby facilitate classification, would be an enhancement for future research (Attallah 2023).

Bridging the domain gap: A key challenge in TL for agriculture arises from the use of pre-trained models not tailored to the domain; these models, originally trained on general computer vision datasets such as ImageNet and COCO, lack specificity for agricultural contexts (Al Sahili and Awad 2022). This poses a significant challenge, as convolutional models transition from low-level to high-level features, potentially yielding detrimental transfer effects. Therefore, to provide the various agricultural fields with domain-specific, generalised and robust pre-trained models, it is necessary to develop agriculture-based foundation models.

Leveraging few-shot and zero-shot learning: To address the unique challenges in agriculture, few-shot and zero-shot learning (FSL/ZSL) present promising approaches by allowing models to generalise from minimal or even no labelled training examples (Rezaei et al 2024). This is especially valuable in agriculture, where acquiring large, diverse labelled datasets is often impractical due to seasonal and environmental variations. By leveraging FSL, models can quickly adapt to agricultural problems such as new crop diseases and pest infestations (Saad and Salman 2024), even with a single sample (Saad and Salman 2024) or no samples at all (Singh and Sanodiya 2023). These approaches can support robust applications suited to the diverse and evolving agricultural landscape.
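A simple few-shot baseline along these lines is sketched below: frozen features from a pre-trained backbone are averaged into one prototype per class from a handful of labelled support images, and query images are assigned to the nearest prototype by cosine similarity. The episode sizes and the random tensors standing in for pre-processed images are placeholders; this is an illustrative baseline rather than the method of the cited works.

    import torch
    import torch.nn as nn
    from torchvision import models

    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    backbone.fc = nn.Identity()          # frozen 512-dimensional feature extractor
    backbone.eval()

    @torch.no_grad()
    def embed(images):
        return nn.functional.normalize(backbone(images), dim=1)

    # A 3-way, 5-shot episode; random tensors stand in for pre-processed images.
    support = torch.randn(3, 5, 3, 224, 224)   # [classes, shots, C, H, W]
    queries = torch.randn(8, 3, 224, 224)

    # Each class prototype is the mean feature of its support images.
    prototypes = nn.functional.normalize(
        torch.stack([embed(imgs).mean(dim=0) for imgs in support]), dim=1
    )
    # Assign each query to the class whose prototype is most similar (cosine similarity).
    predictions = (embed(queries) @ prototypes.T).argmax(dim=1)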

6 Threats to validity

In this section, we describe potential threats to the validity of this review, following guidelines from Zhou et al (2016) and Ampatzoglou et al (2020). Common validity threats are organised into categories relevant to our review: selection bias, data extraction bias, publication bias, generalisation bias and interpretation bias. Each of these categories is explored to identify possible limitations and ensure a balanced assessment of the review findings.

Selection bias: Selection bias can arise from the choice of databases, keywords, and filtering criteria, potentially excluding relevant studies. In this review, we aimed to mitigate this by using multiple academic databases, establishing comprehensive search terms, and adhering to inclusion/exclusion criteria. However, some relevant papers may still have been inadvertently omitted.

Interpretation bias: Interpretation bias may influence how findings from the reviewed articles are summarised or emphasised. We have sought to present a balanced view of TL applications in agriculture, acknowledging both benefits and limitations; nevertheless, some bias inherent in interpretation may remain.

Publication bias: The emphasis on peer-reviewed literature can introduce publication bias in agricultural research, where positive results are more likely to be published while negative or contradictory findings may be overlooked. Since we have reviewed only peer-reviewed published papers, our review is limited by this bias. To provide a more balanced perspective, we suggest that future work consider grey literature, which refers to information not formally published through traditional publishing channels (Conn et al 2003; Rothstein and Hopewell 2009); it may include a wide range of materials such as government documents, research reports, theses and working papers.

Primary study generalisability bias: Primary study generalisability refers to the extent to which findings from individual studies can be applied beyond their specific context. In this review, we have narrowed our focus to seven subfields within agriculture. This may enhance the depth of analysis but could also limit the generalisability of our findings. Consequently, while our conclusions provide valuable insights, they may not represent broader agricultural practices and challenges.

7 Conclusions

The rapid growth of the global population challenges agriculture to produce sufficient food while dealing with environmental and socioeconomic obstacles. Embracing new technologies and innovative research is essential to accelerate agricultural production. However, limited agricultural data necessitates effective techniques for conducting agricultural research, and TL has emerged as a promising strategy to mitigate data shortages by adapting pre-trained models. In this review, we studied various aspects of TL and its diverse applications within agriculture, covering plant species recognition, disease classification, pest detection, seedling classification, plant nutrient deficiency detection, and plant growth prediction. Despite the promising advantages of TL, its implementation in agriculture faces challenges due to the complexity and variability of agricultural data. To facilitate practical integration into agriculture, we have highlighted these challenges and provided recommendations for future research to advance agricultural studies, thereby contributing to sustainable food production for the growing population.