Original papers
A recognition of farming behavior method based on EPCI-LSTM model

https://doi.org/10.1016/j.compag.2021.106467

Highlights

  • Creatively apply deep learning algorithms to farming behavior classification.

  • A lightweight 3D attention mechanism is proposed, which can be connected to any network.

  • Experimental statistics show that EPCI-LSTM can distinguish farming behavior accurately.

Abstract

Recognition of farming behavior is defined as embedding deep learning algorithms into industrial communication devices such as cameras to capture the farming actions of agricultural workers and analyze those actions in the resulting videos. To address the lack of information about workers' labor processes in agricultural production, this paper proposes automatic classification of farming behavior by embedding deep learning algorithms in cameras. On the algorithmic side, because existing behavior recognition models are limited in computation speed, temporal length, and resolution, an efficient and lightweight 3D attention mechanism suitable for video, named embedded position coordinate information (EPCI attention), is proposed. In the experiments, EPCI attention is connected to ConvLSTM to form an end-to-end deep learning model, EPCI-LSTM. On FBD (Farming Behavior Dataset), a dataset of 905 short videos covering 4 typical farming behaviors (spraying pesticides, hoeing the ground, weeding, and planting seedlings), EPCI-LSTM improves the F1 score over P3DConvLSTM and ConvLSTM by 2.58% and 6.93%, respectively. With EPCI attention attached, ConvLSTM improves markedly, reaching an accuracy of 0.9441 and a recall of 0.9485. EPCI-LSTM achieves absolute improvements over ConvLSTM of 5.85% in precision and 7.99% in recall, and training time is reduced by 26.78%. These results indicate that EPCI-LSTM has advantages in visual recognition over a ConvLSTM that is individually defined or optimized for recognition or generation. The experimental statistics therefore support the effectiveness of the EPCI attention structure, and the EPCI-LSTM network can discriminate the labor behavior of agricultural workers efficiently and accurately. This is of great significance for promoting the digital and standardized management of farm workers by agricultural enterprises and for the further transformation of traditional agriculture into automated, smart agriculture.
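The precision, recall, and F1 figures quoted above are related through the confusion matrix, which a short worked example can make concrete. The sketch below computes per-class and macro-averaged scores over the four FBD classes. The counts are hypothetical, the function names are illustrative, and macro averaging is an assumption: the abstract does not state which averaging scheme the authors used.

```python
# Class order follows the abstract; the counts further down are made up.
CLASSES = ["spraying pesticides", "hoeing the ground", "weeding", "planting seedlings"]

def per_class_scores(cm):
    """Return a (precision, recall, f1) tuple for each class.

    cm[i][j] is the number of clips whose true class is i and predicted class is j.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: TP + FP
        actual_k = sum(cm[k])                          # row sum: TP + FN
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores

def macro_average(scores):
    """Unweighted mean of each metric across classes (an assumed scheme)."""
    n = len(scores)
    return tuple(sum(s[m] for s in scores) / n for m in range(3))

# Hypothetical confusion matrix: 10 clips per class, two confusions.
hypothetical_cm = [
    [9, 1, 0, 0],
    [1, 9, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
]

macro_p, macro_r, macro_f1 = macro_average(per_class_scores(hypothetical_cm))
print(f"precision={macro_p:.4f} recall={macro_r:.4f} f1={macro_f1:.4f}")
```

Under this reading, an "absolute improvement" in precision or recall would conventionally be the difference between two such scores, expressed in percentage points rather than as a ratio.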

Introduction

With the gradual expansion of the scale and digitization of the agricultural industry, the new generation of information technology is becoming deeply integrated with agricultural production and management (Mudda et al., 2017). At the same time, consumers at all levels increasingly demand high-quality agricultural products. They pay growing attention to the whole process from growth to production, including sensitive issues such as whether farmers use pesticides when planting crops (Wang, 2017). The farmer's working behavior is therefore indispensable information for the traceability of agricultural products. At the current stage, agricultural firms in our country use roughly two types of traceability systems for agricultural operations. The first stores raw image data directly, which occupies too much storage space and makes the useful information hard to retrieve. The second relies on employees recording agricultural production data and operations themselves (Cao et al., 2017). This leads to low accuracy and low utilization of the recorded data, which in turn damages the economic reputation of the enterprises (Hamada, 2015). In view of the poor efficiency of existing traceability systems and the difficulty of extracting useful information for data sharing, we propose automatic detection and identification of the farming behavior of agricultural workers. Once the behavior recognition algorithm is embedded in cameras and deployed in the field, it will provide farming information for the traceability system and make it easier for enterprises to manage workers in completing agricultural plans (Xu and Li, 2016).

Deep learning is a research field that has gradually emerged in recent years (Zhang et al., 2019). At present, deep learning algorithms have become mainstream research tools in image recognition, image classification, and target detection (Zhu et al., 2020). The recognition of farming behavior of agricultural workers is essentially a spatio-temporal video classification problem. In addition, as industrial communication devices such as cameras are widely used in farmland scenes (Balasudarsun and Pranavaraj, 2018), it has become possible to embed deep learning algorithms in these devices to identify farming behavior in the field (Beheraa et al., 2015).

Building on recent advances in deep learning, a variety of high-performance algorithms have been applied in many studies on gesture and action recognition to recognize human actions more accurately (He et al., 2016a, Jo et al., 2016). Ji et al. (Tran et al., 2015) proposed 3D CNN, one of the early works to learn spatio-temporal representations directly from short video clips. Shi et al. (Shi et al., 2015) proposed the ConvLSTM network structure and successfully applied it to rainfall prediction. Qiu et al. used (1 × 3 × 3) convolutional filters (2D) on the spatial domain plus (3 × 1 × 1) convolutions (1D) on the temporal domain to simulate (3 × 3 × 3) convolutions. Through this simplification, Pseudo-3D Residual Net (Qiu et al., 2017) adds only a certain amount of 1D convolution compared with a 2D convolutional network of the same depth, which reduces the computation and complexity of pure 3D convolution (C3D).
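The saving from this factorization can be checked with simple arithmetic. The sketch below counts the weights of a full 3 × 3 × 3 convolution against a 1 × 3 × 3 spatial convolution followed by a 3 × 1 × 1 temporal convolution. The serial arrangement, the channel widths, and the function names are assumptions chosen for illustration, and bias terms are omitted.

```python
def conv3d_weights(c_in, c_out, k_t, k_h, k_w):
    """Number of weights in one 3D convolution layer (no bias)."""
    return c_in * c_out * k_t * k_h * k_w

def full_3d_weights(c_in, c_out):
    """A single 3 x 3 x 3 convolution over time, height, and width."""
    return conv3d_weights(c_in, c_out, 3, 3, 3)

def pseudo_3d_weights(c_in, c_out, c_mid=None):
    """A 1 x 3 x 3 spatial convolution followed by a 3 x 1 x 1 temporal one.

    c_mid is the width between the two layers; defaulting it to c_out
    is an assumption of this sketch.
    """
    c_mid = c_out if c_mid is None else c_mid
    spatial = conv3d_weights(c_in, c_mid, 1, 3, 3)
    temporal = conv3d_weights(c_mid, c_out, 3, 1, 1)
    return spatial + temporal

channels = 64  # hypothetical layer width
full = full_3d_weights(channels, channels)
factorized = pseudo_3d_weights(channels, channels)
print(full, factorized, round(factorized / full, 3))
```

With equal input, middle, and output widths, the factorized pair needs 9 + 3 = 12 weights per channel pair where the full kernel needs 27, so the ratio is 12/27 regardless of the width chosen.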

However, these traditional convolutional neural networks attend to all local information in the entire picture or video, so interference signals take up a large share. To weaken these interference signals and let the model attend to more useful information, researchers have proposed many practical attention mechanisms (Wang et al., 2018). Dosovitskiy et al. (Zhai et al., 2021) brought the self-attention Transformer from the NLP field (Collobert et al., 2011) to computer vision and proposed ViT (Vision Transformer). Since then, researchers worldwide have set new records on many of the more difficult image recognition tasks based on ViT. Although ViT performs well, it has a high computational cost. Besides mechanisms that imitate self-attention, the attention family includes lightweight mechanisms such as SE (Hu et al., 2020), BAM (Park et al., 2019), and CBAM (Woo et al., 2018), which are often flexible plug-and-play modules. Although they can improve models in image recognition, they either ignore location information or capture it incompletely. To tackle this problem, Hou et al. (Hou et al., 2021) proposed Coordinate attention, a new attention mechanism that embeds location information into channel attention and achieves good results in semantic segmentation. Excellent human behavior recognition algorithms have been applied to classify various daily behaviors such as sports and dance (Zhu et al., 2017; Rautaray and Agrawal, 2015). However, because of the complexity of field agriculture backgrounds, existing algorithms are difficult to apply directly in the field (Li et al., 2019). So far, there has been no research on applying behavior recognition technology to the behavior detection of farmers in the field.
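The point about location information can be illustrated on a single-channel feature map. The sketch below contrasts an SE-style global average, which collapses the whole map to one number, with the two direction-wise averages that Coordinate attention starts from. It covers only the pooling step; the learned layers that follow in a real attention block are omitted, and the helper names are hypothetical.

```python
def global_average(fmap):
    """SE-style squeeze: one number for the whole H x W map."""
    h, w = len(fmap), len(fmap[0])
    return sum(sum(row) for row in fmap) / (h * w)

def pool_rows(fmap):
    """Average over width: one value per row (height position)."""
    return [sum(row) / len(row) for row in fmap]

def pool_columns(fmap):
    """Average over height: one value per column (width position)."""
    return [sum(col) / len(col) for col in zip(*fmap)]

def single_activation(h, w, row, col):
    """An h x w map that is zero everywhere except one cell."""
    fmap = [[0.0] * w for _ in range(h)]
    fmap[row][col] = 1.0
    return fmap

top_left = single_activation(4, 4, 0, 0)
lower_right = single_activation(4, 4, 3, 2)

# The global average cannot tell the two maps apart.
print(global_average(top_left) == global_average(lower_right))
# The direction-wise averages still record where the activation sits.
print(pool_rows(top_left), pool_columns(top_left))
print(pool_rows(lower_right), pool_columns(lower_right))
```

The two maps share the same global average, yet their row and column descriptors differ, which is the positional cue that a purely global squeeze discards.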

Therefore, this paper applies deep learning algorithms to analyze real video records of agricultural workers and automatically recognize their farming behavior, providing effective information for traceability systems. This is of great significance for promoting the digital and standardized management of farm workers by agricultural enterprises and for the further transformation of traditional agriculture into automated, smart agriculture. On the algorithmic side, an efficient and lightweight attention mechanism suitable for video is proposed to address the insufficient computation speed, temporal length, and resolution of existing models. The aim is to study behavior recognition and video understanding tasks so as to improve existing behavior recognition and video understanding models. Specifically, we extend the coordinate attention mechanism for mobile networks, a channel attention mechanism that embeds location information, to 3D. Moreover, this 3D attention mechanism can be connected to any structure.
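One plausible reading of extending this kind of attention to 3D is to pool a video feature volume along each of the time, height, and width axes and use the resulting descriptors as gates. The sketch below illustrates only that idea on a single channel. It is not the authors' EPCI implementation, whose details are not given in this excerpt: the learned convolutions of a real attention block are replaced by a plain sigmoid, and all names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def axis_descriptors(vol):
    """Average a volume vol[t][h][w] over the other two axes, once per axis."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    per_t = [sum(vol[t][h][w] for h in range(H) for w in range(W)) / (H * W)
             for t in range(T)]
    per_h = [sum(vol[t][h][w] for t in range(T) for w in range(W)) / (T * W)
             for h in range(H)]
    per_w = [sum(vol[t][h][w] for t in range(T) for h in range(H)) / (T * H)
             for w in range(W)]
    return per_t, per_h, per_w

def gate_volume(vol):
    """Rescale every voxel by the product of its three axis gates.

    Assumption: gates are a bare sigmoid of the pooled descriptors,
    standing in for the learned layers of a real attention module.
    """
    per_t, per_h, per_w = axis_descriptors(vol)
    g_t = [sigmoid(v) for v in per_t]
    g_h = [sigmoid(v) for v in per_h]
    g_w = [sigmoid(v) for v in per_w]
    return [[[vol[t][h][w] * g_t[t] * g_h[h] * g_w[w]
              for w in range(len(g_w))]
             for h in range(len(g_h))]
            for t in range(len(g_t))]

# Toy clip: 3 frames of 2 x 2, with activity only in frame 1, row 0, column 1.
clip = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
clip[1][0][1] = 4.0

per_t, per_h, per_w = axis_descriptors(clip)
gated = gate_volume(clip)
print(per_t, per_h, per_w)
print(gated[1][0][1])
```

The output volume has the same shape as the input, so a block of this kind can in principle be placed between existing layers without changing their interfaces, which is consistent with the claim that the mechanism can be connected to any structure.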

Section snippets

Data collection

In the past ten years, although many large-scale video datasets for action recognition have emerged, there is no dedicated video dataset for recognition of farming behavior that can be used here. In this research, our dataset, called FBD (Farming Behavior Dataset), was collected from public sources such as Bilibili, Tencent Video, and Youku, and consists of 71 videos related to farmers’ actions, covering the four most common and classic categories: Spraying pesticides,

Exploring 4 methods in P3DConvLSTM

In this section, various P3D blocks are used in the experiments, and the model that performs best on the test set is selected for comparison with EPCI-LSTM.

Decomposing a complex 3D convolution into a 2D operation and a 1D operation requires two convolution kernels: 3 × 3 × 1 and 1 × 1 × 3. The 3 × 3 × 1 kernel performs only 2D convolution at the spatial level, while the 1 × 1 × 3 kernel filters at the temporal level. In

Conclusions

To fill the information gap in the agricultural production process, this article proposes automated detection and identification of agricultural workers' labor behavior. As communication devices such as cameras are widely used in rural scenes, it has become possible to identify farm workers in the field in real time by embedding deep learning algorithms in cameras. It is of great significance to promote the digital and standardized management of farm workers by

CRediT authorship contribution statement

Wenxin Zhao: Conceptualization, Methodology, Software, Validation, Writing – original draft. Xin Chen: Validation, Writing – review & editing, Supervision. Yiliang Li: Methodology, Validation, Visualization. Jinpo Xu: Data curation, Validation. Xiang Li: Conceptualization, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was supported by the National Natural Science Foundation of China “Research on Distributed Real-time Complex Event Processing for Intelligent Greenhouse Internet of Things” (grant No. 61601471) and the Special project for the construction of Hebei Agricultural Science and Technology Park and the Capital Modern Agricultural Science and Technology Demonstration Belt (grant No. 19827002D).

References (28)

  • Cao, Y., et al., 2017. Implementation and Current Status of Food Traceability System in Jiangsu China. Procedia Comput. Sci.
  • Hornik, K., 1991. Approximation Capabilities of Multilayer Neural Network. Neural Networks.
  • Zhang, Q., et al., 2019. Recent advances in convolutional neural network acceleration. Neurocomputing.
  • Balasudarsun, N.L., Pranavaraj, M., 2018. Application of Internet of Things in Agriculture. Int. J. Sci. Res. Publ. 8,...
  • Beheraa, B.S., et al., 2015. Information communication technology promoting retail marketing in agriculture sector in india as a study. Procedia Comput. Sci.
  • Collobert, R., et al., 2011. Natural language processing (almost) from scratch. J. Mach. Learn. Res.
  • Wilson, D.R., Martinez, T.R., 2000. Reduction Techniques for Instance-Based Learning Algorithms. Mach. Learn. 38,...
  • Hamada, M., 2015. Secure Anonymously Authenticated and Traceable Enterprise DRM System. Int. J. Comput. Appl.
  • He, K., Zhang, X., Ren, S., Sun, J., 2016a. Deep residual learning for image recognition. Proc. IEEE Comput. Soc. Conf....
  • He, K., Zhang, X., Ren, S., Sun, J., 2016b. Identity mappings in deep residual networks. Lect. Notes Comput. Sci....
  • Hou, Q., Zhou, D., Feng, J., 2021. Coordinate Attention for Efficient Mobile Network...
  • Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E., 2020. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach....
  • Jo, D.J., et al., 2016. Extended Joint Deep Learning for Pedestrian Detection. Proc. 2nd World Congr. Electr. Eng. Comput. Syst. Sci.
  • Li, Y., et al., 2019. Automatic Lumbar Vertebrae Recognition in Intraoperative X-Ray Images Based on Hierarchical Recurrent Neural Network. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal Comput. Des. Comput. Graph.

Cited by (4)

    • A model for recognizing farming behaviors of plantation workers

      2022, Computers and Electronics in Agriculture
      Citation Excerpt :

      In the experiment, we compare FWNet with P3D, R3D, R(2 + 1) D and EPCI-LSTM models, and results is listed in Table 4. All above baselines except EPCI-LSTM are trained the same as FWNet, and the performance of EPCI_LSTM is from the paper (Zhao et al. 2021). Compared with P3D, R3D and R(2 + 1)D, the number of parameters of FWNet is reduced by at least 11.31 M, the reasoning delay is reduced by at least 1.43 ms, the accuracy, F1 score and map are improved by at least 0.83%, 0.74% and 0.58% respectively.

    • A deep learning method for cyanobacterial harmful algae blooms prediction in Taihu Lake, China

      2022, Harmful Algae
      Citation Excerpt :

      LSTM model can better distribute the information of historical units compared to RNN and can capture long-term dependencies in time series (Solgi et al., 2021). CyanoHABs information and meteorological time series data are refined by the CNN model into the input of the LSTM model, which is more sensitive to the time-series information (Zhao et al., 2021). The LSTM part uses the stacking of two LSTM networks to model the time-series data and outputs the prediction results of CyanoHABs area through the Dense layer.
