
ViST: A Ubiquitous Model with Multimodal Fusion for Crop Growth Prediction

Published: 07 December 2023

Abstract

Crop growth prediction helps agricultural workers make accurate and well-reasoned decisions about farming activities. Existing crop growth prediction models focus on a single crop, training one model per crop. In this article, we develop a ubiquitous growth prediction model that serves multiple crops, with the aim of training a single model for all of them. To achieve this, we propose a ubiquitous vision and sensor transformer (ViST) model that predicts crop growth from both image and sensor data. In the proposed model, a cross-attention mechanism fuses the multimodal feature maps, reducing computational cost and balancing the interactive effects among features. To train the model, we combine the data from multiple crops to create a single ViST model. A sensor network system was established for data collection on a farm where rice, soybean, and maize are cultivated. Experimental results show that the proposed ViST model generalizes well across the multiple crops, demonstrating its ubiquitous capability for crop growth prediction.
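
To illustrate the fusion idea described in the abstract, the sketch below shows a single cross-attention block in which image tokens query sensor tokens and vice versa, so each modality's feature map is conditioned on the other before the two streams are pooled into a joint representation. This is a minimal PyTorch sketch under our own assumptions about token shapes and module structure; it is not the authors' ViST implementation, and all names and dimensions are illustrative.

```python
# Minimal sketch of cross-attention fusion between an image token
# sequence and a sensor token sequence. Module names, dimensions, and
# the single-block structure are illustrative assumptions, not the
# authors' ViST implementation.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # One attention module per direction: image queries attend to
        # sensor keys/values, and sensor queries attend to image keys/values.
        self.img_to_sen = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.sen_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_sen = nn.LayerNorm(dim)

    def forward(self, img: torch.Tensor, sen: torch.Tensor) -> torch.Tensor:
        # img: (batch, n_img_tokens, dim); sen: (batch, n_sen_tokens, dim)
        img_ctx, _ = self.img_to_sen(query=img, key=sen, value=sen)
        sen_ctx, _ = self.sen_to_img(query=sen, key=img, value=img)
        img = self.norm_img(img + img_ctx)  # residual connection + norm
        sen = self.norm_sen(sen + sen_ctx)
        # Mean-pool each stream and concatenate into one joint vector
        # that a prediction head could consume.
        return torch.cat([img.mean(dim=1), sen.mean(dim=1)], dim=-1)


# Example: fuse 196 image patch tokens with 24 sensor-reading tokens.
fusion = CrossAttentionFusion(dim=256, num_heads=4)
joint = fusion(torch.randn(2, 196, 256), torch.randn(2, 24, 256))
print(joint.shape)  # torch.Size([2, 512])
```

One common motivation for this kind of design, echoed in the abstract's cost claim, is that attending only across modalities avoids full self-attention over the concatenated token sequence while still letting each modality's features modulate the other's.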


Cited By

  • (2024) Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 229–239. DOI: 10.1145/3626772.3657727. Online publication date: 10 July 2024.

Published In

ACM Transactions on Sensor Networks, Volume 20, Issue 1
January 2024, 717 pages
EISSN: 1550-4867
DOI: 10.1145/3618078

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 07 December 2023
Online AM: 28 October 2023
Accepted: 05 October 2023
Revised: 21 September 2023
Received: 28 December 2022
Published in TOSN Volume 20, Issue 1

Author Tags

1. Crop growth prediction
2. ubiquitous model
3. multimodal learning
4. transformer module
5. cross-attention mechanism

Qualifiers

• Research-article

Funding Sources

• New Generation Artificial Intelligence Program
• Heilongjiang NSF
• Fundamental Research Funds
