3D plant root system reconstruction based on fusion of deep structure-from-motion and IMU

Multimedia Tools and Applications

Abstract

Roots play a critical role in the functioning of plants, yet generating detailed 3D models of thin, intricately branched plant roots remains challenging due to their structural complexity and limited surface texture. Because such systems are difficult to realize and labeled training data are scarce, few works have explored this problem with deep neural networks. To overcome this limitation, this paper presents a structure-from-motion-based deep neural network for plant root reconstruction that is trained in a self-supervised manner and can be deployed on mobile phone platforms. During training, each predicted depth map is constrained by the relative poses predicted from adjacent frames captured by the mobile phone camera, and an LSTM-based network following the CNN pose estimator is learned from ego-motion constraints, further exploiting the temporal relationship between consecutive frames. The IMU in the mobile phone is additionally used to improve the pose estimation network by continuously updating the correct scale from gyroscope and accelerometer measurements. The proposed approach resolves scale ambiguity by recovering the absolute scale of the real plant roots, jointly improving camera pose estimation and scene reconstruction. Experimental results on both a real plant root dataset and a rendered synthetic root dataset demonstrate the superior performance of our method compared with classical and state-of-the-art learning-based structure-from-motion methods.
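To make the training signals concrete, below is a minimal PyTorch-style sketch of the two self-supervised constraints the abstract describes: a view-synthesis (photometric) loss that couples the depth network and the pose network, and a least-squares scale correction that aligns the network's up-to-scale translation with the metric translation integrated from the IMU. All function names, tensor shapes, and the grid-sampling formulation here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of self-supervised depth/pose training signals
# plus IMU scale alignment; network definitions are omitted.
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift every pixel of the target frame to a 3D point, (B, 3, H*W)."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0)    # (3, H, W)
    pix = pix.view(3, -1).unsqueeze(0).expand(B, -1, -1)       # (B, 3, H*W)
    return (K_inv @ pix) * depth.view(B, 1, -1)

def warp(source, depth, T, K, K_inv):
    """Synthesize the target frame by sampling the source frame at the
    pixel locations implied by predicted depth and relative pose T (B, 3, 4)."""
    B, _, H, W = source.shape
    pts = backproject(depth, K_inv)                             # (B, 3, H*W)
    pts = torch.cat([pts, torch.ones_like(pts[:, :1])], dim=1)  # homogeneous
    cam = K @ (T @ pts)                                         # (B, 3, H*W)
    uv = cam[:, :2] / cam[:, 2:].clamp(min=1e-6)
    gx = 2.0 * uv[:, 0] / (W - 1) - 1.0                         # to [-1, 1]
    gy = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(B, H, W, 2)
    return F.grid_sample(source, grid, align_corners=True)

def photometric_loss(target, source, depth, T, K, K_inv):
    """L1 difference between the target frame and its synthesized view;
    minimizing it trains the depth and pose networks jointly, without labels."""
    return (target - warp(source, depth, T, K, K_inv)).abs().mean()

def imu_scale(t_pred, t_imu):
    """Least-squares scale s aligning up-to-scale network translations
    t_pred (B, 3) with metric translations t_imu integrated from the
    accelerometer/gyroscope, resolving monocular scale ambiguity."""
    return (t_pred * t_imu).sum() / (t_pred * t_pred).sum().clamp(min=1e-8)
```

In such a training loop, one would scale the translation component of the predicted pose by `imu_scale(...)` before computing `photometric_loss`, so that the recovered depths carry the absolute scale of the real roots rather than an arbitrary monocular scale.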



Author information

Corresponding author

Correspondence to Guoyu Lu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lu, Y., Wang, Y., Chen, Z. et al. 3D plant root system reconstruction based on fusion of deep structure-from-motion and IMU. Multimed Tools Appl 80, 17315–17331 (2021). https://doi.org/10.1007/s11042-020-10069-3
