Abstract
Estimating the volume of a rigid object from a single-view image is an important requirement in many automated vision-based systems. Volume is comparatively simple to estimate from multiple-view images, but estimating it from a single-view image is difficult. This work presents an effective method for object volume estimation on single-view images of both regular and irregular objects. Initially, the single-view input images are pre-processed with mean-median filtering. Edge features are then extracted using the Gaussian edge-based Laplacian operator, and key points are extracted using the scale-invariant feature transform (SIFT). The extracted features are used for shape analysis of the objects. Subsequently, a VGG-ResNet framework performs depth analysis based on the extracted features, and the point clouds for volume estimation are generated from the extracted features. Finally, the volume of the single-view object is estimated with a hybrid three-dimensional U-Net and graph neural network (Hybrid 3DU-GNet), which provides the 3D geometric reconstruction needed for accurate volume estimation. The approach is implemented in MATLAB. The experimental results are compared with different existing approaches and show a significant improvement in the performance metrics: accuracy (98.59%), precision (98.21%), recall (97.09%), computational time (3.2 seconds), R-squared (98.2%), mean absolute percentage error (MAPE, 6.1%), and root mean squared error (RMSE, 0.93).
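The regression metrics named in the abstract (MAPE, RMSE, and R-squared) have standard textbook definitions, and a minimal sketch of them may help when reading the reported figures. The abstract does not give the formulas the authors used, so the code below assumes the usual definitions. The function names and the sample volumes are hypothetical and are not taken from the paper. Note that `r_squared` returns a fraction, whereas the abstract reports R-squared as a percentage.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes every true value is non-zero (true volumes are positive).
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the volumes."""
    total = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(total / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, as a fraction (1.0 is a perfect fit)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ground-truth and estimated volumes (illustrative values only).
true_volumes = [100.0, 200.0, 400.0]
estimated_volumes = [110.0, 190.0, 400.0]

print(mape(true_volumes, estimated_volumes))
print(rmse(true_volumes, estimated_volumes))
print(r_squared(true_volumes, estimated_volumes))
```

MAPE is scale-free, which makes it convenient for comparing objects of very different sizes, while RMSE stays in the unit of the measured volume and penalizes large errors more heavily.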
Data availability
Data sharing not applicable to this article.
Funding
No funding.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Dalai, R., Dalai, N. & Senapati, K.K. An accurate volume estimation on single view object images by deep learning based depth map analysis and 3D reconstruction. Multimed Tools Appl 82, 28235–28258 (2023). https://doi.org/10.1007/s11042-023-14615-7
DOI: https://doi.org/10.1007/s11042-023-14615-7