Objective Quality Assessment of Stereoscopic Video Using Inflated 3D Features

Imani, Hassan; Islam, Md Baharul

doi:10.1007/s42979-024-03184-7

Original Research
Published: 15 August 2024

Volume 5, article number 799, (2024)

SN Computer Science

Hassan Imani¹ & Md Baharul Islam¹,²


Abstract

Convolutional Neural Networks (CNNs) have received growing research attention for Stereoscopic Video Quality Assessment (SVQA) in recent years. Researchers have recently used 3D CNNs to extract spatial and temporal features from stereo videos and to detect quality degradation in stereoscopic videos. To the best of our knowledge, transfer learning (TL) has not been well examined in SVQA. Pretraining and fine-tuning are approaches used in deep neural networks to transfer knowledge learned from other, more general domains. Previous methods that utilized TL relied on very heavy 3D ResNet architectures with several layers and are therefore very time-consuming. In this paper, we develop a new model for SVQA that uses the Inflated 3-Dimensional ConvNet (I3D) network as the backbone feature extractor. We first feed the left and right videos to I3D models to extract their features. Then, we apply 3D CNNs to learn quality-aware features from the stereo videos. We evaluate the proposed method on the LFOVIAS3DPh2 and NAMA3DS1-COSPAD1 SVQA datasets. Extensive experiments on the two datasets show that the proposed method correlates well with the subjective results. The Root-Mean-Square Error (RMSE) on the NAMA3DS1-COSPAD1 dataset is 0.2454, and the high Linear Correlation Coefficient (LCC) and Spearman Rank Order Correlation Coefficient (SROCC) values (0.895 and 0.901, respectively) on the LFOVIAS3DPh2 dataset show that the results are consistent with the human visual system (HVS). Despite having a lighter architecture than the best-performing method, the proposed method outperforms most existing methods and is overall the second-best performing method.
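The three evaluation measures named in the abstract (LCC, SROCC, and RMSE) compare predicted quality scores against subjective scores. The following is a minimal sketch of how they are commonly computed, not the authors' evaluation code. The function names and the sample scores are hypothetical, and any nonlinear mapping that quality assessment studies sometimes fit before computing LCC and RMSE is omitted, since the abstract does not say whether one was used.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def lcc(predicted, subjective):
    """Linear (Pearson) correlation coefficient between two score lists."""
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    return float(np.corrcoef(predicted, subjective)[0, 1])


def srocc(predicted, subjective):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return lcc(average_ranks(predicted), average_ranks(subjective))


def rmse(predicted, subjective):
    """Root-mean-square error between predicted and subjective scores."""
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    return float(np.sqrt(np.mean((predicted - subjective) ** 2)))


if __name__ == "__main__":
    # Hypothetical scores for illustration only; not data from the paper.
    predicted_scores = [3.1, 4.0, 2.2, 4.6, 1.5]
    subjective_scores = [3.0, 4.2, 2.5, 4.4, 1.8]
    print("LCC:  ", lcc(predicted_scores, subjective_scores))
    print("SROCC:", srocc(predicted_scores, subjective_scores))
    print("RMSE: ", rmse(predicted_scores, subjective_scores))
```

LCC and RMSE reflect how closely the predictions match the subjective scores in value, while SROCC depends only on their ordering, which is why the two correlation coefficients are usually reported together.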

Availability of data and materials

The source code for this work is available upon request to the corresponding author.

Funding

This work is partially supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under the 2232 Outstanding Researchers program, Project No. 118C301. Research and its contents are solely the authors’ responsibility and do not necessarily represent the official view of the funding organizations. The funders had no role in study design, data analysis, algorithmic design, the decision to publish, or the preparation of the manuscript.

Author information

Authors and Affiliations

Faculty of Engineering and Natural Sciences, Bahcesehir University, Yildiz Ciragan Cd, Besiktas, Istanbul, 34349, Turkey
Hassan Imani & Md Baharul Islam
Department of Computing and Software Engineering, Florida Gulf Coast University, Fort Myers, FL, 33965, USA
Md Baharul Islam

Corresponding author

Correspondence to Hassan Imani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval and consent

Not applicable.

Consent for publication

All authors read and approved the final manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Imani, H., Islam, M.B. Objective Quality Assessment of Stereoscopic Video Using Inflated 3D Features. SN COMPUT. SCI. 5, 799 (2024). https://doi.org/10.1007/s42979-024-03184-7

Received: 18 December 2023
Accepted: 28 July 2024
Published: 15 August 2024
DOI: https://doi.org/10.1007/s42979-024-03184-7
