Abstract
Convolutional Neural Networks (CNNs) have received growing research attention for Stereoscopic Video Quality Assessment (SVQA) in recent years. Researchers have used 3D CNNs to extract useful spatial and temporal features from stereo videos and to detect quality degradation in stereoscopic videos. To the best of our knowledge, transfer learning (TL) has not been well examined in SVQA. Pretraining and fine-tuning are approaches used in deep neural networks to transfer knowledge learned in other, more general domains. Previous methods that utilized TL relied on very heavy 3D ResNet architectures with many layers and are therefore very time-consuming. In this paper, we develop a new model for SVQA that uses the Inflated 3-Dimensional ConvNet (I3D) network as its backbone feature extractor. We first feed the left and right videos to I3D models to extract their features. Then, we apply 3D CNNs to learn quality-aware features from the stereo videos. We evaluate the proposed method on the LFOVIAS3DPh2 and NAMA3DS1-COSPAD1 SVQA datasets. Extensive experiments on the two datasets show that the predictions of the proposed method correlate with the subjective results. The Root-Mean-Square Error (RMSE) on the NAMA3DS1-COSPAD1 dataset is 0.2454, and the high Linear Correlation Coefficient (LCC) and Spearman Rank Order Correlation Coefficient (SROCC) values (0.895 and 0.901, respectively) on the LFOVIAS3DPh2 dataset show that the results are consistent with the human visual system (HVS). Despite having a lighter architecture than the best-performing method, the proposed method outperforms most existing methods and is the second-best-performing method overall.
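The three evaluation criteria named above (RMSE, LCC, and SROCC) can be illustrated with a minimal sketch. The scores below are hypothetical values chosen only for illustration, not results from this work, and the helper names are assumptions. The sketch also assumes the metrics are computed directly on raw predictions; any nonlinear mapping between predicted and subjective scores that an evaluation protocol might apply first is omitted.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def rmse(predicted, subjective):
    """Root-mean-square error between predicted and subjective scores."""
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    return float(np.sqrt(np.mean((predicted - subjective) ** 2)))


def lcc(predicted, subjective):
    """Linear (Pearson) correlation coefficient."""
    return float(np.corrcoef(predicted, subjective)[0, 1])


def srocc(predicted, subjective):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return lcc(average_ranks(predicted), average_ranks(subjective))


# Hypothetical subjective scores and model predictions (illustration only).
subjective_scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted_scores = np.array([1.2, 1.9, 3.4, 3.9, 4.6])

print("RMSE :", rmse(predicted_scores, subjective_scores))
print("LCC  :", lcc(predicted_scores, subjective_scores))
print("SROCC:", srocc(predicted_scores, subjective_scores))
```

A lower RMSE indicates predictions closer to the subjective scores, while LCC and SROCC values closer to 1 indicate stronger linear and monotonic agreement, respectively.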







Availability of data and materials
The source code for this work is available upon request to the corresponding author.
Funding
This work is partially supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under the 2232 Outstanding Researchers program, Project No. 118C301. Research and its contents are solely the authors’ responsibility and do not necessarily represent the official view of the funding organizations. The funders had no role in study design, data analysis, algorithmic design, the decision to publish, or the preparation of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval and consent
Not applicable.
Consent for publication
All authors read and approved the final manuscript.
About this article
Cite this article
Imani, H., Islam, M.B. Objective Quality Assessment of Stereoscopic Video Using Inflated 3D Features. SN COMPUT. SCI. 5, 799 (2024). https://doi.org/10.1007/s42979-024-03184-7
DOI: https://doi.org/10.1007/s42979-024-03184-7