Stereoscopic video quality measurement with fine-tuning 3D ResNets

Imani, Hassan; Islam, Md Baharul; Junayed, Masum Shah; Aydin, Tarkan; Arica, Nafiz

doi:10.1007/s11042-022-13485-9

1221: Deep Learning for Image/Video Compression and Visual Quality Assessment
Published: 12 August 2022

Multimedia Tools and Applications, Volume 81, pages 42849–42869 (2022)

Hassan Imani¹ (ORCID: orcid.org/0000-0003-1566-3897),
Md Baharul Islam¹,²,
Masum Shah Junayed¹,
Tarkan Aydin¹ &
Nafiz Arica¹

Abstract

Recently, Convolutional Neural Networks with 3D kernels (3D CNNs) have shown clear superiority over 2D CNNs in video processing applications. In the field of Stereoscopic Video Quality Assessment (SVQA), 3D CNNs are used to extract spatio-temporal features from stereoscopic video. In addition, the emergence of large video datasets such as Kinetics has made it possible to use pre-trained 3D CNNs in other video-related fields. In this paper, we fine-tune 3D Residual Networks (3D ResNets) pre-trained on the Kinetics dataset to measure the quality of stereoscopic videos, and we propose a no-reference SVQA method. Our aim is twofold. First, we ask whether 3D CNNs can serve as quality-aware feature extractors for stereoscopic videos. Second, we explore which ResNet architecture is more appropriate for SVQA. Experimental results on two publicly available SVQA datasets, LFOVIAS3DPh2 and NAMA3DS1-COSPAD1, show the effectiveness of the proposed transfer learning-based method, which achieves an RMSE of 0.332 on the LFOVIAS3DPh2 dataset. The results also show that deeper 3D ResNet models extract more efficient quality-aware features.
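
For readers unfamiliar with the RMSE figure quoted in the abstract, the following is a minimal sketch of how a root-mean-square error between predicted quality scores and subjective scores is typically computed. The function name `rmse` and the four sample scores are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own evaluation protocol (for example, any score normalization or regression fitting before computing the error) is not described on this page.

```python
import math


def rmse(predicted, subjective):
    """Root-mean-square error between two equal-length score sequences."""
    if len(predicted) != len(subjective):
        raise ValueError("score sequences must have the same length")
    if len(predicted) == 0:
        raise ValueError("score sequences must not be empty")
    # Mean of the squared per-video differences, then the square root.
    squared_errors = [(p - s) ** 2 for p, s in zip(predicted, subjective)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical scores for four videos (illustrative only, not from the paper).
predicted_scores = [3.0, 4.0, 2.0, 5.0]
subjective_scores = [3.0, 3.0, 2.0, 4.0]
print(rmse(predicted_scores, subjective_scores))
```

A lower value means the predicted scores lie closer to the subjective ones. Because RMSE is expressed in the same units as the scores, values are only comparable across methods evaluated on the same dataset and score scale.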

Acknowledgements

This work was supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) through the 2232 Outstanding International Researchers Program under Project No. 118C301.

Author information

Authors and Affiliations

Computer Vision Lab, Department of Computer Engineering, Bahcesehir University, Istanbul, Turkey
Hassan Imani, Md Baharul Islam, Masum Shah Junayed, Tarkan Aydin & Nafiz Arica
Department of Computer Science and Engineering, Daffodil International University, Dhaka, 1341, Bangladesh
Md Baharul Islam

Corresponding author

Correspondence to Hassan Imani.

Ethics declarations

This article does not contain any studies with human participants and/or animals performed by any of the authors.

Conflict of Interests

We (authors) certify that there is no actual or potential conflict of interest related to this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Imani, H., Islam, M.B., Junayed, M.S. et al. Stereoscopic video quality measurement with fine-tuning 3D ResNets. Multimed Tools Appl 81, 42849–42869 (2022). https://doi.org/10.1007/s11042-022-13485-9

Received: 29 June 2021
Revised: 22 October 2021
Accepted: 13 July 2022
Published: 12 August 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11042-022-13485-9
