Improved performance of quality metrics using saliency map and CSF filter for standard coding H264/AVC

Ben Amor, Mohamed; Kammoun, Fahmi; Masmoudi, Nouri

doi:10.1007/s11042-017-5393-3

Published: 14 November 2017

Volume 77, pages 19377–19397, (2018)

Multimedia Tools and Applications

Abstract

Despite considerable progress in the field of quality metrics, classical tools such as PSNR remain widely used in video coding research. Our idea is to improve the limited performance of these tools using simple and well-known techniques. In the quality measurement of images and video sequences, exploiting the characteristics of the human visual system has become a priority in order to obtain a better correlation between the objective measurement and the judgment of observers. To this end, we try to improve the performance of two existing metrics, PSNR (Peak Signal-to-Noise Ratio) and DVQ (Digital Video Quality), for the H.264/MPEG-4 (Moving Picture Experts Group) AVC (Advanced Video Coding) standard. The improvement is obtained by extracting the zones of interest with a saliency map and by filtering with the contrast sensitivity function (CSF). We study two types of saliency map: one based on SURF (Speeded-Up Robust Features) and one based on the Harris detector. We use the subjective video database “LIVE” to test the performance of the proposed idea. The performance indicators, namely the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC) and the root mean squared prediction error (RMSE), indicate that the SURF saliency map gives better results than the Harris saliency map for the distortion produced by H.264 video encoding. The improvement is most evident at short viewing distances.
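
As a rough illustration of the saliency-weighting idea described in the abstract, the sketch below computes a PSNR in which each squared pixel error is weighted by a per-pixel saliency value. The function name `weighted_psnr`, the normalization by the sum of the weights, and the toy 2x2 frames are assumptions made for illustration only. The abstract does not state how the authors combine the saliency map with PSNR, and the CSF filtering step is omitted here.

```python
import math


def weighted_psnr(reference, distorted, saliency, peak=255.0):
    """Saliency-weighted PSNR for frames given as equal-sized 2-D lists.

    Hypothetical helper: the weighting scheme is an assumption, not the
    method of the paper.
    """
    weighted_error = 0.0
    total_weight = 0.0
    for ref_row, dis_row, sal_row in zip(reference, distorted, saliency):
        for ref_px, dis_px, weight in zip(ref_row, dis_row, sal_row):
            weighted_error += weight * (ref_px - dis_px) ** 2
            total_weight += weight
    if total_weight == 0:
        raise ValueError("saliency map has no positive weight")
    weighted_mse = weighted_error / total_weight
    if weighted_mse == 0:
        return math.inf  # identical frames inside the salient region
    return 10.0 * math.log10(peak ** 2 / weighted_mse)


# Toy frames: a single pixel differs by 10 grey levels.
reference = [[100, 100], [100, 100]]
distorted = [[100, 110], [100, 100]]

uniform_saliency = [[1, 1], [1, 1]]   # equivalent to ordinary PSNR
focused_saliency = [[0, 1], [0, 0]]   # all attention on the damaged pixel

psnr_uniform = weighted_psnr(reference, distorted, uniform_saliency)
psnr_focused = weighted_psnr(reference, distorted, focused_saliency)
print(f"uniform: {psnr_uniform:.2f} dB, focused: {psnr_focused:.2f} dB")
```

With uniform weights the function reduces to the ordinary PSNR. When the saliency map concentrates on the distorted pixel, the score drops, which is the intended effect of letting errors in zones of interest count more.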

Author information

Authors and Affiliations

Laboratory of Electronics and Information Technologies, National School of Engineering, Sfax, University of Sfax, Sfax, Tunisia
Mohamed Ben Amor, Fahmi Kammoun & Nouri Masmoudi

Corresponding author

Correspondence to Mohamed Ben Amor.

About this article

Cite this article

Ben Amor, M., Kammoun, F. & Masmoudi, N. Improved performance of quality metrics using saliency map and CSF filter for standard coding H264/AVC. Multimed Tools Appl 77, 19377–19397 (2018). https://doi.org/10.1007/s11042-017-5393-3

Received: 23 February 2017
Revised: 27 September 2017
Accepted: 05 November 2017
Published: 14 November 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s11042-017-5393-3
