Visual Saliency from Image Features with Application to Compression

Abstract

Image feature point algorithms and their associated regional descriptors can be viewed as primitive detectors of visually salient information. In this paper, a new method for constructing a visual attention probability map from such features is proposed. (Throughout this work we use SURF features, although the algorithm is not limited to SURF alone.) The technique is validated using comprehensive human eye-tracking experiments. We call the algorithm “visual interest” (VI), since the resulting segmentation reveals image regions that are visually salient during the performance of multiple observer search tasks. We demonstrate that it works on generic, eye-level photographs and does not depend on heuristic tuning. We further show that the descriptor-matching property of SURF feature points can be exploited, via object recognition, to modulate the context of the attention probability map for a given object search task, refining the salient area. We fully validate the VI algorithm by applying it to saliency-based compression, in which non-salient regions are pre-blurred prior to JPEG encoding, and by conducting comprehensive observer performance tests. When using object contextualisation, we conclude that JPEG files are around 33 % larger than they need to be to fully represent the task-relevant information within them. Finally, we demonstrate the utility of the segmentation as a region of interest in JPEG2000 compression, achieving superior image quality (measured statistically using PSNR and SSIM) over the automatically selected salient image regions while reducing the image filesize to as little as 25 % of that of the original. Our technique therefore delivers superior compression performance through the detection and selective preservation of visually salient information relevant to multiple observer tasks. In contrast to the state of the art in task-directed visual attention models, the VI algorithm reacts only to the image content and requires no detailed prior knowledge of either the scene or the ultimate observer task.
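
As a rough illustration of the kind of feature-to-saliency conversion described in the abstract, the sketch below accumulates keypoint detections into a dense map, smooths it, and normalises it so that it can be read as an attention probability map. It is a minimal sketch only: it uses OpenCV’s ORB detector as a freely available stand-in for SURF (SURF requires the non-free opencv-contrib build), and the feature count, kernel width, response weighting, and threshold are illustrative assumptions rather than the authors’ parameters.

```python
import cv2
import numpy as np

def feature_saliency_map(img_bgr, sigma=25.0, n_features=2000):
    """Turn sparse keypoint detections into a dense attention probability map."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape

    # ORB stands in for SURF here; swap in cv2.xfeatures2d.SURF_create()
    # if the non-free opencv-contrib build is available.
    detector = cv2.ORB_create(nfeatures=n_features)
    keypoints = detector.detect(gray, None)

    # Accumulate each keypoint's detector response at its location.
    acc = np.zeros((h, w), dtype=np.float32)
    for kp in keypoints:
        x = min(int(round(kp.pt[0])), w - 1)
        y = min(int(round(kp.pt[1])), h - 1)
        acc[y, x] += kp.response

    # Spread the point evidence spatially and normalise to a probability map.
    dense = cv2.GaussianBlur(acc, (0, 0), sigma)
    total = dense.sum()
    return dense / total if total > 0 else dense

if __name__ == "__main__":
    img = cv2.imread("scene.jpg")                      # any eye-level photograph
    p_map = feature_saliency_map(img)
    # Crude salient/non-salient segmentation by thresholding the map.
    mask = (p_map >= np.percentile(p_map, 75)).astype(np.uint8)
    cv2.imwrite("saliency_mask.png", mask * 255)
```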

Notes

  1. The segmentation produces an “expectation” value of eye-fixation capture. Our parameters reliably capture 70–75 % of fixations in cluttered indoor scenes, and a considerably higher proportion, 85–95 %, in outdoor scenes, even as the task is varied (see Fig. 5).

  2. Such confusions could arise between letters at low resolution, including the following pairs: F ⇔ R, W ⇔ M, W ⇔ N, G ⇔ 6, G ⇔ C, B ⇔ 8, V ⇔ Y, X ⇔ A, H ⇔ K, 5 ⇔ 6, F ⇔ P, H ⇔ A, G ⇔ D, O ⇔ D, B ⇔ E, 6 ⇔ 8 and G ⇔ 6.

  3. There is a region of interest (ROI) capability in JPEG2000, but its parameters do not map directly onto perceived output quality. In contrast, the JPEG algorithm was designed using data from observer tests: a Q value of 50 is expected to produce good visual quality for photo-realistic imagery (see the encoding sketch after these notes).

  4. Read as: JPEG with quality level Q = 40.

  5. Based on the kdu_compress examples provided by Kakadu Software.
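
The footnotes above refer to JPEG quality levels; the following is a minimal sketch, under assumed parameters, of the salient-compression step described in the abstract: regions outside a salient mask are pre-blurred, the composite is JPEG-encoded at a chosen Q, and quality can then be measured over the salient region only. The function names, blur width, and Q = 40 are illustrative, and OpenCV is assumed for the encoding; this is not the authors’ exact pipeline.

```python
import cv2
import numpy as np

def blur_nonsalient_then_jpeg(img_bgr, salient_mask, out_path="out_q40.jpg",
                              quality=40, blur_sigma=6.0):
    """Pre-blur non-salient regions, then JPEG-encode at the given Q
    (e.g. Q = 40, read as in footnote 4) so the background costs fewer bits."""
    blurred = cv2.GaussianBlur(img_bgr, (0, 0), blur_sigma)
    keep = salient_mask.astype(bool)[:, :, None]        # broadcast over channels
    composed = np.where(keep, img_bgr, blurred)
    cv2.imwrite(out_path, composed, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return composed

def masked_psnr(reference, test, salient_mask):
    """PSNR computed only over the salient pixels (8-bit images assumed)."""
    m = salient_mask.astype(bool)
    diff = reference[m].astype(np.float64) - test[m].astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

The same mask could in principle be passed to a JPEG2000 encoder’s ROI mechanism (footnotes 3 and 5), but that step is encoder-specific and is not sketched here.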

Author information

Corresponding author

Correspondence to P. Harding.

About this article

Cite this article

Harding, P., Robertson, N.M. Visual Saliency from Image Features with Application to Compression. Cogn Comput 5, 76–98 (2013). https://doi.org/10.1007/s12559-012-9150-7
