Visual Saliency from Image Features with Application to Compression

Abstract

Image feature point algorithms and their associated regional descriptors can be viewed as primitive detectors of visually salient information. In this paper, a new method for constructing a visual attention probability map from such features is proposed. (Throughout this work we use SURF features, although the algorithm is not limited to SURF alone.) The technique is validated using comprehensive human eye-tracking experiments. We call the algorithm “visual interest” (VI), since the resulting segmentation reveals image regions that are visually salient during the performance of multiple observer search tasks. We demonstrate that it works on generic, eye-level photographs and does not depend on heuristic tuning. We further show that the descriptor-matching property of SURF feature points can be exploited, via object recognition, to modulate the context of the attention probability map for a given object search task, refining the salient area. We fully validate the VI algorithm by applying it to saliency-based compression, in which non-salient regions are pre-blurred prior to JPEG encoding, and by conducting comprehensive observer performance tests. When using object contextualisation, we conclude that JPEG files are around 33 % larger than they need to be to fully represent the task-relevant information within them. Finally, we demonstrate the utility of the segmentation as a region of interest in JPEG2000 compression, achieving superior image quality (measured statistically using PSNR and SSIM) over the automatically selected salient image regions while reducing the image filesize to as little as 25 % of that of the original. Our technique therefore delivers superior compression performance through the detection and selective preservation of visually salient information relevant to multiple observer tasks. In contrast to the state of the art in task-directed visual attention models, the VI algorithm reacts only to the image content and requires no detailed prior knowledge of either the scene or the ultimate observer task.
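
As a rough illustration of the kind of feature-to-saliency conversion described in the abstract, the sketch below accumulates keypoint detections into a dense map, smooths it, and normalises it so that it can be read as an attention probability map. It is a minimal sketch only: it uses OpenCV’s ORB detector as a freely available stand-in for SURF (SURF requires the non-free opencv-contrib build), and the feature count, kernel width, response weighting, and threshold are illustrative assumptions rather than the authors’ parameters.

```python
import cv2
import numpy as np

def feature_saliency_map(img_bgr, sigma=25.0, n_features=2000):
    """Turn sparse keypoint detections into a dense attention probability map."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape

    # ORB stands in for SURF here; swap in cv2.xfeatures2d.SURF_create()
    # if the non-free opencv-contrib build is available.
    detector = cv2.ORB_create(nfeatures=n_features)
    keypoints = detector.detect(gray, None)

    # Accumulate each keypoint's detector response at its location.
    acc = np.zeros((h, w), dtype=np.float32)
    for kp in keypoints:
        x = min(int(round(kp.pt[0])), w - 1)
        y = min(int(round(kp.pt[1])), h - 1)
        acc[y, x] += kp.response

    # Spread the point evidence spatially and normalise to a probability map.
    dense = cv2.GaussianBlur(acc, (0, 0), sigma)
    total = dense.sum()
    return dense / total if total > 0 else dense

if __name__ == "__main__":
    img = cv2.imread("scene.jpg")                      # any eye-level photograph
    p_map = feature_saliency_map(img)
    # Crude salient/non-salient segmentation by thresholding the map.
    mask = (p_map >= np.percentile(p_map, 75)).astype(np.uint8)
    cv2.imwrite("saliency_mask.png", mask * 255)
```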

Notes

  1. The segmentation produces an “expectation” value of eye-fixation capture. Our parameters reliably capture 70–75 % of fixations in cluttered indoor scenes, and a considerably higher proportion, 85–95 %, in outdoor scenes, even as the task is varied (see Fig. 5).

  2. Such confusions could arise between letters at low resolution, including the following pairs: F ⇔ R, W ⇔ M, W ⇔ N, G ⇔ 6, G ⇔ C, B ⇔ 8, V ⇔ Y, X ⇔ A, H ⇔ K, 5 ⇔ 6, F ⇔ P, H ⇔ A, G ⇔ D, O ⇔ D, B ⇔ E, 6 ⇔ 8 and G ⇔ 6.

  3. There is a region of interest (ROI) capability in JPEG2000, but its parameters do not map directly onto perceived output quality. In contrast, the JPEG algorithm was designed using data from observer tests: a Q value of 50 is expected to produce good visual quality for photo-realistic imagery (see the encoding sketch after these notes).

  4. Read as: JPEG with quality level Q = 40.

  5. Based on the kdu_compress examples provided by Kakadu Software.
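
The footnotes above refer to JPEG quality levels; the following is a minimal sketch, under assumed parameters, of the salient-compression step described in the abstract: regions outside a salient mask are pre-blurred, the composite is JPEG-encoded at a chosen Q, and quality can then be measured over the salient region only. The function names, blur width, and Q = 40 are illustrative, and OpenCV is assumed for the encoding; this is not the authors’ exact pipeline.

```python
import cv2
import numpy as np

def blur_nonsalient_then_jpeg(img_bgr, salient_mask, out_path="out_q40.jpg",
                              quality=40, blur_sigma=6.0):
    """Pre-blur non-salient regions, then JPEG-encode at the given Q
    (e.g. Q = 40, read as in footnote 4) so the background costs fewer bits."""
    blurred = cv2.GaussianBlur(img_bgr, (0, 0), blur_sigma)
    keep = salient_mask.astype(bool)[:, :, None]        # broadcast over channels
    composed = np.where(keep, img_bgr, blurred)
    cv2.imwrite(out_path, composed, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return composed

def masked_psnr(reference, test, salient_mask):
    """PSNR computed only over the salient pixels (8-bit images assumed)."""
    m = salient_mask.astype(bool)
    diff = reference[m].astype(np.float64) - test[m].astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

The same mask could in principle be passed to a JPEG2000 encoder’s ROI mechanism (footnotes 3 and 5), but that step is encoder-specific and is not sketched here.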

Author information

Corresponding author

Correspondence to P. Harding.

About this article

Cite this article

Harding, P., Robertson, N.M. Visual Saliency from Image Features with Application to Compression. Cogn Comput 5, 76–98 (2013). https://doi.org/10.1007/s12559-012-9150-7
