Abstract
Image feature point algorithms and their associated regional descriptors can be viewed as primitive detectors of visually salient information. In this paper, a new method for constructing a visual attention probability map from such features is proposed. (Throughout this work we use SURF features, although the algorithm is not limited to SURF alone.) The technique is validated using comprehensive human eye-tracking experiments. We call this algorithm “visual interest” (VI), since the resultant segmentation reveals image regions that are visually salient during the performance of multiple observer search tasks. We demonstrate that it works on generic, eye-level photographs and does not depend on heuristic tuning. We further show that the descriptor-matching property of the SURF feature points can be exploited via object recognition to modulate the context of the attention probability map for a given object search task, refining the salient area. We fully validate the VI algorithm by applying it to salient compression, pre-blurring non-salient regions prior to JPEG encoding, and conducting comprehensive observer performance tests. When using the object contextualisation, we conclude that JPEG files are around 33 % larger than they need to be to fully represent the task-relevant information within them. Finally, we demonstrate the utility of the segmentation as a region of interest in JPEG2000 compression, achieving superior image quality (measured statistically using PSNR and SSIM) over the automatically selected salient image regions while reducing the image file size to as little as 25 % of the original. Our technique therefore delivers superior compression performance through the detection and selective preservation of visually salient information relevant to multiple observer tasks. In contrast to the state of the art in task-directed visual attention models, the VI algorithm reacts only to the image content and requires no detailed prior knowledge of the scene or of the ultimate observer task.
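The pipeline the abstract describes can be sketched compactly. The following is a minimal illustration under stated assumptions, not the authors' implementation: it accumulates detector responses at feature points into a smoothed attention map, then pre-blurs non-salient regions before standard JPEG encoding. ORB stands in for SURF so the sketch runs on stock OpenCV (SURF requires the opencv-contrib build), and the 0.3 threshold, blur strengths and file names are hypothetical choices.

import cv2
import numpy as np

def feature_saliency_map(gray, sigma=25.0):
    # Accumulate detector responses at keypoint locations, then smooth
    # them into a normalised attention probability map.
    detector = cv2.ORB_create(nfeatures=500)   # stand-in for SURF
    acc = np.zeros(gray.shape, dtype=np.float64)
    for kp in detector.detect(gray, None):
        x = min(int(round(kp.pt[0])), gray.shape[1] - 1)
        y = min(int(round(kp.pt[1])), gray.shape[0] - 1)
        acc[y, x] += kp.response
    acc = cv2.GaussianBlur(acc, (0, 0), sigma)
    return acc / acc.max() if acc.max() > 0 else acc

img = cv2.imread("scene.jpg")                  # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
saliency = feature_saliency_map(gray)

# Pre-blur the non-salient regions, then encode with standard JPEG.
blurred = cv2.GaussianBlur(img, (0, 0), 4)
mask = cv2.merge([(saliency > 0.3).astype(np.float64)] * 3)
out = (img * mask + blurred * (1.0 - mask)).astype(np.uint8)
cv2.imwrite("scene_vi.jpg", out, [cv2.IMWRITE_JPEG_QUALITY, 40])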













Notes
The segmentation produces an “expectation” value of eye-fixation capture. Our parameters reliably capture 70–75 % of fixations in cluttered indoor scenes, and considerably more for outdoor scenes, 85–95 %, even as the task is varied (see Fig. 5).
Such confusions can arise between characters at low resolution, including the following pairs: F ⇔ R, W ⇔ M, W ⇔ N, G ⇔ 6, G ⇔ C, B ⇔ 8, V ⇔ Y, X ⇔ A, H ⇔ K, 5 ⇔ 6, F ⇔ P, H ⇔ A, G ⇔ D, O ⇔ D, B ⇔ E and 6 ⇔ 8.
There is a region of interest (ROI) capability in JPEG2000, but the parameters for the JPEG2000 algorithm are not related to the output quality. In contrast, the JPEG algorithm was designed using data from observer tests: a Q value of 50 is expected to produce good visual quality for photo-real imagery.
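To make the contrast concrete, here is a hedged sketch using Pillow's JPEG2000 encoder, whose parameters specify compression rates (or dB targets) rather than a perceptual quality factor; file names are placeholders.

from PIL import Image

# JPEG2000 is parameterised by rate, e.g. a 25:1 compression ratio,
# not by a perceptual quality factor like JPEG's Q.
Image.open("scene.png").save("scene.jp2", quality_mode="rates",
                             quality_layers=[25])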
Read as: JPEG with quality level, Q = 40.
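Operationally (a minimal sketch with a placeholder file name), this corresponds to:

from PIL import Image

# Encode with the libjpeg quality factor Q = 40.
Image.open("scene.png").save("scene_q40.jpg", quality=40)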
Based on the kducompress examples of the Kakadu Software Company.
References
Bay H, Ess A, Tuytelaars T, Van Gool L. Speeded-up robust features (SURF). Comput Vis Image Underst. 2008;110(3):346–59.
Brockmole JR, Castelhano MS, Henderson JM. Contextual cueing in naturalistic scenes: global and local contexts. J Exp Psychol: Learn Mem Cogn. 2006;32(4):699–706.
de Campos TE, Csurka G, Perronnin F. Images as sets of locally weighted features. Technical report VSSP-TR-1/2010, FEPS, University of Surrey, Guildford, UK; 2010.
Fergus R. Visual object category recognition. Ph.D. thesis, University of Oxford; 2005.
Hansen BC, Essock EA. A horizontal bias in human visual processing of orientation and its correspondence to the structural components of natural scenes. J Vis. 2004;4(12):1044–60.
Harding P, Robertson NM. A comparison of feature detectors with passive and task-based visual saliency. LNCS. 2009;5575:716–25.
Harel J, Koch C, Perona P. Graph-based visual saliency. In: Advances in neural information processing systems, vol 19; 2007. p. 545–52.
Hopfinger JB, Buonocore MH, Mangun GR. The neural mechanisms of top–down attentional control. Nat Neurosci. 2000;3:284–91.
Itti L. Automatic foveation for video compression using a neurobiological model of visual attention. IEEE Trans Image Process. 2004;13(10):1304–18.
Itti L, Koch C. Computational modelling of visual attention. Nat Rev Neurosci. 2001;2(3):194–203.
Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell. 1998;20(11):1254–59.
Kadir T, Brady M. Saliency, scale and image description. Int J Comput Vis. 2001;45(2):83–105.
Lindeberg T. Scale-space theory: a basic tool for analysing structures at different scales. J Appl Stat. 1994;21(2):224–70.
Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis. 2004;60:91–110.
Matas J, Chum O, Urban M, Pajdla T. Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of the British machine vision conference; 2002. p. 384–93.
Mikolajczyk K, Schmid C. An affine invariant interest point detector. In: 7th European conference on computer vision, vol 1; 2002. p. 128–42.
Mikolajczyk K, Schmid C. A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell. 2005;27(10):1615–30.
Miller JL, Wiltse JM. Resolution requirements for alphanumeric readability. Opt Eng. 2003;42(3):846–52.
Navalpakkam V, Itti L. Modeling the influence of task on attention. Vis Res. 2005;45(2):205–31.
Navalpakkam V, Itti L. Search goal tunes visual features optimally. Neuron. 2007;53(4):605–17.
Pasadena dataset. http://www.vision.caltech.edu/html-files/archive.html. Accessed Aug 2011.
Peters R, Itti L. Beyond bottom-up: incorporating task-dependent influences into a computational model of spatial attention. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR); 2007. p. 1–8.
Philbin J, Chum O, Isard M, Sivic J, Zisserman A. Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2007. p. 1–8. URL: http://www.robots.ox.ac.uk/vgg/data/oxbuildings/index.html.
Rosten E, Drummond T. Fusing points and lines for high performance tracking. In: 10th IEEE international conference on computer vision, vol 2; 2005. p. 1508–11.
Rosten E, Drummond T. Machine learning for high-speed corner detection. In: Proceedings of the 9th European conference on computer vision, vol 1; 2006. p. 430–43.
Taubman D. Kakadu v5.0 survey document; 2001.
Taubman DS, Marcellin MW. JPEG 2000: image compression fundamentals, standards and practice. Norwell: Kluwer; 2001.
Torralba A. Contextual priming for object detection. Int J Comput Vis. 2003;53(2):169–91.
Torralba A, Oliva A, Castelhano M, Henderson J. Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychol Rev. 2006;113(4):766–86. URL: http://people.csail.mit.edu/torralba/GlobalFeaturesAndAttention/.
International Telecommunication Union. Reference algorithm for computing peak signal to noise ratio (PSNR) of a video sequence with a constant delay. ITU-T standard; 2009.
Viola P, Jones M. Robust real-time object detection. Int J Comput Vis. 2001;57:137–54.
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13:600–12.
Wolfe J. Visual attention. In: De Valois KK, editor. Seeing. 2nd ed. San Diego: Academic Press; 2000.
Wolfe JM, Horowitz TS, Kenner N, Hyle M, Vasan N. How fast can you change your mind? The speed of top–down guidance in visual search. Vis Res. 2004;44:1411–26.
Yu S, Lisin D. Image compression based on visual saliency at individual scales. In: Advances in visual computing, LNCS, vol 5875. Springer; 2009. p. 157–66.
Zhai Y, Shah M. Visual attention detection in video sequences using spatiotemporal cues. In: ACM multimedia; 2006. p. 815–24.
Cite this article
Harding, P., Robertson, N.M. Visual Saliency from Image Features with Application to Compression. Cogn Comput 5, 76–98 (2013). https://doi.org/10.1007/s12559-012-9150-7