Abstract
Providing image annotations is a tedious task, and it becomes even more cumbersome when individual objects in the images are to be annotated. Such region-based annotations can be used in various ways, for example for similarity search or as training data for automatic object detection. We investigate the principal idea of finding objects in images by analyzing the gaze paths of users who view images with an interest in a specific object. We analyzed 799 gaze paths from 30 subjects viewing image-tag pairs with the task of deciding whether the tag could be found in the image. We compared 13 different fixation measures for analyzing the gaze paths. The best-performing fixation measure correctly assigns a tag to a region for 63 % of the image-tag pairs and significantly outperforms three baselines. We examine the characteristics of the image regions, such as their position and size, for correct and incorrect assignments. We also investigate the influence of aggregating gaze paths from multiple subjects on the precision of identifying the correct regions. In addition, we look into the possibility of discriminating different regions in the same image: here, we are able to correctly identify two regions in the same image from different primings with an accuracy of 38 %.
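To illustrate the core decision rule, the following minimal Python sketch assigns a tag to the candidate region that accumulates the most fixation time across one or more gaze paths. The data structures, the function names, and the choice of total fixation duration as the measure are illustrative assumptions made for this sketch; the article itself compares 13 different fixation measures.

# Minimal sketch: assign a tag to the image region that accumulates the most
# fixation time. The region/fixation formats and the use of total fixation
# duration as the measure are illustrative assumptions, not the paper's
# specific measures.

def in_region(fixation, region):
    """Check whether a fixation point (x, y) lies inside an axis-aligned region."""
    x, y = fixation["x"], fixation["y"]
    return (region["x"] <= x <= region["x"] + region["w"] and
            region["y"] <= y <= region["y"] + region["h"])

def fixation_duration_score(gaze_path, region):
    """Sum the durations (ms) of all fixations that fall inside the region."""
    return sum(f["duration"] for f in gaze_path if in_region(f, region))

def assign_tag_to_region(gaze_paths, regions):
    """Pick the region with the highest score aggregated over all gaze paths.

    Passing several subjects' gaze paths corresponds to the multi-user
    aggregation investigated in the article.
    """
    def total_score(region):
        return sum(fixation_duration_score(path, region) for path in gaze_paths)
    return max(regions, key=total_score)

# Hypothetical example data: two candidate regions and one gaze path.
regions = [
    {"name": "dog",  "x": 40,  "y": 60, "w": 120, "h": 90},
    {"name": "tree", "x": 300, "y": 20, "w": 80,  "h": 200},
]
gaze_path = [
    {"x": 90,  "y": 100, "duration": 250},  # inside the "dog" region
    {"x": 320, "y": 110, "duration": 120},  # inside the "tree" region
    {"x": 100, "y": 95,  "duration": 300},  # "dog" region again
]
best = assign_tag_to_region([gaze_path], regions)
print(best["name"])  # -> "dog"

The sketch mirrors the decision rule evaluated in the article, assigning the tag to the single highest-scoring region, with the simplest plausible fixation measure standing in for the measures the paper compares.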
Acknowledgement
We thank the subjects who participated in our experiment. The research leading to this article was partially supported by the EU project SocialSensor (FP7-287975).
Cite this article
Walber, T., Scherp, A. & Staab, S. Benefiting from users’ gaze: selection of image regions from eye tracking information for provided tags. Multimed Tools Appl 71, 363–390 (2014). https://doi.org/10.1007/s11042-013-1390-3