Abstract
We describe a setup and experiments where users are checking and correcting image tags given by an automatic tagging system. We study how much the application of a content-based image retrieval (CBIR) method speeds up the process of finding and correcting the erroneously-tagged images. We also analyze the use of implicit relevance feedback from the user’s gaze tracking patterns as a method for boosting up the CBIR performance. Finally, we use automatic speech recognition for giving the correct tags for those images that were wrongly tagged. The experiments show a large variance in the tagging task performance, which we believe is primarily caused by the users’ subjectivity in image contents as well as their varying familiarity with the gaze tracking and speech recognition setups. The results suggest potentials for gaze and/or speech enhanced CBIR method in image tagging, at least for some users.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Ames, M., Naaman, M.: Why we tag: motivations for annotation in mobile and online media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 971–980. ACM, New York (2007)
Auer, P., Hussain, Z., Kaski, S., Klami, A., Kujala, J., Laaksonen, J., Leung, A.P., Pasupa, K., Shawe-Taylor, J.: Pinview: Implicit feedback in content-based image retrieval. In: Diethe, T., Cristianini, N., Shawe-Taylor, J. (eds.) Proceedings of Workshop on Applications of Pattern Analysis. JMLR Workshop and Conference Proceedings, vol. 11, pp. 51–57 (2010)
Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys 40(2), 1–60 (2008)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results (2007), http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Klami, A., Kaski, S., Pasupa, K., Saunders, C., de Campos, T.: Prediction of relevance of an image from a scan pattern. PinView FP7-216529 Project Deliverable Report D2.1 (December 2008), http://www.pinview.eu/deliverables.php
Kohonen, T.: Self-Organizing Maps, 3rd edn. Springer Series in Information Sciences, vol. 30. Springer, Berlin (2001)
Laaksonen, J., Koskela, M., Oja, E.: PicSOM—Self-organizing image retrieval with MPEG-7 content descriptions. IEEE Transactions on Neural Networks, Special Issue on Intelligent Multimedia Processing 13(4), 841–853 (2002)
Lerman, K., Jones, L.: Social browsing on flickr. CoRR abs/cs/0612047 (2006)
Manning, C., Schütze, H.: MITCogNet: Foundations of statistical natural language processing. MIT Press, Cambridge (1999)
Pylkkönen, J., Kurimo, M.: Duration modeling techniques for continuous speech recognition. In: Eighth International Conference on Spoken Language Processing, ISCA (2004)
Viitaniemi, V., Laaksonen, J.: Evaluating the performance in automatic image annotation: example case by adaptive fusion of global image features. Signal Processing: Image Communications 22(6), 557–568 (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, H., Ruokolainen, T., Laaksonen, J., Hochleitner, C., Traunmüller, R. (2011). Gaze- and Speech-Enhanced Content-Based Image Retrieval in Image Tagging. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2011. ICANN 2011. Lecture Notes in Computer Science, vol 6792. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21738-8_48
Download citation
DOI: https://doi.org/10.1007/978-3-642-21738-8_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21737-1
Online ISBN: 978-3-642-21738-8
eBook Packages: Computer ScienceComputer Science (R0)