Skip to main content

Gaze- and Speech-Enhanced Content-Based Image Retrieval in Image Tagging

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2011 (ICANN 2011)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6792))

Included in the following conference series:

Abstract

We describe a setup and experiments where users are checking and correcting image tags given by an automatic tagging system. We study how much the application of a content-based image retrieval (CBIR) method speeds up the process of finding and correcting the erroneously-tagged images. We also analyze the use of implicit relevance feedback from the user’s gaze tracking patterns as a method for boosting up the CBIR performance. Finally, we use automatic speech recognition for giving the correct tags for those images that were wrongly tagged. The experiments show a large variance in the tagging task performance, which we believe is primarily caused by the users’ subjectivity in image contents as well as their varying familiarity with the gaze tracking and speech recognition setups. The results suggest potentials for gaze and/or speech enhanced CBIR method in image tagging, at least for some users.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ames, M., Naaman, M.: Why we tag: motivations for annotation in mobile and online media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 971–980. ACM, New York (2007)

    Chapter  Google Scholar 

  2. Auer, P., Hussain, Z., Kaski, S., Klami, A., Kujala, J., Laaksonen, J., Leung, A.P., Pasupa, K., Shawe-Taylor, J.: Pinview: Implicit feedback in content-based image retrieval. In: Diethe, T., Cristianini, N., Shawe-Taylor, J. (eds.) Proceedings of Workshop on Applications of Pattern Analysis. JMLR Workshop and Conference Proceedings, vol. 11, pp. 51–57 (2010)

    Google Scholar 

  3. Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys 40(2), 1–60 (2008)

    Article  Google Scholar 

  4. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results (2007), http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html

  5. Klami, A., Kaski, S., Pasupa, K., Saunders, C., de Campos, T.: Prediction of relevance of an image from a scan pattern. PinView FP7-216529 Project Deliverable Report D2.1 (December 2008), http://www.pinview.eu/deliverables.php

  6. Kohonen, T.: Self-Organizing Maps, 3rd edn. Springer Series in Information Sciences, vol. 30. Springer, Berlin (2001)

    Book  Google Scholar 

  7. Laaksonen, J., Koskela, M., Oja, E.: PicSOM—Self-organizing image retrieval with MPEG-7 content descriptions. IEEE Transactions on Neural Networks, Special Issue on Intelligent Multimedia Processing 13(4), 841–853 (2002)

    Article  Google Scholar 

  8. Lerman, K., Jones, L.: Social browsing on flickr. CoRR abs/cs/0612047 (2006)

    Google Scholar 

  9. Manning, C., Schütze, H.: MITCogNet: Foundations of statistical natural language processing. MIT Press, Cambridge (1999)

    MATH  Google Scholar 

  10. Pylkkönen, J., Kurimo, M.: Duration modeling techniques for continuous speech recognition. In: Eighth International Conference on Spoken Language Processing, ISCA (2004)

    Google Scholar 

  11. Viitaniemi, V., Laaksonen, J.: Evaluating the performance in automatic image annotation: example case by adaptive fusion of global image features. Signal Processing: Image Communications 22(6), 557–568 (2007)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, H., Ruokolainen, T., Laaksonen, J., Hochleitner, C., Traunmüller, R. (2011). Gaze- and Speech-Enhanced Content-Based Image Retrieval in Image Tagging. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2011. ICANN 2011. Lecture Notes in Computer Science, vol 6792. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21738-8_48

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-21738-8_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21737-1

  • Online ISBN: 978-3-642-21738-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics