Visual Search Target Inference Using Bag of Deep Visual Words

  • Conference paper
  • KI 2018: Advances in Artificial Intelligence (KI 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11117)

Abstract

Visual search target inference subsumes methods for predicting a person's search target from eye tracking. While a person tries to find an object in a visual scene, we predict that object based on their fixation behavior. Knowing the search target can improve intelligent user interaction. In this work, we implement a new feature encoding, the Bag of Deep Visual Words, for search target inference using a pre-trained convolutional neural network (CNN). Our work builds on a recent approach from the literature that uses the Bag of Visual Words encoding common in computer vision applications. We evaluate our method on a gold-standard dataset.

The results show that our new feature encoding outperforms the baseline from the literature, in particular, when excluding fixations on the target.
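The encoding described in the abstract can be illustrated with a minimal sketch: per-fixation deep features are assigned to their nearest "visual word" in a pre-computed vocabulary, and the fixation sequence is summarized as a normalized word histogram. The function name, array shapes, and the use of hard nearest-centroid assignment are illustrative assumptions, not the paper's exact pipeline (which extracts the features with a pre-trained CNN).

```python
import numpy as np

def bag_of_deep_visual_words(features, vocabulary):
    """Encode per-fixation feature vectors as an L1-normalized histogram
    over a visual vocabulary, using hard nearest-centroid assignment.

    features:   (n_fixations, d) array of deep features, one per fixation.
    vocabulary: (n_words, d) array of cluster centroids ("visual words").
    """
    # Euclidean distance of every feature to every vocabulary centroid.
    dists = np.linalg.norm(features[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)  # index of the nearest visual word
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()      # normalize so descriptors are comparable

# Toy example: 5 fixation features, a vocabulary of 3 visual words.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
vocab = rng.normal(size=(3, 8))
desc = bag_of_deep_visual_words(feats, vocab)
print(desc)  # a length-3 histogram summing to 1
```

In practice the vocabulary would be learned by clustering (e.g. k-means) CNN activations extracted from image patches around fixations, and the resulting histograms fed to a classifier for target inference.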


Notes

  1. The Amazon book cover dataset from Sattar et al. [15].

  2. https://github.com/happynear/caffe-windows/tree/ms/models/bvlc_alexnet.

  3. https://studio.azureml.net.

References

  1. Akkil, D., Isokoski, P.: Gaze augmentation in egocentric video improves awareness of intention. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 1573–1584. ACM Press (2016). http://dl.acm.org/citation.cfm?doid=2858036.2858127

  2. Bader, T., Beyerer, J.: Natural gaze behavior as input modality for human-computer interaction. In: Nakano, Y., Conati, C., Bader, T. (eds.) Eye Gaze in Intelligent User Interfaces, pp. 161–183. Springer, London (2013). https://doi.org/10.1007/978-1-4471-4784-8_9


  3. Borji, A., Lennartz, A., Pomplun, M.: What do eyes reveal about the mind? Algorithmic inference of search targets from fixations. Neurocomputing 149(PB), 788–799 (2015). https://doi.org/10.1016/j.neucom.2014.07.055


  4. DeAngelus, M., Pelz, J.B.: Top-down control of eye movements: Yarbus revisited. Vis. Cognit. 17(6–7), 790–811 (2009). https://doi.org/10.1080/13506280902793843


  5. Donahue, J., et al.: DeCAF: a deep convolutional activation feature for generic visual recognition. In: ICML, vol. 32, pp. 647–655 (2014). http://arxiv.org/abs/1310.1531

  6. Flanagan, J.R., Johansson, R.S.: Action plans used in action observation. Nature 424(6950), 769–771 (2003). http://www.nature.com/doifinder/10.1038/nature01861


  7. Goldberg, Y.: Neural network methods for natural language processing. Synth. Lect. Hum. Lang. Technol. 10(1), 1–309 (2017)


  8. Gredeback, G., Falck-Ytter, T.: Eye movements during action observation. Perspect. Psychol. Sci. 10(5), 591–598 (2015). http://pps.sagepub.com/lookup/doi/10.1177/1745691615589103


  9. Huang, C.M., Mutlu, B.: Anticipatory robot control for efficient human-robot collaboration. In: 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 83–90. IEEE, March 2016. https://doi.org/10.1109/HRI.2016.7451737, http://ieeexplore.ieee.org/document/7451737/

  10. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS 2012, pp. 1097–1105. Curran Associates Inc., USA (2012). http://dl.acm.org/citation.cfm?id=2999134.2999257

  11. Lowe, D.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157 (1999). https://doi.org/10.1109/ICCV.1999.790410, http://ieeexplore.ieee.org/document/790410/

  12. Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 512–519 (2014). https://doi.org/10.1109/CVPRW.2014.131, http://arxiv.org/abs/1403.6382

  13. Rotman, G., Troje, N.F., Johansson, R.S., Flanagan, J.R.: Eye movements when observing predictable and unpredictable actions. J. Neurophysiol. 96(3), 1358–1369 (2006). https://doi.org/10.1152/jn.00227.2006. http://www.ncbi.nlm.nih.gov/pubmed/16687620


  14. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y


  15. Sattar, H., Müller, S., Fritz, M., Bulling, A.: Prediction of search targets from fixations in open-world settings. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 981–990, June 2015. https://doi.org/10.1109/CVPR.2015.7298700

  16. Sattar, H., Bulling, A., Fritz, M.: Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling (2016). http://arxiv.org/abs/1611.10162

  17. Sonntag, D.: Kognit: intelligent cognitive enhancement technology by cognitive models and mixed reality for dementia patients. In: AAAI Fall Symposium Series (2015). https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/view/11702

  18. Sonntag, D.: Intelligent user interfaces - a tutorial. CoRR abs/1702.05250 (2017). http://arxiv.org/abs/1702.05250

  19. Toyama, T., Sonntag, D.: Towards episodic memory support for dementia patients by recognizing objects, faces and text in eye gaze. In: Hölldobler, S., Krötzsch, M., Peñaloza, R., Rudolph, S. (eds.) KI 2015. LNCS (LNAI), vol. 9324, pp. 316–323. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24489-1_29


  20. Wolfe, J.M.: Guided Search 2.0: a revised model of visual search. Psychon. Bull. Rev. 1(2), 202–238 (1994). https://doi.org/10.3758/BF03200774


  21. Yang, J., Jiang, Y.G., Hauptmann, A.G., Ngo, C.W.: Evaluating bag-of-visual-words representations in scene classification. In: Proceedings of the International Workshop on Workshop on Multimedia Information Retrieval, MIR 2007, pp. 197–206. ACM, New York (2007). http://doi.acm.org/10.1145/1290082.1290111

  22. Yarbus, A.L.: Eye movements and vision. Neuropsychologia 6(4), 222 (1967). https://doi.org/10.1016/0028-3932(68)90012-2


  23. Zelinsky, G.J., Peng, Y., Samaras, D.: Eye can read your mind: decoding gaze fixations to reveal categorical search targets. J. Vis. 13(14), 10 (2013). https://doi.org/10.1167/13.14.10. http://www.ncbi.nlm.nih.gov/pubmed/24338446



Acknowledgement

This work was funded by the Federal Ministry of Education and Research (BMBF) under grant number 16SV7768 in the Interakt project.

Author information


Correspondence to Sven Stauden, Michael Barz or Daniel Sonntag.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Stauden, S., Barz, M., Sonntag, D. (2018). Visual Search Target Inference Using Bag of Deep Visual Words. In: Trollmann, F., Turhan, AY. (eds) KI 2018: Advances in Artificial Intelligence. KI 2018. Lecture Notes in Computer Science (LNAI), vol 11117. Springer, Cham. https://doi.org/10.1007/978-3-030-00111-7_25


  • DOI: https://doi.org/10.1007/978-3-030-00111-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00110-0

  • Online ISBN: 978-3-030-00111-7

  • eBook Packages: Computer Science (R0)
