Abstract
Visual search target inference subsumes methods for predicting the target object of a visual search through eye tracking: a person intends to find an object in a visual scene, and we predict that object from his or her fixation behavior. Knowing the search target can improve intelligent user interaction. In this work, we implement a new feature encoding, the Bag of Deep Visual Words, for search target inference using a pre-trained convolutional neural network (CNN). Our work builds on a recent approach from the literature that uses the Bag of Visual Words encoding, which is common in computer vision applications. We evaluate our method on a gold standard dataset.
The results show that our new feature encoding outperforms the baseline from the literature, in particular when fixations on the target are excluded.
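Although the chapter itself contains no code, the encoding described in the abstract follows a standard bag-of-words pipeline with CNN activations as local descriptors: image patches around fixations are described by a pre-trained CNN layer, the descriptors are clustered into a visual vocabulary, and each search trial becomes a histogram of visual words that a classifier maps to a target. The sketch below only illustrates this idea under stated assumptions; the choice of AlexNet fc6 activations, the k-means vocabulary size, the linear SVM, and all helper names are illustrative and not the authors' exact implementation.

```python
# Minimal Bag of Deep Visual Words (BoDVW) sketch. Assumptions (not the authors'
# exact setup): each search trial is a list of PIL image patches cropped around
# its fixations; AlexNet fc6 activations serve as patch descriptors (torchvision
# >= 0.13); k-means builds the visual vocabulary; a linear SVM predicts targets.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.cluster import KMeans
from sklearn.svm import SVC

alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
preprocess = T.Compose([
    T.Resize((224, 224)), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def cnn_descriptor(patch):
    """Activations of AlexNet's first fully connected layer for one fixation patch."""
    x = preprocess(patch).unsqueeze(0)
    with torch.no_grad():
        feats = alexnet.avgpool(alexnet.features(x)).flatten(1)
        fc6 = alexnet.classifier[:3](feats)  # Dropout (inactive in eval) + Linear + ReLU
    return fc6.squeeze(0).numpy()

def build_vocabulary(train_trials, n_words=64, seed=0):
    """Cluster descriptors of all training fixation patches into visual words."""
    descriptors = np.vstack([cnn_descriptor(p) for trial in train_trials for p in trial])
    return KMeans(n_clusters=n_words, random_state=seed).fit(descriptors)

def encode_trial(patches, vocabulary):
    """Normalised histogram of visual-word assignments over one trial's patches."""
    words = vocabulary.predict(np.vstack([cnn_descriptor(p) for p in patches]))
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # L1-normalise so trial length does not dominate

def predict_targets(train_trials, train_targets, test_trials, n_words=64):
    """Train a linear SVM on BoDVW histograms and predict the test-trial targets."""
    vocab = build_vocabulary(train_trials, n_words)
    X_train = np.array([encode_trial(t, vocab) for t in train_trials])
    X_test = np.array([encode_trial(t, vocab) for t in test_trials])
    return SVC(kernel="linear").fit(X_train, train_targets).predict(X_test)
```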
Notes
- 1. The Amazon book cover dataset from Sattar et al. [15].
References
Akkil, D., Isokoski, P.: Gaze augmentation in egocentric video improves awareness of intention. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 1573–1584. ACM Press (2016). http://dl.acm.org/citation.cfm?doid=2858036.2858127
Bader, T., Beyerer, J.: Natural gaze behavior as input modality for human-computer interaction. In: Nakano, Y., Conati, C., Bader, T. (eds.) Eye Gaze in Intelligent User Interfaces, pp. 161–183. Springer, London (2013). https://doi.org/10.1007/978-1-4471-4784-8_9
Borji, A., Lennartz, A., Pomplun, M.: What do eyes reveal about the mind? Algorithmic inference of search targets from fixations. Neurocomputing 149(PB), 788–799 (2015). https://doi.org/10.1016/j.neucom.2014.07.055
DeAngelus, M., Pelz, J.B.: Top-down control of eye movements: Yarbus revisited. Vis. Cognit. 17(6–7), 790–811 (2009). https://doi.org/10.1080/13506280902793843
Donahue, J., et al.: DeCAF: a deep convolutional activation feature for generic visual recognition. In: ICML, vol. 32, pp. 647–655 (2014). http://arxiv.org/abs/1310.1531
Flanagan, J.R., Johansson, R.S.: Action plans used in action observation. Nature 424(6950), 769–771 (2003). http://www.nature.com/doifinder/10.1038/nature01861
Goldberg, Y.: Neural network methods for natural language processing. Synth. Lect. Hum. Lang. Technol. 10(1), 1–309 (2017)
Gredeback, G., Falck-Ytter, T.: Eye movements during action observation. Perspect. Psychol. Sci. 10(5), 591–598 (2015). http://pps.sagepub.com/lookup/doi/10.1177/1745691615589103
Huang, C.M., Mutlu, B.: Anticipatory robot control for efficient human-robot collaboration. In: 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 83–90. IEEE, March 2016. https://doi.org/10.1109/HRI.2016.7451737, http://ieeexplore.ieee.org/document/7451737/
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS 2012, pp. 1097–1105. Curran Associates Inc., USA (2012). http://dl.acm.org/citation.cfm?id=2999134.2999257
Lowe, D.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157 (1999). https://doi.org/10.1109/ICCV.1999.790410, http://ieeexplore.ieee.org/document/790410/
Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 512–519 (2014). https://doi.org/10.1109/CVPRW.2014.131, http://arxiv.org/abs/1403.6382
Rotman, G., Troje, N.F., Johansson, R.S., Flanagan, J.R.: Eye movements when observing predictable and unpredictable actions. J. Neurophysiol. 96(3), 1358–1369 (2006). https://doi.org/10.1152/jn.00227.2006. http://www.ncbi.nlm.nih.gov/pubmed/16687620
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Sattar, H., Müller, S., Fritz, M., Bulling, A.: Prediction of search targets from fixations in open-world settings. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 981–990, June 2015. https://doi.org/10.1109/CVPR.2015.7298700
Sattar, H., Bulling, A., Fritz, M.: Predicting the category and attributes of visual search targets using deep gaze pooling (2016). http://arxiv.org/abs/1611.10162
Sonntag, D.: Kognit: intelligent cognitive enhancement technology by cognitive models and mixed reality for dementia patients. In: AAAI Fall Symposium Series (2015). https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/view/11702
Sonntag, D.: Intelligent user interfaces - A tutorial. CoRR abs/1702.05250 (2017). http://arxiv.org/abs/1702.05250
Toyama, T., Sonntag, D.: Towards episodic memory support for dementia patients by recognizing objects, faces and text in eye gaze. In: Hölldobler, S., Krötzsch, M., Peñaloza, R., Rudolph, S. (eds.) KI 2015. LNCS (LNAI), vol. 9324, pp. 316–323. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24489-1_29
Wolfe, J.M.: Guided search 2.0: a revised model of visual search. Psychon. Bull. Rev. 1(2), 202–238 (1994). https://doi.org/10.3758/BF03200774
Yang, J., Jiang, Y.G., Hauptmann, A.G., Ngo, C.W.: Evaluating bag-of-visual-words representations in scene classification. In: Proceedings of the International Workshop on Workshop on Multimedia Information Retrieval, MIR 2007, pp. 197–206. ACM, New York (2007). http://doi.acm.org/10.1145/1290082.1290111
Yarbus, A.L.: Eye movements and vision. Neuropsychologia 6(4), 222 (1967). https://doi.org/10.1016/0028-3932(68)90012-2
Zelinsky, G.J., Peng, Y., Samaras, D.: Eye can read your mind: decoding gaze fixations to reveal categorical search targets. J. Vis. 13(14), 10 (2013). https://doi.org/10.1167/13.14.10. http://www.ncbi.nlm.nih.gov/pubmed/24338446
Acknowledgement
This work was funded by the Federal Ministry of Education and Research (BMBF) under grant number 16SV7768 in the Interakt project.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Stauden, S., Barz, M., Sonntag, D. (2018). Visual Search Target Inference Using Bag of Deep Visual Words. In: Trollmann, F., Turhan, A.-Y. (eds.) KI 2018: Advances in Artificial Intelligence. KI 2018. Lecture Notes in Computer Science, vol. 11117. Springer, Cham. https://doi.org/10.1007/978-3-030-00111-7_25
DOI: https://doi.org/10.1007/978-3-030-00111-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00110-0
Online ISBN: 978-3-030-00111-7
eBook Packages: Computer Science, Computer Science (R0)