Abstract
This chapter addresses the open problem of meaningful object recognition in video. Approaches that estimate human visual attention and incorporate it into the overall visual content understanding process have recently become popular. Estimating visual attention in complex spatio-temporal content such as video requires fusing multiple information channels, including motion and spatial contrast. In the first part of the chapter, we study these questions and report optimal strategies for bottom-up fusion in visual saliency estimation. The estimated visual saliency is then used to pool local descriptors. We compare different pooling approaches and show results on challenging visual content: video recorded with wearable cameras for a large-scale study on Alzheimer's disease. The results, presented together with our conclusions, demonstrate that approaches based on saliency fusion outperform the best state-of-the-art techniques on this content.
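The pooling idea summarized in the abstract, weighting each local descriptor's vote by the saliency at its image location before aggregating into a bag-of-words histogram, can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the hard assignment to visual words, the function name, and all parameters are assumptions.

```python
import numpy as np

def saliency_weighted_bow(descriptors, positions, saliency_map, codebook):
    """Pool local descriptors into a bag-of-words histogram,
    weighting each descriptor's vote by the saliency at its location."""
    hist = np.zeros(codebook.shape[0])
    for desc, (x, y) in zip(descriptors, positions):
        # Hard-assign the descriptor to its nearest visual word.
        word = np.argmin(np.linalg.norm(codebook - desc, axis=1))
        # Vote with the local saliency value instead of a count of 1.
        hist[word] += saliency_map[y, x]
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy example: 2 visual words, 3 descriptors on a 4x4 saliency map.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]])
positions = [(0, 0), (2, 2), (3, 1)]          # (x, y) keypoint locations
saliency = np.linspace(0.0, 1.0, 16).reshape(4, 4)
print(saliency_weighted_bow(descriptors, positions, saliency, codebook))
```

With uniform saliency this reduces to the classical bag-of-words count; a non-uniform map shifts histogram mass toward descriptors in attended regions, which is the effect the chapter exploits for object recognition.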
References
Pirsiavash H, Ramanan D (2012) Detecting activities of daily living in first-person camera views. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2847–2854
Felzenszwalb PF, Girshick RB, McAllester DA, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
Lampert CH, Blaschko MB, Hofmann T (2008) Beyond sliding windows: object localization by efficient subwindow search. In: IEEE computer society conference on computer vision and pattern recognition (CVPR 2008), IEEE Computer Society, Anchorage, 24–26 June 2008
Itti L, Koch C (2001) Computational modelling of visual attention. Nat Rev Neurosci 2(3):194–203
Fathi A, Li Y, Rehg JM (2012) Learning to recognize daily actions using gaze. In: Proceedings of the 12th European conference on computer vision—Volume Part I, ECCV'12. Springer, Berlin, pp 314–327
Ogaki K, Kitani KM, Sugano Y, Sato Y (2012) Coupling eye-motion and ego-motion features for first-person activity recognition. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops. IEEE, pp 1–7
Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision, ECCV, pp 1–22
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE conference on computer vision and pattern recognition (CVPR), vol 2, pp 886–893
Jing F, Li M, Zhang H, Zhang B (2002) An effective region-based image retrieval framework. In: ACM international conference on multimedia
Long F, Zhang H, Feng D (2003) Fundamentals of content-based image retrieval. In: Multimedia information retrieval and management
Manjunath B, Ohm J, Vasudevan V, Yamada A (2001) Color and texture descriptors. IEEE Trans Circuits Syst Video Technol 11(6):703–715
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Intern J Comput Vis 60:91–110
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110:346–359
Mokhtarian F, Suomela R (1998) Robust image corner detection through curvature scale space. IEEE Trans Pattern Anal Mach Intell 20(12):1376–1381
Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the international conference on computer vision, vol 2, pp 1470–1477
Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: 2001 IEEE computer society conference on computer vision and pattern recognition, vol 1. IEEE, Los Alamitos, pp 511–518
de Carvalho Soares R, da Silva I, Guliato D (2012) Spatial locality weighting of features using saliency map with a bag-of-visual-words approach. In: IEEE 24th international conference on tools with artificial intelligence (ICTAI), vol 1. pp 1070–1075
Sharma G, Jurie F, Schmid C (2012) Discriminative spatial saliency for image classification. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3506–3513
Vig E, Dorr M, Cox D (2012) Space-variant descriptor sampling for action recognition based on saliency and eye movements. In: European conference on computer vision (ECCV 2012). Springer, Berlin, pp 84–97
Treisman AM, Gelade G (1980) A feature-integration theory of attention. Cogn Psychol 12(1):97–136
Borji A, Itti L (2012) State-of-the-art in visual attention modeling. IEEE Trans Pattern Anal Mach Intell 34(9):1758–1772
Tatler BW (2007) The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. J Vis 7(14):1–17
Dorr M, Martinetz T, Gegenfurtner KR, Barth E (2010) Variability of eye movements when viewing dynamic natural scenes. J Vis, 10(10):28
Koch C, Ullman S (1985) Shifts in selective visual attention: towards the underlying neural circuitry. Hum Neurobiol 4:219–227
Posner MI, Cohen YA (1984) Components of visual orienting. In: Bouma H, Bouwhuis DG (eds) Attention and performance X: control of language processes. Lawrence Erlbaum, Hillsdale
Parkhurst D, Law K, Niebur E (2002) Modeling the role of salience in the allocation of overt visual attention. Vis Res 42(1):107–123
Harel J, Koch C, Perona P (2007) Graph-based visual saliency. In: Advances in neural information processing systems 19. MIT Press, Cambridge, pp 545–552
Marat S, Ho Phuoc T, Granjon L, Guyader N, Pellerin D, Guérin-Dugué A (2009) Modelling spatio-temporal saliency to predict gaze direction for short videos. Intern J Comput Vis 82(3):231–243
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259
Itti L, Baldi PF (2006) Bayesian surprise attracts human attention. In: Advances in neural information processing systems 18 (NIPS 2005). MIT Press, Cambridge, pp 547–554
Bruce NDB, Tsotsos JK (2006) Saliency based on information maximization. In: Weiss Y, Schölkopf B, Platt J (eds) Advances in neural information processing systems 18. MIT Press, Cambridge, pp 155–162
Itti L, Braun J, Lee DK, Koch C (1999) Attentional modulation of human pattern discrimination psychophysics reproduced by a quantitative model. In: Advances in neural information processing systems. MIT Press, Cambridge
Itti L (2000) A saliency-based search mechanism for overt and covert shifts of visual attention. Vis Res 40(10–12):1489–1506
Lee DK, Itti L, Koch C, Braun J (1999) Attention activates winner-take-all competition among visual filters. Nat Neurosci 2(4):375–381
Brouard O, Ricordel V, Barba D (2009) Cartes de saillance spatio-temporelle basées contrastes de couleur et mouvement relatif. In: Compression et représentation des signaux audiovisuels (CORESA)
Farnebäck G (2000) Fast and accurate motion estimation using orientation tensors and parametric motion models. In: Proceedings of 15th international conference on pattern recognition, vol 1. IAPR, Barcelona, pp 135–139
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24:381–395
Daly SJ (1998) Engineering observations from spatiovelocity and spatiotemporal visual models. In: IS&T/SPIE conference on human vision and electronic imaging III
Boujut H, Benois-Pineau J, Megret R (2012) Fusion of multiple visual cues for visual saliency extraction from wearable camera settings with strong motion. In: Fusiello A, Murino V, Cucchiara R (eds) Computer vision—ECCV 2012. Workshops and Demonstrations, Lecture Notes in Computer Science, vol 7585. Springer, Berlin, pp 436–445
Land M, Mennie N, Rusted J (1999) The roles of vision and eye movements in the control of activities of daily living. Perception 28:1311–1328
Moré JJ, Sorensen DC (1983) Computing a trust region step. SIAM J Sci Stat Comput 4(3):553–572
Boujut H, Benois-Pineau J, Ahmed T, Hadar O, Bonnet P (2011) A metric for no-reference video quality assessment for HD TV delivery based on saliency maps. In: IEEE international conference on multimedia and expo, July 2011
Tuytelaars T, Lampert C, Blaschko M, Buntine W (2010) Unsupervised object discovery: a comparison. Intern J Comput Vis 88:284–302
Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2008) Lost in quantization: improving particular object retrieval in large scale image databases. In: IEEE conference on computer vision and pattern recognition, pp 1–8, June 2008
Marszałek M, Schmid C (2006) Spatial weighting for bag-of-features. In: IEEE conference on computer vision and pattern recognition, vol 2. pp 2118–2125
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Sreekanth V, Vedaldi A, Jawahar CV, Zisserman A (2010) Generalized RBF feature maps for efficient detection. In: Proceedings of the British machine vision conference (BMVC)
Fathi A, Ren X, Rehg JM (2011) Learning to recognize objects in egocentric activities. In: The 24th IEEE conference on computer vision and pattern recognition, CVPR 2011, IEEE, Colorado Springs, 20–25 June 2011, pp 3281–3288
Over P, Awad G, Michel M, Fiscus J, Sanders G, Shaw B, Kraaij W, Smeaton AF, Quénot G (2012) TRECVID 2012—an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: Proceedings of TRECVID 2012. NIST, USA
Acknowledgments
This research has been supported by the Region of Aquitaine and the European Community's Seventh Framework Programme (FP7/2007–2013) under Grant Agreement 288199 (Dem@care project).
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this chapter
González-Díaz, I., Benois-Pineau, J., Buso, V., Boujut, H. (2014). Fusion of Multiple Visual Cues for Object Recognition in Videos. In: Ionescu, B., Benois-Pineau, J., Piatrik, T., Quénot, G. (eds) Fusion in Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-05696-8_4
DOI: https://doi.org/10.1007/978-3-319-05696-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05695-1
Online ISBN: 978-3-319-05696-8
eBook Packages: Computer Science (R0)