What, Where and Who? Telling the Story of an Image by Activity Classification, Scene Recognition and Object Categorization

Fei-Fei, Li; Li, Li-Jia

doi:10.1007/978-3-642-12848-6_6

Li Fei-Fei & Li-Jia Li

Part of the book series: Studies in Computational Intelligence (SCI, volume 285)

Abstract

We live in a richly visual world. More than one third of the entire human brain is involved in visual processing and understanding. Psychologists have shown that the human visual system is particularly efficient and effective at perceiving high-level meanings in cluttered real-world scenes, such as objects, scene classes, activities and the stories the images tell. In this chapter, we discuss a generative-model approach for classifying complex human activities (such as a croquet game or snowboarding) given a single static image. We observe that recognizing the objects in an image and classifying its scene environment facilitate each other in the overall activity recognition task. We formulate this observation as a graphical model in which activity classification is achieved by combining information from both the object recognition and the scene classification pathways. To evaluate the robustness of our algorithm, we have assembled a challenging dataset of real-world images of eight different sport events, most of them collected from the Internet. Experimental results show that our hierarchical model performs better than existing methods.
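The abstract does not spell out the model's structure. As a rough illustration only, the sketch below fuses a scene pathway and an object pathway with a simple generative (naive-Bayes) rule, assuming the scene label and each detected object are conditionally independent given the activity. This is a simplified stand-in, not the chapter's hierarchical model; the function `activity_posterior`, the scene and object labels, and all probabilities are hypothetical.

```python
import math


def activity_posterior(prior, scene_lik, object_lik, scene, objects):
    """Return a normalized posterior P(activity | scene, objects).

    Hypothetical naive-Bayes fusion: the scene label and every detected
    object are assumed conditionally independent given the activity.
    """
    log_scores = {}
    for activity, p_activity in prior.items():
        # Scene pathway: log P(activity) + log P(scene | activity)
        score = math.log(p_activity) + math.log(scene_lik[activity][scene])
        # Object pathway: add log P(object | activity) for each detection
        for obj in objects:
            score += math.log(object_lik[activity][obj])
        log_scores[activity] = score
    # Normalize in log space (log-sum-exp) to avoid numerical underflow
    peak = max(log_scores.values())
    total = sum(math.exp(s - peak) for s in log_scores.values())
    return {a: math.exp(s - peak) / total for a, s in log_scores.items()}


# Toy, made-up probabilities for two of the sport events
PRIOR = {"croquet": 0.5, "snowboarding": 0.5}
SCENE_LIK = {
    "croquet": {"lawn": 0.9, "snow": 0.1},
    "snowboarding": {"lawn": 0.05, "snow": 0.95},
}
OBJECT_LIK = {
    "croquet": {"mallet": 0.6, "snowboard": 0.01, "person": 0.39},
    "snowboarding": {"mallet": 0.01, "snowboard": 0.6, "person": 0.39},
}

if __name__ == "__main__":
    post = activity_posterior(PRIOR, SCENE_LIK, OBJECT_LIK, "lawn", ["mallet", "person"])
    print(max(post, key=post.get))  # prints "croquet"
```

In this toy setup the lawn scene and the mallet both point to croquet, so the two pathways reinforce each other, while a "person" detection alone is uninformative because it has the same likelihood under both events.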

Author information

Authors and Affiliations

Dept. of Computer Science, Stanford University, USA
Li Fei-Fei & Li-Jia Li

Editor information

Editors and Affiliations

Department of Engineering, University of Cambridge, CB2 1PZ, Cambridge, UK
Roberto Cipolla
Dipartimento di Matematica ed Informatica, University of Catania, Viale A. Doria 6, I, 95125, Catania, Italy
Sebastiano Battiato & Giovanni Maria Farinella

About this chapter

Cite this chapter

Fei-Fei, L., Li, LJ. (2010). What, Where and Who? Telling the Story of an Image by Activity Classification, Scene Recognition and Object Categorization. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds) Computer Vision. Studies in Computational Intelligence, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12848-6_6

DOI: https://doi.org/10.1007/978-3-642-12848-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12847-9
Online ISBN: 978-3-642-12848-6
eBook Packages: Engineering, Engineering (R0)
