What, Where and Who? Telling the Story of an Image by Activity Classification, Scene Recognition and Object Categorization

  • Chapter
Computer Vision

Part of the book series: Studies in Computational Intelligence (SCI, volume 285)

Abstract

We live in a richly visual world. More than one third of the entire human brain is involved in visual processing and understanding. Psychologists have shown that the human visual system is particularly efficient and effective at perceiving high-level meaning in cluttered real-world scenes, such as objects, scene classes, activities and the stories in the images. In this chapter, we discuss a generative model approach for classifying complex human activities (such as a croquet game or snowboarding) given a single static image. We observe that object recognition and scene classification facilitate each other in the overall activity recognition task. We formulate this observation in a graphical model in which activity classification is achieved by combining information from both the object recognition and the scene classification pathways. To evaluate the robustness of our algorithm, we have assembled a challenging dataset consisting of real-world images of eight different sport events, most of them collected from the Internet. Experimental results show that our hierarchical model performs better than existing methods.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fei-Fei, L., Li, LJ. (2010). What, Where and Who? Telling the Story of an Image by Activity Classification, Scene Recognition and Object Categorization. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds) Computer Vision. Studies in Computational Intelligence, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12848-6_6

  • DOI: https://doi.org/10.1007/978-3-642-12848-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12847-9

  • Online ISBN: 978-3-642-12848-6

  • eBook Packages: Engineering, Engineering (R0)
