Abstract
We live in a richly visual world. More than one third of the entire human brain is involved in visual processing and understanding. Psychologists have shown that the human visual system is particularly efficient and effective at perceiving high-level meaning in cluttered real-world scenes, such as objects, scene classes, activities and the stories in the images. In this chapter, we discuss a generative model approach for classifying complex human activities (such as a croquet game or snowboarding) given a single static image. We observe that object recognition in the scene and scene environment classification of the image facilitate each other in the overall activity recognition task. We formulate this observation in a graphical model representation in which activity classification is achieved by combining information from both the object recognition and the scene classification pathways. To evaluate the robustness of our algorithm, we have assembled a challenging dataset consisting of real-world images of eight different sport events, most of them collected from the Internet. Experimental results show that our hierarchical model performs better than existing methods.
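As a rough illustration of how two pathways can be combined into one activity decision, the sketch below scores each event by adding a log prior to the log-likelihoods reported by a scene pathway and an object pathway. It is a simplification and not the chapter's actual model: the conditional independence of the two pathways given the event, the function name `classify_event`, and all of the numbers are assumptions made for this example.

```python
import math

def classify_event(prior, scene_lik, object_lik):
    """Pick the event with the highest combined log-score.

    prior[e]      : p(e), prior probability of event e
    scene_lik[e]  : p(scene evidence | e), from a scene pathway
    object_lik[e] : p(object evidence | e), from an object pathway

    Assumes (for illustration) that the two pathways are conditionally
    independent given the event, so their log-likelihoods simply add.
    """
    scores = {
        e: math.log(prior[e]) + math.log(scene_lik[e]) + math.log(object_lik[e])
        for e in prior
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy numbers, invented for this example only.
prior = {"croquet": 0.5, "snowboarding": 0.5}
scene_lik = {"croquet": 0.6, "snowboarding": 0.4}    # scene mildly favors croquet
object_lik = {"croquet": 0.1, "snowboarding": 0.9}   # objects strongly favor snowboarding

label, scores = classify_event(prior, scene_lik, object_lik)
print(label)
```

With these toy values the object evidence outweighs the weaker scene evidence, which shows how one pathway can correct the other when they disagree.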
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fei-Fei, L., Li, LJ. (2010). What, Where and Who? Telling the Story of an Image by Activity Classification, Scene Recognition and Object Categorization. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds) Computer Vision. Studies in Computational Intelligence, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12848-6_6
DOI: https://doi.org/10.1007/978-3-642-12848-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12847-9
Online ISBN: 978-3-642-12848-6
eBook Packages: Engineering, Engineering (R0)