Actions in Still Web Images: Visualization, Detection and Retrieval

Li, Piji; Ma, Jun; Gao, Shuai

doi:10.1007/978-3-642-23535-1_27

Piji Li, Jun Ma & Shuai Gao

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Abstract

We describe a framework for retrieving human actions in still web images with verb queries such as “phoning”. First, we build a group of visually discriminative instances for each action class, called “Exemplarlets”. We then employ Multiple Kernel Learning (MKL) to learn an optimal combination of histogram intersection kernels, each of which captures a state-of-the-art feature channel. Our features include the distribution of edges, dense visual words, and feature descriptors at different levels of a spatial pyramid. For a new image, we detect the hot-region using a sliding-window detector learnt via MKL; the hot-region indicates latent actions in the image. After the hot-region has been detected, we build an inverted index in the visual search path, which we call the Visual Inverted Index (VII). Finally, by fusing the visual search path and the text search path, we obtain accurate results relevant to either the text or the visual information. We show both detection and retrieval results on our newly collected dataset of six actions and demonstrate improved performance over existing methods.
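
As an illustration of the kernel combination mentioned in the abstract, the following is a minimal sketch of a weighted sum of histogram intersection kernels, one per feature channel. It is not the authors' implementation: the channel names, the toy histograms, and the weights are all hypothetical, and the weights are fixed by hand here, whereas MKL would learn them from training data.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def combined_kernel(x, y, weights):
    """Weighted sum of per-channel intersection kernels.

    x, y    : dict mapping channel name -> histogram (list of floats)
    weights : dict mapping channel name -> non-negative weight
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * histogram_intersection(x[c], y[c]) for c, w in weights.items())


# Hypothetical L1-normalised histograms for two image windows.
window_a = {"edges": [0.5, 0.5, 0.0], "words": [1.0, 0.0]}
window_b = {"edges": [0.0, 0.5, 0.5], "words": [0.5, 0.5]}

# Placeholder weights; in an MKL setting these would be learned, not hand-set.
weights = {"edges": 0.25, "words": 0.75}

k_aa = combined_kernel(window_a, window_a, weights)  # similarity of a window with itself
k_ab = combined_kernel(window_a, window_b, weights)  # similarity between the two windows
```

Evaluating `combined_kernel` over all pairs of training windows would give a Gram matrix, which could then be passed to an SVM solver that accepts precomputed kernels.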

Author information

Authors and Affiliations

School of Computer Science & Technology, Shandong University, Jinan, 250101, China
Piji Li, Jun Ma & Shuai Gao

Editor information

Editors and Affiliations

Microsoft Research Asia, 5 Danling Rd., Haidian District, 100190, Beijing, China
Haixun Wang
Computer School, Wuhan University, 16 Luojiashan Road, 430072, Hubei, China
Shijun Li
Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, 060-0814, Hokkaido, Sapporo, Japan
Satoshi Oyama
College of Information Science and Technology, Drexel University, 19104, Philadelphia, PA, USA
Xiaohua Hu
State Key Laboratory of Software Engineering, Wuhan University, 16 Luojiashan Road, 430072, Wuhan, Hubei, China
Tieyun Qian

About this paper

Cite this paper

Li, P., Ma, J., Gao, S. (2011). Actions in Still Web Images: Visualization, Detection and Retrieval. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_27

DOI: https://doi.org/10.1007/978-3-642-23535-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer Science, Computer Science (R0)
