Actions in Still Web Images: Visualization, Detection and Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Abstract

We describe a framework for retrieving human actions in still web images by verb queries, for instance “phoning”. First, we build a group of visually discriminative instances for each action class, called “Exemplarlets”. We then employ Multiple Kernel Learning (MKL) to learn an optimal combination of histogram intersection kernels, each of which captures a state-of-the-art feature channel. Our features include the distribution of edges, dense visual words, and feature descriptors at different levels of a spatial pyramid. For a new image, we detect the hot-region using a sliding-window detector learnt via MKL; the hot-region may indicate latent actions in the image. After the hot-region has been detected, we build an inverted index in the visual search path, which we call the Visual Inverted Index (VII). Finally, by fusing the visual search path and the text search path, we obtain accurate results relevant to either the text or the visual information. We show both detection and retrieval results on our newly collected dataset of six actions and demonstrate improved performance over existing methods.
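The kernel combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes each feature channel is an L1-normalised histogram and that the combined kernel is a non-negative weighted sum of per-channel histogram intersection kernels, which is the usual MKL form. The channel names, histogram values, and weights below are hypothetical and fixed by hand; in the framework described above they would be learnt by MKL, and this is not the authors' implementation.

```python
from typing import Dict, Sequence

Histogram = Sequence[float]


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Histogram intersection kernel: sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def combined_kernel(
    x: Dict[str, Histogram],
    y: Dict[str, Histogram],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-channel intersection kernels.

    In MKL the non-negative weights are learnt from data; here they are
    fixed, made-up values used only for illustration.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("kernel weights must be non-negative")
    return sum(
        w * histogram_intersection(x[ch], y[ch]) for ch, w in weights.items()
    )


# Hypothetical channel names and toy L1-normalised histograms.
image_a = {"edges": [0.5, 0.5, 0.0], "visual_words": [0.25, 0.75]}
image_b = {"edges": [0.25, 0.5, 0.25], "visual_words": [0.5, 0.5]}
weights = {"edges": 0.5, "visual_words": 0.5}

similarity = combined_kernel(image_a, image_b, weights)
```

With normalised histograms and weights that sum to one, the combined score lies between 0 and 1, and an image compared with itself scores 1. A kernel matrix built this way could be passed to any SVM solver that accepts precomputed kernels.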

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, P., Ma, J., Gao, S. (2011). Actions in Still Web Images: Visualization, Detection and Retrieval. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_27

  • DOI: https://doi.org/10.1007/978-3-642-23535-1_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23534-4

  • Online ISBN: 978-3-642-23535-1

  • eBook Packages: Computer Science, Computer Science (R0)
