Thread-Safe: Towards Recognizing Human Actions Across Shot Boundaries

Hoai, Minh; Zisserman, Andrew

doi:10.1007/978-3-319-16817-3_15

Thread-Safe: Towards Recognizing Human Actions Across Shot Boundaries

Minh Hoai^17,18 &
Andrew Zisserman¹⁷

Conference paper
First Online: 01 January 2015

2339 Accesses
3 Citations

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9006))

Abstract

We study the task of recognizing human actions in video whilst paying attention to the shot and thread editing structure. Most existing action recognition algorithms ignore this structure, but it is generally present in edited TV and film material.

To this end, we make the following contributions: first, we introduce a new dataset of human actions to study the occurrence/reoccurrence of patterns of human actions in edited TV material; second, we propose composing a video into threads of related shots, removing some of the discontinuities due to shot boundaries; and third, we show the benefits of utilizing video threads in recognizing human actions. The experiments demonstrate that human action retrieval accuracy can be improved using threads.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2008)
Google Scholar
Marszalek, M., Laptev, I., Schmid, C.: Actions in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2009)
Google Scholar
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: Proceedings of the International Conference on Computer Vision (2011)
Google Scholar
Patron-Perez, A., Marszalek, M., Reid, I., Zisserman, A.: Structured learning of human interactions in TV shows. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2441–2453 (2012)
Article Google Scholar
Hoai, M., Lan, Z.Z., De la Torre, F.: Joint segmentation and classification of human actions in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2011)
Google Scholar
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: Proceedings of the International Conference on Computer Vision (2013)
Google Scholar
Hoai, M., Zisserman, A.: Talking heads: detecting humans and recognizing their interactions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
Google Scholar
Cour, T., Jordan, C., Miltsakaki, E., Taskar, B.: Movie/Script: alignment and parsing of video and text transcription. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS, vol. 5305, pp. 158–171. Springer, Heidelberg (2008)
Chapter Google Scholar
Zhai, Y., Shah, M.: Video scene segmentation using markov chain monte carlo. IEEE Trans. Multimed. 8, 686–697 (2006)
Article Google Scholar
Yeung, M., Yeo, B.L., Liu, B.: Segmentation of video by clustering and graph analysis. Comput. Vis. Image Underst. 71, 94–109 (1998)
Article Google Scholar
Kender, J., Yeo, B.L.: Video scene segmentation via continuous video coherence. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (1998)
Google Scholar
Chasanis, V.T., Likas, A.C., Galatsanos, N.P.: Scene detection in videos using shot clustering and sequence alignment. IEEE Trans. Multimed. 11, 89–100 (2009)
Article Google Scholar
Lehane, B., O’Connor, N.E., Murphy, N.: Dialogue sequence detection in movies. In: Leow, W.-K., Lew, M., Chua, T.-S., Ma, W.-Y., Chaisorn, L., Bakker, E.M. (eds.) CIVR 2005. LNCS, vol. 3568, pp. 286–296. Springer, Heidelberg (2005)
Chapter Google Scholar
Lehane, B., O’Connor, N.E., Smeaton, A.F., Lee, H.: A system for event-based film browsing. In: Göbel, S., Malkewitz, R., Iurgel, I. (eds.) TIDSE 2006. LNCS, vol. 4326, pp. 334–345. Springer, Heidelberg (2006)
Chapter Google Scholar
Pickup, L., Zisserman, A.: Automatic retrieval of visual continuity errors in movies. In: ACM International Conference on Image and Video Retrieval (2009)
Google Scholar
Tapaswi, M., Bauml, M., Stiefelhagen, R.: Storygraphs: visualizing character interactions as a timeline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
Google Scholar
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local svm approach. In: Proceedings of the International Conference on Pattern Recognition (2004)
Google Scholar
Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29, 2247–2253 (2007)
Article Google Scholar
Reddy, K.K., Shah, M.: Recognizing 50 human action categories of web videos. Mach. Vis. Appl. 24, 971–981 (2012)
Article Google Scholar
Everingham, M., Sivic, J., Zisserman, A.: “hello! my name is ... Buffy” - automatic naming of characters in tv video. In: Proceedings of the British Machine Vision Conference (2006)
Google Scholar
Boreczky, J.S., Rowe, L.A.: Comparison of video shot boundary detection techniques. J. Electron. Imaging 5, 122–128 (1996)
Article Google Scholar
Lienhart, R.: Comparison of automatic shot boundary detection algorithms. In: SPIE, vol. 3656 (1998)
Google Scholar
Lienhart, R.: Reliable transition detection in videos: a survey and practitioner’s guide. Int. J. Image Graph. 1, 469–486 (2001)
Article Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2005)
Google Scholar
Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
Article Google Scholar
Hoai, M., Zisserman, A.: Improving human action recognition using score distribution and ranking. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) ACCV 2014. LNCS, vol. 9007, pp. 3–20. Springer, Heidelberg (2014)
Google Scholar
Perronnin, F., Sánchez, J., Mensink, T.: Improving the fisher Kernel for large-scale image classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 143–156. Springer, Heidelberg (2010)
Chapter Google Scholar
Suykens, J.A.K., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9, 293–300 (1999)
Article MathSciNet Google Scholar
Saunders, C., Gammerman, A., Vovk, V.: Ridge regression learning algorithm in dual variables. In: Proceedings of the International Conference on Machine Learning (1998)
Google Scholar
Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., DeMoor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific, Singapore (2002)
Book MATH Google Scholar
Tommasi, T., Caputo, B.: The more you know, the less you learn: from knowledge transfer to one-shot learning of object categories. In: Proceedings of the British Machine Vision Conference (2009)
Google Scholar
Hoai, M.: Regularized max pooling for image categorization. In: Proceedings of the British Machine Vision Conference (2014)
Google Scholar
Cawley, G.C., Talbot, N.L.: Fast exact leave-one-out cross-validation of sparse least-squares support vector machines. Neural Netw. 17, 1467–1475 (2004)
Article MATH Google Scholar
Vig, E., Dorr, M., Cox, D.: Space-variant descriptor sampling for action recognition based on saliency and eye movements. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VII. LNCS, vol. 7578, pp. 84–97. Springer, Heidelberg (2012)
Chapter Google Scholar
Marin-Jimenez, M.J., Yeguas, E., de la Blanca, N.P.: Exploring STIP-based models for recognizing human interactions in TV videos. PRL 34, 1819–1828 (2013)
Article Google Scholar
Jiang, Y.-G., Dai, Q., Xue, X., Liu, W., Ngo, C.-W.: Trajectory-based modeling of human actions with motion reference points. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part V. LNCS, vol. 7576, pp. 425–438. Springer, Heidelberg (2012)
Chapter Google Scholar
Mathe, S., Sminchisescu, C.: Dynamic eye movement datasets and learnt saliency models for visual action recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 842–856. Springer, Heidelberg (2012)
Chapter Google Scholar
Gaidon, A., Harchaoui, Z., Schmid, C.: Recognizing activities with cluster-trees of tracklets. In: Proceedings of the British Machine Vision Conference (2012)
Google Scholar

Download references

Acknowledgements

This work was supported by the EPSRC grant EP/I012001/1 and a Royal Society Wolfson Research Merit Award.

Author information

Authors and Affiliations

Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, OX1 3PJ, UK
Minh Hoai & Andrew Zisserman
Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
Minh Hoai

Authors

Minh Hoai
View author publications
You can also search for this author in PubMed Google Scholar
Andrew Zisserman
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Minh Hoai .

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hoai, M., Zisserman, A. (2015). Thread-Safe: Towards Recognizing Human Actions Across Shot Boundaries. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9006. Springer, Cham. https://doi.org/10.1007/978-3-319-16817-3_15

Download citation

DOI: https://doi.org/10.1007/978-3-319-16817-3_15
Published: 17 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16816-6
Online ISBN: 978-3-319-16817-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics