Abstract
We study the task of recognizing human actions in video whilst paying attention to the shot and thread editing structure. This structure is generally present in edited TV and film material, yet most existing action recognition algorithms ignore it.
To this end, we make the following contributions: first, we introduce a new dataset of human actions for studying the occurrence and reoccurrence of patterns of human actions in edited TV material; second, we propose organizing a video into threads of related shots, which removes some of the discontinuities caused by shot boundaries; and third, we show the benefits of using video threads to recognize human actions. The experiments demonstrate that human action retrieval accuracy can be improved by using threads.
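To make the notion of a thread concrete, here is a minimal sketch. It is not the authors' method. It assumes each shot has already been summarized by a feature vector (for example, a colour histogram of a keyframe) and greedily links a shot to an earlier thread when its cosine similarity to that thread's most recent shot meets a threshold. The function name `group_shots_into_threads`, the `threshold` and `max_gap` parameters, and the cosine-similarity criterion are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def group_shots_into_threads(
    shot_features: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical similarity cut-off
    max_gap: int = 5,        # hypothetical: how many shots back a thread may be resumed
) -> List[List[int]]:
    """Hypothetical greedy threading: returns lists of shot indices, one list per thread."""
    threads: List[List[int]] = []
    for i, feat in enumerate(shot_features):
        best, best_sim = None, threshold
        for t, thread in enumerate(threads):
            last = thread[-1]
            # Only resume threads whose most recent shot is close enough in time.
            if i - last > max_gap:
                continue
            sim = cosine(shot_features[last], feat)
            # Keep the most similar thread among those that meet the threshold.
            if sim >= best_sim:
                best, best_sim = t, sim
        if best is None:
            threads.append([i])  # no similar recent shot: start a new thread
        else:
            threads[best].append(i)
    return threads


if __name__ == "__main__":
    # Toy alternating dialogue: shots 0 and 2 show one camera set-up, shots 1 and 3 another.
    shots = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.05], [0.05, 1.0]]
    print(group_shots_into_threads(shots))
```

On this toy alternating dialogue the sketch returns `[[0, 2], [1, 3]]`: each camera set-up becomes its own thread, so a recognizer could follow an action across the cut instead of treating every shot in isolation. A real system would need robust shot descriptors and shot-boundary detection, both of which are outside the scope of this sketch.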
Acknowledgements
This work was supported by the EPSRC grant EP/I012001/1 and a Royal Society Wolfson Research Merit Award.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hoai, M., Zisserman, A. (2015). Thread-Safe: Towards Recognizing Human Actions Across Shot Boundaries. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) Computer Vision – ACCV 2014. Lecture Notes in Computer Science, vol. 9006. Springer, Cham. https://doi.org/10.1007/978-3-319-16817-3_15
DOI: https://doi.org/10.1007/978-3-319-16817-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16816-6
Online ISBN: 978-3-319-16817-3
eBook Packages: Computer Science, Computer Science (R0)