Video event classification using string kernels

Ballan, Lamberto; Bertini, Marco; Del Bimbo, Alberto; Serra, Giuseppe

doi:10.1007/s11042-009-0351-3

Published: 15 September 2009

Volume 48, pages 69–87, (2010)
Multimedia Tools and Applications

Abstract

Event recognition is a crucial task for providing a high-level semantic description of video content. The bag-of-words (BoW) approach has proven successful for the categorization of objects and scenes in images, but it cannot model temporal information between consecutive frames. In this paper we present a method to introduce temporal information for video event recognition within the BoW approach. Events are modeled as sequences of histograms of visual features, computed from each frame using the traditional BoW. The sequences are treated as strings (phrases) in which each histogram is considered a character. Classification of these sequences, whose length varies with the duration of the video clips, is performed using SVM classifiers with a string kernel that uses the Needleman-Wunsch edit distance. Experimental results on two domains, soccer videos and a subset of TRECVID 2005 news videos, demonstrate the validity of the proposed approach.
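The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the substitution cost, the gap penalty, or the exact kernel form, so the choices below are assumptions made only for illustration: a chi-square distance between frame histograms as the substitution cost, a constant gap penalty `gap`, and a kernel of the form exp(-d). All function names are hypothetical.

```python
import math

def chi2(h1, h2):
    """Chi-square distance between two histograms (assumed substitution cost)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def nw_distance(s, t, gap=1.0, sub=chi2):
    """Needleman-Wunsch-style global alignment cost between two sequences
    of frame histograms; each histogram plays the role of a character."""
    n, m = len(s), len(t)
    # prev[j] holds the cost of aligning the first i-1 frames of s with the first j of t.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j] + gap,                           # skip a frame of s
                cur[j - 1] + gap,                        # skip a frame of t
                prev[j - 1] + sub(s[i - 1], t[j - 1]),   # match or substitute
            )
        prev = cur
    return prev[m]

def string_kernel(s, t, gap=1.0):
    """Similarity derived from the edit distance (assumed form: exp(-d))."""
    return math.exp(-nw_distance(s, t, gap))

def gram_matrix(clips, gap=1.0):
    """Pairwise kernel values for a list of clips of possibly different lengths."""
    return [[string_kernel(a, b, gap) for b in clips] for a in clips]

# Two toy clips over a two-word visual vocabulary; clip_b has one extra frame.
clip_a = [[1.0, 0.0], [0.0, 1.0]]
clip_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
K = gram_matrix([clip_a, clip_b])
```

A matrix such as `K` could then be handed to an SVM implementation that accepts precomputed kernels. Kernels derived from edit distances are not guaranteed to be positive semi-definite, so this step may need extra care in practice.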

Notes

http://www.micc.unifi.it/vim

Acknowledgements

This work is partially supported by the EU IST VidiVideo Project (Contract FP6-045547) and IM3I Project (Contract FP7-222267). The authors thank Filippo Amendola for his support in the preparation of the experiments.

Author information

Authors and Affiliations

Media Integration and Communication Center, University of Florence, Florence, Italy
Lamberto Ballan, Marco Bertini, Alberto Del Bimbo & Giuseppe Serra

Corresponding author

Correspondence to Lamberto Ballan.

About this article

Cite this article

Ballan, L., Bertini, M., Del Bimbo, A. et al. Video event classification using string kernels. Multimed Tools Appl 48, 69–87 (2010). https://doi.org/10.1007/s11042-009-0351-3

Published: 15 September 2009
Issue Date: May 2010
DOI: https://doi.org/10.1007/s11042-009-0351-3
