Abstract

Event recognition is crucial for providing a high-level semantic description of video content. The bag-of-words (BoW) approach has proven successful for categorizing objects and scenes in images, but it cannot model temporal information between consecutive frames. In this paper we present a method that introduces temporal information for video event recognition within the BoW approach. Events are modeled as sequences of histograms of visual features, computed from each frame using the traditional BoW. The sequences are treated as strings (phrases) in which each histogram is considered a character. Classification of these sequences, whose length varies with the duration of the video clips, is performed using SVM classifiers with a string kernel that uses the Needleman-Wunsch edit distance. Experimental results on two domains, soccer videos and a subset of TRECVID 2005 news videos, demonstrate the validity of the proposed approach.
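
To make the idea concrete, the following is a minimal Python sketch of an edit-distance kernel over sequences of per-frame histograms. It is an illustration under stated assumptions, not the implementation used in the paper: the chi-square frame dissimilarity, the `threshold` that decides whether two frames count as the same character, the unit `gap` cost, and the exponential form of `string_kernel` are all hypothetical choices that the abstract does not specify.

```python
import math


def chi2(h1, h2):
    """Chi-square distance between two histograms (assumed frame dissimilarity)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def nw_distance(seq_a, seq_b, gap=1.0, threshold=0.2):
    """Needleman-Wunsch style edit distance between two histogram sequences.

    Each histogram plays the role of a character. Two characters are treated
    as equal (substitution cost 0) when their chi-square distance is at most
    `threshold`, otherwise substitution costs 1. Both `threshold` and `gap`
    are assumed values chosen only for illustration.
    """
    n, m = len(seq_a), len(seq_b)
    # d[i][j] = cost of aligning the first i frames of seq_a with the first j of seq_b
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if chi2(seq_a[i - 1], seq_b[j - 1]) <= threshold else 1.0
            d[i][j] = min(
                d[i - 1][j - 1] + sub,  # match or substitution
                d[i - 1][j] + gap,      # frame of seq_a left unaligned
                d[i][j - 1] + gap,      # frame of seq_b left unaligned
            )
    return d[n][m]


def string_kernel(seq_a, seq_b, gap=1.0, threshold=0.2):
    """Similarity derived from the edit distance (assumed form: exp(-distance))."""
    return math.exp(-nw_distance(seq_a, seq_b, gap=gap, threshold=threshold))


# Two toy clips of different length, with 2-bin histograms per frame.
clip_a = [[1.0, 0.0], [0.0, 1.0]]
clip_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(nw_distance(clip_a, clip_b))
print(string_kernel(clip_a, clip_a))
```

Because the dynamic-programming table handles sequences of different lengths, clips of different duration can be compared directly. A matrix of such pairwise kernel values could then be given to an SVM implementation that accepts precomputed kernels; note that a kernel built this way is not guaranteed to be positive definite.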

Notes

  1. http://www.micc.unifi.it/vim

Acknowledgements

This work is partially supported by the EU IST VidiVideo Project (Contract FP6-045547) and IM3I Project (Contract FP7-222267). The authors thank Filippo Amendola for his support in the preparation of the experiments.

Author information

Corresponding author

Correspondence to Lamberto Ballan.

About this article

Cite this article

Ballan, L., Bertini, M., Del Bimbo, A. et al. Video event classification using string kernels. Multimed Tools Appl 48, 69–87 (2010). https://doi.org/10.1007/s11042-009-0351-3

  • DOI: https://doi.org/10.1007/s11042-009-0351-3
