Structure analysis of soccer video with domain knowledge and hidden Markov models

doi:10.1016/j.patrec.2004.01.005

Pattern Recognition Letters

Volume 25, Issue 7, May 2004, Pages 767-775

https://doi.org/10.1016/j.patrec.2004.01.005

Abstract

In this paper, we present statistical techniques for parsing the structure of produced soccer programs. The problem is important for applications such as personalized video streaming and browsing systems, in which videos are segmented into different states and important states are selected based on user preferences. While prior work focuses on the detection of special events such as goals or corner kicks, this paper is concerned with generic structural elements of the game. We define two mutually exclusive states of the game, play and break, based on the rules of soccer. Automatic detection of such generic states is a new and challenging problem because their appearance and temporal dynamics vary widely across videos. We select a salient feature set from the compressed domain, dominant color ratio and motion intensity, based on the special syntax and content characteristics of soccer videos. We then model the stochastic structure of each state of the game with a set of hidden Markov models. Finally, higher-level transitions are taken into account and dynamic programming techniques are used to obtain the maximum likelihood segmentation of the video sequence. The system achieves a promising classification accuracy of 83.5%, with lightweight computation for feature extraction and model inference, as well as satisfactory accuracy in boundary timing.

Introduction

In this paper, we present algorithms for the analysis of video structure using domain knowledge and supervised learning of statistical models. The domain of interest here is soccer video, and the structure we are interested in is the temporal sequence of high-level game states, namely play and break. The goal of this work is to parse the continuous video stream into a sequence of component state labels automatically, i.e., to jointly segment the video sequence into homogeneous chunks and classify each segment as one of the semantic states. Structure parsing is not only useful for automatic content filtering for general TV audiences and soccer professionals in this special domain; it is also related to the important general problem of video structure analysis and content understanding. While most existing work focuses on the detection of domain-specific events, our approach to generic high-level structure analysis is distinctive and has several important advantages: (1) the generic state information can be used to filter and significantly reduce the video data. For example, typically no more than 60% of the video corresponds to play, so significant information reduction can be achieved; (2) videos in different states clearly have different temporal variations, which can be captured by statistical temporal models such as hidden Markov models (HMMs).

Related work in the literature of sports video analysis has addressed soccer and various sports games. For soccer video, prior work has been on shot classification (Gong et al., 1995), scene reconstruction (Yow et al., 1995), and rule-based semantic classification (Qian and Tovinkere, 2001). For other sports video, supervised learning was used by Zhong and Chang (2001) to recognize canonical views such as baseball pitching and tennis serve. In the area of video genre segmentation and classification, Wang et al. (2000) have developed HMM-based models for classifying videos into news, commercial, sports and weather reports.

In this work, we first exploit domain-specific video syntax to identify salient high-level structures. Such syntactic structures are usually associated with important semantic meanings in specific domains. Taking soccer as a test case, we identify play and break as two recurrent high-level structures, which correspond well to the semantic states of the game. These observations then lead us to choose two simple but effective features in the compressed domain: dominant color ratio and motion intensity. In our prior work (Xu et al., 2001), we showed that this specific feature set, when combined with rule-based detection techniques, was indeed effective for play/break detection in soccer. In this paper, we use formal statistical techniques to model domain-specific syntactic constraints rather than relying on heuristic rules only. The stochastic structure within a play or a break is modelled with a set of HMMs, and the transitions among these HMMs are handled with dynamic programming. Average classification accuracy per segment is 83.5%, and most of the play/break boundaries are correctly detected within a 3-second window (Xie et al., 2002). It is encouraging that high-level domain-dependent video structures can be computed with high accuracy using compressed-domain features and generic statistical tools. We believe that the performance can be attributed to the match between the features and the domain syntax, and to the power of the statistical tools in capturing the perceptual variations and temporal dynamics of the video.
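To make the dynamic programming step concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each short window of video has already been scored under a play model and a break model, and that log transition probabilities between the two states are given. The function name `segment` and all numeric values are hypothetical.

```python
import math

def segment(loglik, log_trans, log_prior):
    """Most likely play/break label sequence by dynamic programming.

    loglik:    list of (ll_play, ll_break) log-likelihoods, one pair per window
               (assumed to come from per-class models; illustrative here).
    log_trans: 2x2 table, log_trans[p][s] = log P(next state s | current state p).
    log_prior: log prior of each state for the first window.
    Returns a list of state indices (0 = play, 1 = break).
    """
    n_states = 2
    score = [log_prior[s] + loglik[0][s] for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at window t + 1
    for obs in loglik[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + obs[s])
        back.append(ptr)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical scores: the third window weakly favours break, but the
# "sticky" transitions keep it inside the surrounding play segment.
scores = [(-1, -5), (-1, -5), (-3, -2.5), (-1, -5), (-5, -1), (-5, -1)]
sticky = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
prior = [math.log(0.5), math.log(0.5)]
labels = segment(scores, sticky, prior)
```

The transition table is what discourages isolated label flips: a window that only weakly favours the other state is absorbed into its neighbours rather than producing a spurious boundary.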

In Section 2, we define the high-level structures of play and break in soccer, and present relevant observations of soccer video syntax; in Section 3 we describe algorithms for feature extraction and the validation of this feature set with rule-based detection; in Section 4 we discuss algorithms for training HMMs and using the models to segment new videos into play and break; experiments and results are presented in Section 5; and in Section 6 we draw conclusions and discuss future work.

Section snippets

The syntax and high-level structures in soccer video

In this section, we present a few observations on soccer video that explore the interesting relations between syntactic structures and semantic states of the video.

Computing informative features

Based on observations relating soccer video semantics, video production syntax and low-level perceptual features, we use one special feature, dominant color ratio, along with one generic feature, motion intensity, to capture the characteristics of soccer video content. Moreover, our attention here is on compressed-domain features, since one of the objectives of the system is real-time performance under constrained resources and diverse device settings.
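As a rough illustration of the two features, the sketch below works on plain Python lists rather than on a compressed MPEG stream. Several choices are assumptions made for illustration and are not taken from the paper: the dominant hue is taken as the peak of a coarse hue histogram, the ratio counts pixels within a fixed hue tolerance, and motion intensity is the mean motion-vector magnitude. All function names and default values are hypothetical.

```python
import math
from collections import Counter

def learn_dominant_hue(hues, bin_width=10):
    """Return the centre of the most populated hue bin (hues in degrees)."""
    counts = Counter(int(h % 360) // bin_width for h in hues)
    peak_bin = counts.most_common(1)[0][0]
    return peak_bin * bin_width + bin_width / 2

def dominant_color_ratio(hues, dominant_hue, tolerance=20):
    """Fraction of pixels whose hue lies within `tolerance` degrees of the dominant hue."""
    def circ_dist(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)  # hue is circular
    inside = sum(1 for h in hues if circ_dist(h, dominant_hue) <= tolerance)
    return inside / len(hues)

def motion_intensity(vectors):
    """Mean magnitude of (vx, vy) motion vectors; 0.0 if there are none."""
    if not vectors:
        return 0.0
    return sum(math.hypot(vx, vy) for vx, vy in vectors) / len(vectors)

# Toy frame: mostly grass-like green pixels plus two outliers.
frame_hues = [120] * 8 + [0, 240]
grass_hue = learn_dominant_hue(frame_hues)
ratio = dominant_color_ratio(frame_hues, grass_hue)
intensity = motion_intensity([(3, 4), (0, 0)])
```

A high ratio suggests a wide view of the field, while a low ratio suggests a close-up or a shot of the crowd, which is why this feature is informative for soccer in particular.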

Play-break segmentation with HMMs

In a sense, distinguishing the distinct inherent states of a soccer game, play (P) and break (B), is analogous to isolated word recognition (Rabiner, 1989). Here each model corresponds to a class (a phoneme in the speech case, P or B in a soccer video); the sub-structures within each model account for transitions and variations within and between phonemes in speech, and for the switching of shots and the variations of motion in a soccer game. This analogy leads to our use of HMMs for soccer video
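The isolated-word analogy can be sketched as follows: each class has its own HMM, an observation sequence is scored under every model with the forward algorithm, and the class with the highest likelihood is chosen. This is a minimal sketch under stated assumptions, not the paper's models: observations are one-dimensional, emissions are single Gaussians, and every parameter value below is hypothetical. All probabilities are assumed strictly positive so that their logarithms exist.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def hmm_loglik(obs, model):
    """Forward algorithm in the log domain: log P(obs | model)."""
    pi, A = model["pi"], model["A"]
    means, variances = model["means"], model["vars"]
    n = len(pi)
    alpha = [math.log(pi[i]) + log_gauss(obs[0], means[i], variances[i])
             for i in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(A[i][j]) for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

def classify(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: hmm_loglik(obs, models[name]))

# Hypothetical two-state models over a dominant-color-ratio-like feature.
models = {
    "play":  {"pi": [0.6, 0.4], "A": [[0.8, 0.2], [0.3, 0.7]],
              "means": [0.8, 0.6], "vars": [0.01, 0.01]},
    "break": {"pi": [0.5, 0.5], "A": [[0.7, 0.3], [0.3, 0.7]],
              "means": [0.2, 0.4], "vars": [0.01, 0.01]},
}
label_a = classify([0.8, 0.75, 0.6], models)
label_b = classify([0.2, 0.3, 0.4], models)
```

Working in the log domain avoids the numerical underflow that multiplying many small probabilities would otherwise cause on longer sequences.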

Experiments

The four soccer video clips used in our experiments are briefly described in Table 1. All clips are in MPEG-1 format, SIF size, at 30 or 25 frames per second. The dominant hue values are adaptively learned for each clip (Section 3.1) and the dominant color ratios are computed on I- and P-frames only. The motion intensities are computed on P-frames and interpolated on I-frames. A three-second window sliding by one second is used to convert the continuous feature stream into short
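The windowing step described above can be sketched as follows. The sketch assumes the feature stream has a fixed number of samples per second; the function name `sliding_windows` and the parameter `rate` are hypothetical, and the handling of a trailing partial window (dropped here) is an assumption.

```python
def sliding_windows(features, rate, window_sec=3, step_sec=1):
    """Cut a feature stream into overlapping fixed-length windows.

    features: sequence of per-sample feature values.
    rate:     feature samples per second (assumed constant).
    Windows that would run past the end of the stream are dropped.
    """
    win = int(round(window_sec * rate))
    step = int(round(step_sec * rate))
    return [features[start:start + win]
            for start in range(0, len(features) - win + 1, step)]

# Five seconds of samples at 30 per second give three 3-second windows.
stream = list(range(150))
windows = sliding_windows(stream, rate=30)
```

Because consecutive windows overlap by two seconds, each second of video contributes to up to three windows, which smooths the per-window decisions.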

Conclusion

In this paper, we presented new algorithms for soccer video segmentation and classification. First, play and break are defined as the basic semantic elements of a soccer video; second, observations of soccer video syntax are described and a feature set is chosen based on these observations; and then, classification/segmentation is performed with HMMs followed by dynamic programming. The results are evaluated in terms of classification accuracy and segmentation accuracy; extensive statistical

References (13)

FIFA, 2002. Laws of the game. Federation Internationale de Football Association,...
Gong, Y., Lim, T., Chua, H., May 1995. Automatic parsing of TV soccer programs. In: IEEE International Conference on...
Qian, R.J., Tovinkere, V., August 2001. Detecting semantic events in soccer games: Towards a complete solution. In:...
Rabiner, L.R., February 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE.
Ramesh, P., Wilpon, J., 1992. Modeling state durations in hidden Markov models for automatic speech recognition. In:...
Shook, F. (Ed.), 1995. Television field production and reporting, 2nd Edition. Longman Publisher USA, Sports...

There are more references available in the full text version of this article.
