Journal of Visual Communication and Image Representation
HMM-based ball hitting event exploration system for broadcast baseball video
Highlights
► We propose a ball hitting event exploration system for baseball video. ► Ten spatial patterns, 16 frame types, and 11 events are analyzed. ► Explicit information of play region transitions within a single field shot is extracted. ► Extensive applications are developed based on the proposed framework.
Introduction
The explosive growth of digital video has motivated research on many aspects of video analysis, leading to the development of efficient sports video analysis in soccer [1], [2], [3], tennis [4], [5], [6], basketball [7], [8], [9], volleyball [10], baseball [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], etc. Automatic sports video analysis has attracted considerable attention because sports video appeals to large audiences, and its applications span almost all sports, among which baseball is particularly popular. Watching a whole game video sequentially is time-consuming, whereas highlights abstract the game for quick browsing. In addition, highlights can contribute to tactic inference for coaches, players, and even professional sports fans. With these motivations, we aim to develop a highlight semantics exploration system for baseball games.
Baseball video is characterized by a strictly-defined structure containing a series of plays and each play starts with a pitch. Hence, PC (pitcher–catcher) shot detection and semantic shot classification play an important role in baseball highlight detection [11], [12]. Furthermore, various kinds of pitch analyses have been addressed to derive the correlation between the ball trajectory and the rotation by tracking the translation and rotation of a pitched ball [13], to extract the ball trajectory based on physical characteristics [14], to reconstruct the 3D trajectory of the pitched ball with multiple cameras [15], and even to recognize the pitching style based on the pitcher’s posture [16].
Due to broadcast requirements, there has been an essential demand for highlight extraction, which aims to abstract a long game into a compact summary so the audience can browse the game quickly. Moreover, highlight extraction/classification also contributes to many applications, such as efficient event indexing and retrieval, providing a reference for tactics inference to coaches and players, user-designated highlight clip extraction, etc. In the past few years, remarkable research has been devoted to baseball video content analysis. Hung and Hsieh [17] categorize shots into pitcher–catcher, infield, outfield, and non-field shots. Combining the detected scoreboard information with the obtained shot types as mid-level cues, Hung et al. use a Bayesian Belief Network (BBN) for highlight classification. Chu and Wu [18] consider most of the possible conditions in a baseball game based on the game-specific rules and extract the scoreboard information for event detection. Although both Hung and Hsieh [17] and Chu and Wu [18] achieve high accuracy in highlight classification thanks to the additional scoreboard information, their coarse shot classification approaches are inadequate for analyzing the ball movement and play region transitions of ball hitting events. Gong et al. [19] classify baseball highlights by integrating image, audio, and closed caption cues based on a Maximum Entropy Model (MEM). Fleischman et al. [20] use complex temporal features, such as field type, speech, and the start and end times of camera motion, and exploit temporal data mining techniques to discover a codebook of frequent temporal patterns for baseball highlight classification.
Because camera positions are fixed during a game and different TV channels present the game's progress in similar ways, each category of semantic baseball event usually exhibits similar scene transitions. For example, a typical "fly out" event can be composed of a PC scene followed by an outfield scene and then an in-grass scene. Hence, the HMM statistical model is broadly used for highlight detection and classification. Lien et al. [21] extract significant color, object number, motion vector, and player location as features to classify eight semantic scenes: close-up, base, running, pitching, player, infield, outfield, and other. Treating the classified scenes as the observation symbol sequence, a 4-state ergodic HMM is applied to detect four baseball events: base hit, ground out, air out, and strike out. Although good performance is achieved in Lien et al. [21], only four events are detected, which is hardly sufficient for general users, let alone professional players or coaches. Cheng and Hsu [22] fuse visual motion information with audio features, including zero crossing rate, pitch period, and Mel-frequency cepstral coefficients (MFCC), to extract baseball highlights based on a hidden Markov model (HMM). Mochizuki et al. [23] provide a baseball indexing method based on patternizing baseball scenes with a set of rectangles carrying image features and a motion vector. Chang et al. [24] assume that most highlights in baseball games consist of certain shot types and that these shots have similar transitions in time. Each highlight is described by an HMM whose hidden states are represented by predefined shot types, and several features serve as observations to train the HMM for highlight recognition. The main disadvantages of Mochizuki et al. [23] and Chang et al. [24] are low accuracy and few highlight types, because the extracted information is too limited to detect various highlights accurately.
Even though previous works claim good results on highlight classification, they do not analyze a variety of ball hitting event types and cannot describe the detailed batting process and ball movement within a shot, such as: "The ball batted into the left infield is picked up by an infielder and then thrown to the first baseman." By nature, the first/second/third basemen, the shortstop, and other players are important objects for event understanding. However, when the camera focuses on a player, it is hard to recognize his fielding position. Hence, in this paper we explore field shots (the shots that follow the batted ball in the field) and utilize game-specific spatial patterns, e.g., the bases and the pitch mound, to identify the regions the ball has passed through. With great success in speech recognition, HMMs are effective models for time-varying patterns and have been widely used in scene modeling for sports video [21], [22], [23], [24]. Thus, we propose an HMM-based mechanism to detect and classify up to 11 ball hitting events: (1) single, (2) double, (3) pop up, (4) fly out, (5) ground out, (6) two-base out, (7) foul ball, (8) foul out, (9) double play, (10) home run, and (11) home base out. In addition to providing a detailed description of each play, a baseball exploration system is also developed so that users can efficiently retrieve the desired batting clips. With the proposed framework, highlight extraction and event indexing in baseball video become more powerful and practical, since comprehensive, detailed, and explicit information about the game can be presented to users.
The rest of the paper is organized as follows. Section 2 describes the system overview of the proposed ball hitting event recognition. The processes of visual feature extraction and frame type classification are explained in Sections 3 and 4, respectively. Section 5 elaborates how to recognize ball hitting events using HMM. Experimental results and discussion are presented in Section 6. Section 7 introduces extensive applications based on the proposed system. Finally, Section 8 concludes this paper and describes the future work.
Section snippets
Overview of the proposed HMM-based ball hitting event exploration system
With the foregoing motivation and the limitations of existing works, we develop an HMM-based ball hitting event exploration system for broadcast baseball video. As illustrated in Fig. 1, the system contains three main components: visual feature extraction, frame type classification, and HMM-based ball hitting event recognition. In a baseball game, each play starts with a PC (pitcher–catcher) shot and ends up with specific shots. To trim out the uninteresting segments, e.g., commercials,
Visual feature extraction
In our proposed system, significant colors and game-specific spatial patterns are extracted as visual features.
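As a concrete illustration of the significant-color idea, the sketch below quantizes a frame's hue channel into coarse bins and measures the dominant bin's pixel ratio, a common proxy for the grass/soil colors of the field. The bin count and the synthetic frame are assumptions for illustration, not the paper's actual feature design.

```python
# Minimal sketch (assumed details) of significant-color extraction:
# quantize hue into coarse bins and take the dominant bin's pixel ratio.
import numpy as np

def dominant_color_ratio(hue_channel, n_bins=12):
    """Return (dominant_bin, ratio) for a hue image with values in [0, 360)."""
    bins = (hue_channel.astype(int) * n_bins) // 360
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    k = int(counts.argmax())
    return k, counts[k] / bins.size

# Example: a 352x240 frame whose hue is mostly "green" (~120 degrees)
frame_hue = np.full((240, 352), 120)
frame_hue[:40, :] = 20                     # a soil-colored strip at the top
k, ratio = dominant_color_ratio(frame_hue)
```

A frame dominated by one hue bin with a high ratio is a strong cue for a field shot, which is what the significant-color feature is meant to capture.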
Frame type classification and annotation string generation
In order to comprehend the detailed process of a ball hitting event, we have to recognize the play region, i.e., the region in the baseball field the camera currently focuses on, for frame type classification. Based on the detected spatial patterns, we classify each field frame into one of the 16 types: IL (infield left), IC (infield center), IR (infield right), B1 (first base), B2 (second base), B3 (third base), OL (outfield left), OC (outfield center), OR (outfield right), PS (player in soil), PG
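The mapping from detected spatial patterns to frame types can be pictured with a small rule-based sketch. The pattern names ("base1", "outfield_wall") and the coarse left/center/right region cue below are hypothetical stand-ins for the detected spatial patterns; the paper's actual decision logic is not reproduced here.

```python
# Illustrative rule-based frame type classification (hypothetical rules).
def classify_frame(patterns, region):
    """Map detected spatial patterns plus a coarse horizontal region
    ("left" / "center" / "right") to one of the frame type labels."""
    if "base1" in patterns:
        return "B1"                      # first base visible
    if "base2" in patterns:
        return "B2"                      # second base visible
    if "base3" in patterns:
        return "B3"                      # third base visible
    if "outfield_wall" in patterns:      # outfield shot, split by region
        return {"left": "OL", "center": "OC", "right": "OR"}[region]
    # otherwise treat it as an infield shot, split by region
    return {"left": "IL", "center": "IC", "right": "IR"}[region]
```

Classifying every field frame this way turns a shot into a string of frame type symbols, which is exactly the annotation string that the later HMM stage consumes as observations.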
HMM-based ball hitting event recognition
The main objective of this paper is to develop a ball hitting event exploration system to trace the play region transition and recognize the ball hitting event. Regarding the classified frame types as the observation symbols, we propose an HMM-based approach to recognize various ball hitting events, including: single, double, pop up, fly out, ground out, two-base out, foul ball, foul out, double play, home run, and home base out.
Generally, an HMM is expressed by a 3-tuple of parameters λ = {A, B, π}. The
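The per-event recognition scheme can be sketched as follows: each candidate event e is modeled by its own λ_e = {A, B, π}, and the recognized event is the one whose HMM assigns the highest forward likelihood P(O | λ_e) to the observed frame-type sequence. The toy two-state models below are purely illustrative, not the paper's trained parameters.

```python
# Sketch of HMM-based event recognition via the forward algorithm.
import numpy as np

def forward_likelihood(obs, A, B, pi):
    """P(O | lambda) for a symbol-index sequence obs, given
    A (state transitions), B (emissions), pi (initial distribution)."""
    alpha = pi * B[:, obs[0]]            # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction step
    return float(alpha.sum())            # termination

def recognize(obs, models):
    """models: {event_name: (A, B, pi)}; pick the max-likelihood event."""
    return max(models, key=lambda e: forward_likelihood(obs, *models[e]))

# Toy 2-state models over 2 observation symbols (purely illustrative):
# one event emits mostly symbol 0, the other mostly symbol 1.
A = np.array([[0.9, 0.1], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
models = {
    "fly_out":    (A, np.array([[0.9, 0.1], [0.9, 0.1]]), pi),
    "ground_out": (A, np.array([[0.1, 0.9], [0.1, 0.9]]), pi),
}
event = recognize([0, 0, 0], models)
```

In practice the observation alphabet would be the 16 frame types and each of the 11 events would have its own trained λ_e; for long sequences the forward recursion is usually computed with scaling or in log space to avoid underflow.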
Experimental results and discussion
To demonstrate the effectiveness of the proposed frame type classification and ball hitting event recognition approaches, we conduct the experiments on the video data of MLB (Major League Baseball) and JPB (Japanese Professional Baseball) games. In total, we have 253 video clips recorded from live broadcast television programs and compressed in MPEG-2 video standard with frame resolution of 352 × 240 (29.97 fps). For the evaluation of our proposed methods, 122 clips are randomly selected for
Highlight clip extraction by user-designated query
We have implemented a preliminary prototype of the user interface of the proposed baseball exploration system, as shown in Fig. 15. The video is displayed in area A and the visual presentation of the video analysis is provided in B. Area C gives the information about the detected spatial patterns. Furthermore, users are allowed to designate play region types in D for exploration. The highlight clips containing the user-designated play region types are retrieved and listed in E with their
Conclusions and future work
In this paper, we propose an HMM-based ball hitting event exploration system for broadcast baseball video capable of spatial pattern detection, frame type classification, and event recognition. Convincing results and encouraging performance are obtained. Furthermore, the proposed system also facilitates extensive applications, such as highlight clip extraction by user-designated query, storyboard construction, and similar event retrieval.
Compared with existing works on baseball video analysis, the
Acknowledgments
This work is supported in part by “Aim for the Top University Plan” of the National Chiao Tung University and Ministry of Education, Taiwan, R.O.C., and in part by National Science Council of R.O.C. under the Grant Nos. 98-2221-E-009-091-MY3 and 101-2218-E-009-004-.
References (33)
- et al., Automatic camera calibration of broadcast tennis video with applications to 3D virtual content insertion and ball detection and tracking, Computer Vision and Image Understanding (2009)
- et al., Physics-based ball tracking and 3D trajectory reconstruction with applications to shooting location estimation in basketball video, Journal of Visual Communication and Image Representation (2009)
- et al., Maximum entropy model-based baseball highlight detection and classification, Computer Vision and Image Understanding (2004)
- et al., Scene-based event detection for baseball videos, Journal of Visual Communication and Image Representation (2007)
- et al., Automatic composition of broadcast sports video, Multimedia Systems (2008)
- et al., Trajectory-based ball detection and tracking in broadcast soccer video, IEEE Transactions on Multimedia (2006)
- et al., Event tactic analysis based on broadcast sports video, IEEE Transactions on Multimedia (2009)
- G. Zhu, C. Xu, Q. Huang, W. Gao, L. Xing, Player action recognition in broadcast tennis video with applications to...
- et al., Audiovisual integration for tennis broadcast structuring, Multimedia Tools and Applications (2006)
- et al., Robust camera calibration and player tracking in broadcast basketball video, IEEE Transactions on Multimedia (2011)
- Multimodal semantic analysis and annotation for basketball video, EURASIP Journal on Applied Signal Processing
- A unified framework for semantic shot classification in sports video, IEEE Transactions on Multimedia
Cited by (9)
- A survey on location and motion tracking technologies, methodologies and applications in precision sports, Expert Systems with Applications (2023)
- Artificial intelligence in sport performance analysis, Artificial Intelligence in Sport Performance Analysis (2021)
- A Survey of Content-Aware Video Analysis for Sports, IEEE Transactions on Circuits and Systems for Video Technology (2018)
- Incorporating frequent pattern analysis into multimodal HMM event classification for baseball videos, Multimedia Tools and Applications (2016)
- Review of research issues in broadcast sports video, International Journal of Applied Engineering Research (2014)