HMM-based ball hitting event exploration system for broadcast baseball video

https://doi.org/10.1016/j.jvcir.2012.03.006

Abstract

With the dramatic growth of the fandom population, considerable research effort has been devoted to baseball video processing. However, little work focuses on the detailed follow-up of ball hitting events. This paper proposes an HMM-based ball hitting event exploration system for broadcast baseball video. Exploiting the strictly defined layout of the baseball field, the proposed system first detects game-specific spatial patterns in the field, such as the field lines, the bases, and the pitch mound. Then the play region, i.e., the region of the baseball field the camera currently focuses on, is identified for frame type classification. Since the temporal patterns used to present the game's progress follow a prototypical order, we treat the classified frame types as observation symbols and recognize ball hitting events using HMMs. Experiments conducted on broadcast baseball video show encouraging results in frame type classification and ball hitting event recognition. Three practical applications, including highlight clip extraction by user-designated query, storyboard construction, and similar event retrieval, are introduced to demonstrate the applicability of our system.

Highlights

► We propose a ball hitting event exploration system for baseball video.
► Ten spatial patterns, 16 frame types, and 11 events are analyzed.
► Explicit information on play region transitions within a single field shot is extracted.
► Extensive applications are developed based on the proposed framework.

Introduction

The explosive growth of digital video motivates researchers to pursue various aspects of video analysis, and this trend has led to the development of efficient sports video analysis for soccer [1], [2], [3], tennis [4], [5], [6], basketball [7], [8], [9], volleyball [10], baseball [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], etc. Automatic sports video analysis has attracted considerable attention because sports video appeals to large audiences, and potential applications exist in almost every sport, among which baseball is especially popular. Watching an entire game video sequentially is time-consuming, whereas highlights abstract the game for quick browsing. In addition, highlights can contribute to tactical inference for coaches, players, and even professional sports fans. With these motivations, we aim to develop a highlight semantics exploration system for baseball games.

Baseball video is characterized by a strictly defined structure containing a series of plays, each of which starts with a pitch. Hence, PC (pitcher–catcher) shot detection and semantic shot classification play an important role in baseball highlight detection [11], [12]. Furthermore, various kinds of pitch analyses have been addressed: to derive the correlation between the ball trajectory and its rotation by tracking the translation and rotation of a pitched ball [13], to extract the ball trajectory based on physical characteristics [14], to reconstruct the 3D trajectory of the pitched ball with multiple cameras [15], and even to recognize the pitching style based on the pitcher's posture [16].

Due to broadcast requirements, there is an essential demand for highlight extraction, which aims at abstracting a long game into a compact summary that provides the audience a quick browse of the game. Moreover, highlight extraction/classification also contributes to many applications, such as efficient event indexing and retrieval, providing references for tactical inference to coaches and players, user-designated highlight clip extraction, etc. In the past few years, remarkable research has been devoted to baseball video content analysis. Hung and Hsieh [17] categorize shots into pitcher–catcher, infield, outfield, and non-field shots; combining the detected scoreboard information with the obtained shot types as mid-level cues, they use a Bayesian Belief Network (BBN) structure for highlight classification. Chu and Wu [18] consider most of the possible conditions in a baseball game based on the game-specific rules and extract scoreboard information for event detection. Though both Hung and Hsieh [17] and Chu and Wu [18] achieve high accuracy in highlight classification thanks to the additional scoreboard information, their coarse shot classification approaches are inadequate for analyzing ball movement and play region transitions in ball hitting events. Gong et al. [19] classify baseball highlights by integrating image, audio, and closed caption cues based on a Maximum Entropy Model (MEM). Fleischman et al. [20] use complex temporal features, such as field type, speech, and camera motion start and end times, and exploit temporal data mining techniques to discover a codebook of frequent temporal patterns for baseball highlight classification.

Because the positions of the cameras are fixed during a game and the ways of presenting game progress are similar across TV channels, each category of semantic baseball event usually exhibits similar scene transitions. For example, a typical "fly out" event can be composed of a PC scene followed by an outfield scene and then an in-grass scene. Hence, the HMM statistical model is broadly used for highlight detection and classification. Lien et al. [21] extract significant color, object number, motion vector, and player location as features to classify eight semantic scenes: close-up, base, running, pitching, player, infield, outfield, and other. With the classified scenes serving as the observation symbol sequence, a 4-state ergodic HMM is applied to detect four baseball events: base hit, ground out, air out, and strike out. Though good performance is achieved by Lien et al. [21], only four events are detected; offering just four events is hardly sufficient for general users, let alone professional players or coaches. Cheng and Hsu [22] fuse visual motion information with audio features, including zero crossing rate, pitch period, and Mel-frequency cepstral coefficients (MFCC), to extract baseball highlights based on a hidden Markov model (HMM). Mochizuki et al. [23] provide a baseball indexing method based on patternizing baseball scenes using a set of rectangles with image features and a motion vector. Chang et al. [24] assume that most highlights in baseball games consist of certain shot types and that these shots have similar transitions in time; each highlight is described by an HMM whose hidden states are represented by predefined shot types, and several features are used as observations to train the HMM for highlight recognition. In Mochizuki et al. [23] and Chang et al. [24], low accuracy and few highlight types are the main disadvantages, because the extracted information is too sparse to detect various highlights with high accuracy.

Although previous works claim good results on highlight classification, they do not analyze a variety of ball hitting event types and provide no insight into the detailed batting process and ball movement within a shot, such as: "The ball batted into the left infield is picked up by an infielder and then thrown to the first baseman." By nature, the first/second/third basemen, the shortstop, and the other players are important objects for event understanding. However, when the camera focuses on a player, it is hard to recognize his fielding position. Hence, in this paper we explore field shots (the shots that follow the batted ball in the field) and utilize the game-specific spatial patterns, e.g., the bases and the pitch mound, to identify the regions the ball has passed through. Owing to their great success in speech recognition, HMMs are effective models for time-varying patterns and have been widely used for scene modeling in sports video [21], [22], [23], [24]. Thus, we propose an HMM-based mechanism to detect and classify up to 11 ball hitting events: (1) single, (2) double, (3) pop up, (4) fly out, (5) ground out, (6) two base hit, (7) foul ball, (8) foul out, (9) double play, (10) home run, and (11) home base out. In addition to providing a detailed description of each play, a baseball exploration system is also developed so that users can efficiently retrieve the desired batting clips. With the proposed framework, highlight extraction and event indexing in baseball video become more powerful and practical, since comprehensive, detailed, and explicit information about the game can be presented to users.

The rest of the paper is organized as follows. Section 2 gives an overview of the proposed ball hitting event recognition system. The processes of visual feature extraction and frame type classification are explained in Sections 3 and 4, respectively. Section 5 elaborates how ball hitting events are recognized using HMMs. Experimental results and discussion are presented in Section 6. Section 7 introduces extensive applications based on the proposed system. Finally, Section 8 concludes this paper and describes future work.

Section snippets

Overview of the proposed HMM-based ball hitting event exploration system

With the foregoing motivation and the limitations of existing works, we develop an HMM-based ball hitting event exploration system for broadcast baseball video. As illustrated in Fig. 1, the system contains three main components: visual feature extraction, frame type classification, and HMM-based ball hitting event recognition. In a baseball game, each play starts with a PC (pitcher–catcher) shot and ends with specific shots. To trim out the uninteresting segments, e.g., commercials,
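As a concrete illustration of the play segmentation described above, the following minimal sketch splits a sequence of classified shot labels into plays, each beginning at a PC shot. The label vocabulary and function name are our own assumptions for illustration, not the paper's implementation.

```python
def split_into_plays(shot_labels):
    """Split a classified shot sequence into plays, each starting at a
    PC (pitcher-catcher) shot; labels before the first PC shot
    (e.g., commercials) are trimmed away."""
    plays, current = [], []
    for label in shot_labels:
        if label == "PC" and current:
            plays.append(current)  # close the previous play
            current = []
        if label == "PC" or current:
            current.append(label)  # collect shots once a play has started
    if current:
        plays.append(current)
    return plays

# Example: a commercial segment followed by two plays
print(split_into_plays(["commercial", "PC", "infield", "PC", "outfield"]))
```

In the real system, the shot labels would come from PC shot detection and frame type classification rather than being given directly.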

Visual feature extraction

In our proposed system, significant colors and game-specific spatial patterns are extracted as visual features.

Frame type classification and annotation string generation

In order to comprehend the detailed process of a ball hitting event, we have to recognize the play region, i.e., the region of the baseball field the camera currently focuses on, for frame type classification. Based on the detected spatial patterns, we classify each field frame into one of 16 types: IL (infield left), IC (infield center), IR (infield right), B1 (first base), B2 (second base), B3 (third base), OL (outfield left), OC (outfield center), OR (outfield right), PS (player in soil), PG
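A rule-based mapping from detected spatial patterns to a play-region frame type could look like the sketch below; the pattern names and decision rules here are illustrative assumptions, not the paper's actual classifier.

```python
def classify_frame(patterns):
    """Map a frame's detected spatial patterns (a set of strings such as
    {"first_base"} or {"outfield_grass", "left_field_line"}) to one of
    the play-region codes; the pattern vocabulary is assumed."""
    base_map = {"first_base": "B1", "second_base": "B2", "third_base": "B3"}
    for pattern, code in base_map.items():
        if pattern in patterns:
            return code
    if "outfield_grass" in patterns:
        if "left_field_line" in patterns:
            return "OL"
        if "right_field_line" in patterns:
            return "OR"
        return "OC"
    if "infield_soil" in patterns:
        if "left_field_line" in patterns:
            return "IL"
        if "right_field_line" in patterns:
            return "IR"
        return "IC"
    return "PG"  # fallback: player-in-grass style frame
```

A real classifier would also weigh dominant colors and pattern positions rather than mere presence, but the idea of resolving frame types from game-specific spatial patterns is the same.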

HMM-based ball hitting event recognition

The main objective of this paper is to develop a ball hitting event exploration system that traces play region transitions and recognizes ball hitting events. Regarding the classified frame types as observation symbols, we propose an HMM-based approach to recognize various ball hitting events, including single, double, pop up, fly out, ground out, two-base out, foul ball, foul out, double play, home run, and home base out.

Generally, an HMM is expressed by a 3-tuple of parameters λ = {A, B, π}.
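For reference, the forward algorithm evaluates P(O | λ) for such a model. The sketch below scores a frame-type observation sequence against per-event HMMs and selects the most likely event; the toy two-state models and three-symbol alphabet are illustrative assumptions, not trained parameters from the paper.

```python
def forward_likelihood(obs, pi, A, B):
    """P(obs | lambda) for an HMM lambda = (A, B, pi) over integer
    observation symbols, computed with the forward algorithm."""
    n = len(pi)
    # initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)  # termination: sum over final states

# Toy 2-state models for two events over a 3-symbol alphabet
# (e.g., 0 = PC frame, 1 = infield frame, 2 = outfield frame).
models = {
    "ground out": ([0.9, 0.1],
                   [[0.7, 0.3], [0.2, 0.8]],
                   [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1]]),
    "fly out":    ([0.9, 0.1],
                   [[0.7, 0.3], [0.2, 0.8]],
                   [[0.8, 0.1, 0.1], [0.0, 0.1, 0.9]]),
}

obs = [0, 1, 1]  # a PC frame followed by infield frames
best = max(models, key=lambda e: forward_likelihood(obs, *models[e]))
print(best)  # the infield-heavy sequence favors "ground out"
```

In practice, one HMM per event type would be trained (e.g., via Baum–Welch) on labeled frame-type sequences, and log probabilities would be used to avoid underflow on long sequences.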

Experimental results and discussion

To demonstrate the effectiveness of the proposed frame type classification and ball hitting event recognition approaches, we conduct experiments on video data from MLB (Major League Baseball) and JPB (Japanese Professional Baseball) games. In total, we have 253 video clips recorded from live broadcast television programs and compressed in the MPEG-2 video standard at a frame resolution of 352 × 240 (29.97 fps). To evaluate our proposed methods, 122 clips are randomly selected for

Highlight clip extraction by user-designated query

We have implemented a preliminary prototype of the user interface of the proposed baseball exploration system, as shown in Fig. 15. The video is displayed in area A and the visual presentation of the video analysis is provided in B. Area C gives the information about the detected spatial patterns. Furthermore, users are allowed to designate play region types in D for exploration. The highlight clips containing the user-designated play region types are retrieved and listed in E with their

Conclusions and future work

In this paper, we propose an HMM-based ball hitting event exploration system for broadcast baseball video capable of spatial pattern detection, frame type classification, and event recognition. Convincing results and encouraging performance are obtained. Furthermore, the proposed system facilitates extensive applications, such as highlight clip extraction by user-designated query, storyboard construction, and similar event retrieval.

Compared with existing works on baseball video analysis, the

Acknowledgments

This work is supported in part by “Aim for the Top University Plan” of the National Chiao Tung University and Ministry of Education, Taiwan, R.O.C., and in part by National Science Council of R.O.C. under the Grant Nos. 98-2221-E-009-091-MY3 and 101-2218-E-009-004-.

References (33)

  • S. Liu et al., Multimodal semantic analysis and annotation for basketball video, EURASIP Journal on Applied Signal Processing (2006)
  • H.-T. Chen, H.-S. Chen, S.-Y. Lee, Physics-based ball tracking in volleyball videos with its applications to set type...
  • M. Kumano, Y. Ariki, K. Tsukada, S. Hamaguchi, H. Kiyose, Automatic extraction of PC scenes based on feature mining for...
  • L.-Y. Duan et al., A unified framework for semantic shot classification in sports video, IEEE Transactions on Multimedia (2005)
  • H. Shum, T. Komura, Tracking the translational and rotational movement of the ball using high-speed camera movies, in:...
  • H.-T. Chen, H.-S. Chen, M.-H. Hsiao, Y.-W. Chen, S.-Y. Lee, A trajectory-based ball tracking framework with enrichment...