Abstract
In this paper, we propose an algorithm that summarizes sports videos by viewpoint in TV broadcasts, for the purpose of sports genre classification. The redundancy of multiple views is one of the principal limitations in sports genre classification. To remove this redundancy, the algorithm chooses the most representative subset of shots from each game. After the videos are segmented into shots, a single keyframe represents each shot, and a uniform LBP (local binary pattern) feature is extracted from each keyframe. Agglomerative hierarchical clustering is then performed on these keyframes. In this step, an energy-based function for clusters is introduced to match the statistical distribution of the various views, and a refined distance metric is proposed as the similarity measure between two shots. We modify the energy function to reflect the fact that temporally neighboring shots of similar duration are more likely to belong to the same view. To make full use of the high overlap within the selected keyframe subset, sparse coding and geometry-preserving visual phrases are introduced in the sports genre categorization stage. Our method is evaluated on videos recorded from Orangesports, ESPN, and Eurosport TV broadcasts. The average accuracy over 10 sports reaches 87.5%. The proposed algorithm has already been deployed in the Orange TV video content delinearization service platform.
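The keyframe clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the energy function or the refined distance metric, so the sketch substitutes a plain chi-square distance and average-linkage merging with a fixed threshold. The function names, the threshold value, and the toy keyframes are all hypothetical.

```python
from itertools import combinations

def uniform_lbp_histogram(img):
    """Normalized 59-bin uniform LBP histogram (8 neighbours, radius 1).

    img is a 2D grayscale image given as a list of rows.
    """
    def transitions(p):
        bits = [(p >> i) & 1 for i in range(8)]
        return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

    # The 58 patterns with at most two circular 0/1 transitions get their
    # own bin; every other pattern shares one extra bin.
    uniform = [p for p in range(256) if transitions(p) <= 2]
    bin_of = {p: i for i, p in enumerate(uniform)}
    hist = [0.0] * (len(uniform) + 1)
    # Neighbour offsets (dy, dx) in circular order around the centre pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            code = 0
            for i, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= img[y][x]:
                    code |= 1 << i
            hist[bin_of.get(code, len(uniform))] += 1
    total = sum(hist) or 1.0
    return [v / total for v in hist]

def chi2(a, b):
    """Chi-square distance between two histograms (assumed stand-in metric)."""
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def agglomerative(features, threshold):
    """Average-linkage agglomerative clustering; stops merging once the
    closest pair of clusters is farther apart than threshold (assumed rule)."""
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        total = sum(chi2(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return clusters

# Toy keyframes: two flat textures and two checkerboards, interleaved.
flat_dark = [[10] * 6 for _ in range(6)]
flat_bright = [[200] * 6 for _ in range(6)]
checker_low = [[(x + y) % 2 for x in range(6)] for y in range(6)]
checker_high = [[255 * ((x + y) % 2) for x in range(6)] for y in range(6)]

keyframes = [flat_dark, checker_low, flat_bright, checker_high]
features = [uniform_lbp_histogram(k) for k in keyframes]
clusters = agglomerative(features, threshold=0.1)
print(clusters)
```

In this toy run the two flat keyframes end up in one cluster and the two checkerboards in another, since LBP codes depend on local ordering rather than absolute intensity. A representative subset could then be drawn from the largest clusters, although the selection rule used in the paper is not stated in the abstract.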
Notes
FIFA document, Football Stadiums: Technical Recommendations and Requirements, 5th edition, available at www.fifa.com/aboutfifa/officialdocuments/doclists/laws.html
Acknowledgments
This work is sponsored by the collaborative research project (SEV01100474) between Beijing University of Posts and Telecommunications and France Telecom R&D – Orange Lab Beijing, the National High Technology Research and Development Program of China (863 Program, No. 2012AA012505), and the National Natural Science Foundation of China (61372169).
Cite this article
Dong, Y., Zhao, N., Lian, S. et al. Unsupervised mining of visually consistent shots for sports genre categorization over large-scale database. Telecommun Syst 59, 381–391 (2015). https://doi.org/10.1007/s11235-014-9943-y
DOI: https://doi.org/10.1007/s11235-014-9943-y