Unsupervised mining of visually consistent shots for sports genre categorization over large-scale database

Telecommunication Systems

Abstract

This paper proposes an algorithm that summarizes sports videos by camera viewpoint in TV broadcasts, for use in sports genre classification. Redundancy across multiple views is one of the principal limitations in sports genre classification. To remove this redundancy, the algorithm chooses the most representative subset of shots from each game. After the videos are segmented into shots, a single keyframe represents each shot, and a uniform LBP (local binary pattern) feature is extracted from each keyframe. Agglomerative hierarchical clustering is then applied to these keyframes. In this step, an energy-based function for clusters is introduced to match the statistical distribution of the various views, and a refined distance metric is proposed as the similarity measure between two shots. We modify the energy function to reflect the fact that temporally adjacent shots of similar duration are more likely to belong to the same view. To make full use of the high overlap within the selected keyframe subset, sparse coding and geometry visual phrases are introduced in the sports genre categorization stage. Our method is evaluated on videos recorded from Orangesports, ESPN and Eurosport TV broadcasts. The average accuracy over 10 sports reaches 87.5%. The proposed algorithm has already been applied in the Orange TV video content delinearization service platform.
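
The keyframe clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the chi-square distance, average linkage and fixed stopping threshold are assumptions standing in for the refined distance metric and energy-based function, which the abstract does not specify, and the temporal and duration terms are omitted. The function names and the synthetic 8x8 "keyframes" are hypothetical.

```python
from itertools import combinations

# Clockwise 8-neighbourhood at radius 1 (assumed sampling layout).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def _transitions(code):
    """Number of 0/1 changes when the 8-bit code is read circularly."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# The 58 uniform patterns (at most two transitions) each get their own bin;
# every non-uniform pattern falls into the shared last bin (index 58).
UNIFORM_BIN = {c: i for i, c in
               enumerate(c for c in range(256) if _transitions(c) <= 2)}

def uniform_lbp_histogram(img):
    """Normalised 59-bin uniform LBP histogram of a 2-D grey-level grid."""
    hist = [0.0] * 59
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            code = 0
            for i, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= img[y][x]:
                    code |= 1 << i
            hist[UNIFORM_BIN.get(code, 58)] += 1
    total = sum(hist) or 1.0
    return [v / total for v in hist]

def chi2(p, q):
    """Chi-square histogram distance (an assumed stand-in metric)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def agglomerate(features, threshold):
    """Average-linkage agglomerative clustering; stops merging once the
    closest pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(features))]

    def linkage(a, b):
        return sum(chi2(features[i], features[j])
                   for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Synthetic stand-ins for keyframes: two gradient textures, two stripe textures.
def gradient(step):
    return [[x * step * 10 for x in range(8)] for _ in range(8)]

def stripes(phase):
    return [[255 if (x + phase) % 2 == 0 else 0 for x in range(8)]
            for _ in range(8)]

frames = [gradient(1), stripes(0), gradient(2), stripes(1)]
clusters = agglomerate([uniform_lbp_histogram(f) for f in frames], threshold=0.1)
print(clusters)
```

In this toy setting, keyframes with the same texture end up in the same cluster, which mirrors how shots from the same camera view would be grouped before a representative subset is selected.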

Notes

  1. FIFA document, Football Stadiums: Technical recommendations and requirements, 5th edition, www.fifa.com/aboutfifa/officialdocuments/doclists/laws.html

    Fig. 1: TV camera positions required by FIFA and multiple views captured in one soccer broadcast video

Acknowledgments

This work is sponsored by the collaborative research project (SEV01100474) between Beijing University of Posts and Telecommunications and France Telecom R&D – Orange Lab Beijing, the National High Technology Research and Development Program of China (863 Program, No. 2012AA012505), and the National Natural Science Foundation of China (61372169).

Author information

Corresponding author

Correspondence to Yuan Dong.

Cite this article

Dong, Y., Zhao, N., Lian, S. et al. Unsupervised mining of visually consistent shots for sports genre categorization over large-scale database. Telecommun Syst 59, 381–391 (2015). https://doi.org/10.1007/s11235-014-9943-y

  • DOI: https://doi.org/10.1007/s11235-014-9943-y
