Abstract
In this paper, we propose an architecture that segments a news video into so-called "stories" using both the video and the audio information it contains. Segmenting news into stories is one of the key issues in managing news-based digital libraries efficiently. Although the relevance of this research problem is widely recognized in the scientific community, only a few established solutions exist in the field. In our approach, segmentation is performed in two steps. First, shots are classified by combining three different anchor shot detection algorithms that use video information only. Then, the shot classification is refined by a novel anchor shot detection method based on features extracted from the audio track. Tests on a large database confirm that the proposed system outperforms each single video-based method as well as their combination.
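As a rough illustration of the two-step pipeline described above, the following Python sketch labels shots and then groups them into stories. It is not the authors' method: the majority-vote combiner, the audio override thresholds (`audio_hi`, `audio_lo`), the hypothetical `Shot` record, and the rule that a new story begins at an anchor shot following a non-anchor shot are all assumptions made for illustration. The outputs of the individual video and audio detectors are taken as given.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    # Hypothetical per-shot record: votes from three video-only anchor
    # detectors (True = "anchor shot") and an audio-based anchor score in [0, 1].
    video_votes: List[bool]
    audio_score: float


def classify_shots(shots: List[Shot], audio_hi: float = 0.8,
                   audio_lo: float = 0.2) -> List[bool]:
    labels = []
    for shot in shots:
        # Step 1 (assumed combiner): strict majority vote over the video detectors.
        is_anchor = sum(shot.video_votes) * 2 > len(shot.video_votes)
        # Step 2 (assumed rule): a confident audio decision overrides the video label.
        if shot.audio_score >= audio_hi:
            is_anchor = True
        elif shot.audio_score <= audio_lo:
            is_anchor = False
        labels.append(is_anchor)
    return labels


def segment_stories(labels: List[bool]) -> List[List[int]]:
    # Assumed story model: a story starts at the first shot, or at an anchor
    # shot that follows a non-anchor shot; other shots join the current story.
    stories: List[List[int]] = []
    prev_anchor = False
    for i, is_anchor in enumerate(labels):
        if not stories or (is_anchor and not prev_anchor):
            stories.append([i])
        else:
            stories[-1].append(i)
        prev_anchor = is_anchor
    return stories


example = [
    Shot([True, True, False], 0.5),   # video majority says anchor, audio neutral
    Shot([False, False, False], 0.5), # non-anchor
    Shot([False, True, False], 0.9),  # audio confidently says anchor
    Shot([True, False, False], 0.5),  # non-anchor
    Shot([True, True, True], 0.1),    # audio confidently says non-anchor
]
print(segment_stories(classify_shots(example)))  # [[0, 1], [2, 3, 4]]
```

Letting the audio detector override only when its score is far from the middle keeps the video-based decision in charge for ambiguous shots; how the real system weighs the two modalities is not stated in this abstract.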
Cite this article
De Santo, M., Percannella, G., Sansone, C. et al. Segmentation of news videos based on audio-video information. Pattern Anal Applic 10, 135–145 (2007). https://doi.org/10.1007/s10044-006-0055-5