Abstract
In this paper, we present an advanced news video parsing system that exploits the visual characteristics of anchorperson scenes, with the aim of providing personalized news services over the Internet or on mobile platforms. Because anchorperson shots serve as the root shots from which a news video is constructed, the system first performs anchorperson detection, which divides the news video into several segments. By combining multiple features with post-processing, our anchorperson detection method can be applied efficiently even to news videos whose anchorperson scenes are highly challenging and complex. The segments produced by anchorperson detection are usually regarded as news stories. However, observations on our database show that this assumption does not hold when interview scenes are present. In these scenes, the interviewer (the anchorperson) and the interviewee appear repeatedly in turn. We therefore apply a technique called interview clustering, based on face similarity, to merge these interview segments. Another novel aspect of our system is entity summarization of interview scenes, which is applied as the final stage. The effectiveness and robustness of the proposed system are demonstrated by an evaluation on 19 hours of news programs from 6 different TV channels.
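
The idea of merging interview segments by face similarity can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: each segment produced by anchorperson detection has already been reduced to one face descriptor vector for its dominant non-anchor face (or None when no such face is found), descriptors are compared with cosine similarity, only adjacent segments are merged, and the threshold of 0.8 is a hypothetical value. The function names `cosine_similarity` and `merge_interview_segments` are illustrative.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_interview_segments(
    guest_faces: List[Optional[Sequence[float]]],
    threshold: float = 0.8,  # hypothetical value, not taken from the paper
) -> List[List[int]]:
    """Group consecutive anchor-delimited segments that show the same interviewee.

    guest_faces[i] is an assumed face descriptor for the dominant non-anchor
    face in segment i, or None when the segment contains no such face.
    Returns lists of segment indices; each list is treated as one story.
    """
    groups: List[List[int]] = []
    for i, face in enumerate(guest_faces):
        if groups and face is not None:
            # Compare with the last segment already placed in the current group.
            previous = guest_faces[groups[-1][-1]]
            if previous is not None and cosine_similarity(face, previous) >= threshold:
                groups[-1].append(i)  # same interviewee: extend the interview
                continue
        groups.append([i])  # otherwise start a new story
    return groups


if __name__ == "__main__":
    faces = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], None, [0.0, 1.0]]
    print(merge_interview_segments(faces))  # [[0, 1], [2], [3], [4]]
```

In this example the first two segments show the same interviewee and collapse into one story. Segment 4 matches segment 2 but is not merged with it, because this simplified sketch only joins adjacent segments; a fuller system would likely cluster faces across a wider window.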

Acknowledgements
This work was supported by the Invenio project, launched by France Telecom R&D (Orange Labs), and by the Key Project of the National Natural Science Foundation of China (90920001).
Cite this article
Dong, Y., Qin, G., Xiao, G. et al. Advanced news video parsing via visual characteristics of anchorperson scenes. Telecommun Syst 54, 247–263 (2013). https://doi.org/10.1007/s11235-013-9731-0