Abstract
This paper proposes a multi-modal method that improves anchorperson shot detection for news story segmentation. Anchorperson voice information is used to verify the anchorperson shot candidates extracted from visual information. The algorithm first extracts anchorperson voice shot candidates using time and silence conditions. Anchorperson templates are then generated from the face and clothing information in the extracted anchorperson voice shots. Next, anchorperson voice models are created after separating out the anchorperson voice shots that contain two or more voices. Finally, the anchorperson voice models verify the anchorperson shot candidates obtained from visual information. The method is tested on 720 minutes of news programs, and experimental results are presented.
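The first step, extracting voice shot candidates under time and silence conditions, can be illustrated with a minimal sketch. The abstract does not define the conditions, so everything specific here is an assumption made for illustration: the `Shot` type, the per-frame energy representation, the helper names, and all threshold values are hypothetical, and the rule used (a shot is long enough and contains little silence) is only one plausible reading, not the authors' stated criterion.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Shot:
    """A video shot with its audio summarized as per-frame energy (hypothetical type)."""
    start: float                    # shot start time in seconds
    end: float                      # shot end time in seconds
    frame_energy: Sequence[float]   # short-time audio energy, one value per frame

    @property
    def duration(self) -> float:
        return self.end - self.start


def silence_ratio(frame_energy: Sequence[float], silence_threshold: float) -> float:
    """Fraction of frames whose energy falls below the silence threshold."""
    if not frame_energy:
        return 1.0  # treat a shot with no audio frames as entirely silent
    silent = sum(1 for e in frame_energy if e < silence_threshold)
    return silent / len(frame_energy)


def voice_shot_candidates(
    shots: Sequence[Shot],
    min_duration: float = 10.0,        # assumed time condition, in seconds
    silence_threshold: float = 0.01,   # assumed energy level counted as silence
    max_silence_ratio: float = 0.2,    # assumed silence condition
) -> List[Shot]:
    """Keep shots that are long enough and mostly non-silent (assumed rule)."""
    return [
        s for s in shots
        if s.duration >= min_duration
        and silence_ratio(s.frame_energy, silence_threshold) <= max_silence_ratio
    ]
```

In such a sketch, the surviving shots would then feed the later stages described above (template generation from face and clothing, voice model creation, and verification of the visual candidates), none of which are modeled here.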
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, SK., Hwang, D.S., Kim, JY., Seo, YS. (2005). An Effective News Anchorperson Shot Detection Method Based on Adaptive Audio/Visual Model Generation. In: Leow, WK., Lew, M.S., Chua, TS., Ma, WY., Chaisorn, L., Bakker, E.M. (eds) Image and Video Retrieval. CIVR 2005. Lecture Notes in Computer Science, vol 3568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11526346_31
DOI: https://doi.org/10.1007/11526346_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27858-0
Online ISBN: 978-3-540-31678-7
eBook Packages: Computer Science, Computer Science (R0)