Adaptive anchor detection using online trained audio/visual model
23 December 1999
Zhu Liu, Qian Huang
Proceedings Volume 3972, Storage and Retrieval for Media Databases 2000; (1999) https://doi.org/10.1117/12.373545
Event: Electronic Imaging, 2000, San Jose, CA, United States
Abstract
An anchor person is the host of a broadcast program. Anchor segments in video often mark content boundaries, so it is important to identify such segments during automatic content-based multimedia indexing. Previous efforts have mostly used either audio or visual information alone for anchor detection, relying on either model-based methods with off-line trained models or unsupervised clustering methods. The inflexibility of the off-line model-based approach and the increasing difficulty of achieving reliable detection with clustering motivate the new approach proposed in this paper. The goal is to detect an arbitrary anchor in a given broadcast news program. The proposed approach exploits both audio and visual cues so that on-line acoustic and visual models of the anchor can be built dynamically during data processing. In addition to identifying any given anchor, the proposed method can also improve performance when combined with an algorithm that detects a predefined anchor. Preliminary experimental results are presented and discussed. They demonstrate that the proposed approach can detect an arbitrary anchor without sacrificing performance.
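The abstract does not specify the features, model form, or fusion rule, so the following is only a minimal sketch of the general idea of "on-line" audio and visual models, not the authors' method. It assumes that each segment is already summarized by fixed-length audio and visual feature vectors, that each modality is modeled by a diagonal Gaussian updated incrementally (Welford's algorithm), and that the two modalities are fused by a weighted sum of log-likelihoods. The class names, `audio_weight`, `var_floor`, and `threshold` are all hypothetical.

```python
import math
from typing import List, Sequence


class OnlineGaussian:
    """Diagonal Gaussian updated one sample at a time (Welford's algorithm).
    Hypothetical stand-in for an on-line anchor model of one modality."""

    def __init__(self, dim: int, var_floor: float = 1e-3) -> None:
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per dimension
        self.var_floor = var_floor  # assumed floor to avoid zero variance

    def update(self, x: Sequence[float]) -> None:
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def variance(self) -> List[float]:
        # Sample variance; fall back to unit variance until two samples exist.
        if self.n < 2:
            return [1.0] * len(self.mean)
        return [max(m / (self.n - 1), self.var_floor) for m in self.m2]

    def log_likelihood(self, x: Sequence[float]) -> float:
        ll = 0.0
        for xi, mu, var in zip(x, self.mean, self.variance()):
            ll += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        return ll


class AnchorModel:
    """Combines audio and visual OnlineGaussian models with an assumed
    weighted-sum (late fusion) of log-likelihoods."""

    def __init__(self, audio_dim: int, visual_dim: int, audio_weight: float = 0.5) -> None:
        self.audio = OnlineGaussian(audio_dim)
        self.visual = OnlineGaussian(visual_dim)
        self.w = audio_weight

    def adapt(self, audio_feat: Sequence[float], visual_feat: Sequence[float]) -> None:
        # Refine both models with a segment believed to contain the anchor.
        self.audio.update(audio_feat)
        self.visual.update(visual_feat)

    def score(self, audio_feat: Sequence[float], visual_feat: Sequence[float]) -> float:
        return (self.w * self.audio.log_likelihood(audio_feat)
                + (1 - self.w) * self.visual.log_likelihood(visual_feat))

    def is_anchor(self, audio_feat: Sequence[float], visual_feat: Sequence[float],
                  threshold: float) -> bool:
        return self.score(audio_feat, visual_feat) >= threshold
```

In use, the model would be seeded with a few segments assumed to show the anchor and then used to score the remaining segments, calling `adapt` on newly accepted ones so the models keep improving as the program is processed. Incremental updates keep memory constant per modality, which fits the "built dynamically during data processing" idea, but a real system would likely need richer models than a single Gaussian.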
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zhu Liu and Qian Huang "Adaptive anchor detection using online trained audio/visual model", Proc. SPIE 3972, Storage and Retrieval for Media Databases 2000, (23 December 1999); https://doi.org/10.1117/12.373545
CITATIONS
Cited by 4 scholarly publications.
KEYWORDS
Visualization

Visual process modeling

Data modeling

Facial recognition systems

Acoustics

Skin

Feature extraction

RELATED CONTENT

Automatic lip reading by using multimodal visual features
Proceedings of SPIE (February 03 2014)
Video description combining visual and audio features
Proceedings of SPIE (April 14 2023)
Attention based CNN-LSTM network for video caption
Proceedings of SPIE (November 10 2022)
2+2=5: painting by numbers
Proceedings of SPIE (January 16 2006)
