Abstract
This paper presents a system that detects the two basic audio components, speech and music, in the context of radio broadcast indexing. The originality of the approach covers three points: a differentiated modelling based on Gaussian Mixture Models (GMMs), which extracts the speech and music components separately; the normalization of commonly used features; and an efficient fusion of classifiers for speech classification, which yields a substantial improvement in the presence of strong background music: the accuracy of the indexing system rises from [69.2%, 94.2%] for the best single classifier to [90.25%, 98.56%] for the fusion. Evaluation was performed on 12 hours of radio broadcast recorded under various noise conditions and channels, containing diverse speech and music mixtures.
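The differentiated modelling described above can be sketched as follows: one GMM is trained per component (speech, music) on normalized features, and each frame is assigned to the model with the higher log-likelihood. This is a minimal illustration, not the paper's implementation: the synthetic features, the number of mixture components, and the plain mean/variance normalization (a simple stand-in for the feature warping the paper uses) are all assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for frame-level acoustic features (hypothetical data;
# the paper would use real features such as cepstral coefficients).
speech_train = rng.normal(0.0, 1.0, size=(500, 4))
music_train = rng.normal(3.0, 1.0, size=(500, 4))

# Plain mean/variance normalization -- a simplified stand-in for the
# feature normalization step described in the paper.
all_train = np.vstack([speech_train, music_train])
mean, std = all_train.mean(axis=0), all_train.std(axis=0)

def normalize(frames):
    return (frames - mean) / std

# Differentiated modelling: one GMM per component, trained separately.
speech_gmm = GaussianMixture(n_components=4, random_state=0).fit(normalize(speech_train))
music_gmm = GaussianMixture(n_components=4, random_state=0).fit(normalize(music_train))

def classify(frames):
    z = normalize(frames)
    # Frame-level log-likelihood under each model; pick the higher one.
    speech_ll = speech_gmm.score_samples(z)
    music_ll = music_gmm.score_samples(z)
    return np.where(speech_ll > music_ll, "speech", "music")

test_frames = rng.normal(0.0, 1.0, size=(20, 4))  # drawn from the "speech" distribution
labels = classify(test_frames)
```

In the paper's fusion stage, several such classifiers' decisions would be combined (e.g. by weighted voting) rather than relying on a single likelihood comparison; the sketch above shows only the per-component GMM scoring.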
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Sénac, C., Ambikairajh, E. (2004). Audio Classification for Radio Broadcast Indexing: Feature Normalization and Multiple Classifiers Decision. In: Aizawa, K., Nakamura, Y., Satoh, S. (eds) Advances in Multimedia Information Processing - PCM 2004. PCM 2004. Lecture Notes in Computer Science, vol 3332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30542-2_109
Print ISBN: 978-3-540-23977-2
Online ISBN: 978-3-540-30542-2