Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news

  • Original Research
  • Journal: Multimedia Systems

Abstract

Audio classification is an essential task in multimedia content analysis and a prerequisite for tasks such as segmentation, indexing and retrieval. This paper describes our study of multi-class audio classification on broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features: average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech with environment sound. Because an SVM is a binary classifier, the SVM-BT architecture lets us realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features achieve accuracy comparable to that of popular high-dimensional features in speech/music discrimination, and that the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we achieve an average accuracy of 94.2% in the five-class audio classification task.
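The coarse-to-fine routing of a binary tree of two-class classifiers can be illustrated with a minimal Python sketch. The abstract does not specify the tree layout, so the split order below (speech present or not at the root, then finer splits) is an assumption made for illustration only. The three-element toy feature vector and the threshold rules are hypothetical stand-ins for trained SVMs and are not the APD or RTPD features.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Features = Sequence[float]


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    # decide(x) returns True to follow `yes` and False to follow `no`.
    # In an SVM-BT this would be the sign of a trained SVM's decision
    # function; here it is a hypothetical threshold rule.
    decide: Callable[[Features], bool]
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: Features):
    """Walk from the root to a leaf; return (label, binary decisions used)."""
    evaluations = 0
    while isinstance(tree, Node):
        evaluations += 1
        tree = tree.yes if tree.decide(x) else tree.no
    return tree.label, evaluations


def count_nodes(tree: Tree) -> int:
    """Number of binary classifiers (internal nodes) in the tree."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + count_nodes(tree.yes) + count_nodes(tree.no)


# Hypothetical toy features: x = (speech_score, music_score, env_score).
# Assumed layout, not taken from the paper.
tree = Node(
    decide=lambda x: x[0] > 0.5,                     # speech present?
    yes=Node(
        decide=lambda x: max(x[1], x[2]) <= 0.5,     # no background sound?
        yes=Leaf("pure speech"),
        no=Node(
            decide=lambda x: x[1] >= x[2],           # background is music?
            yes=Leaf("speech with music"),
            no=Leaf("speech with environment sound"),
        ),
    ),
    no=Node(
        decide=lambda x: x[1] >= x[2],               # music or environment?
        yes=Leaf("music"),
        no=Leaf("environment sound"),
    ),
)

if __name__ == "__main__":
    print(classify(tree, (0.9, 0.8, 0.1)))
    print("binary classifiers in tree:", count_nodes(tree))
    print("one-vs-one classifiers for 5 classes:", 5 * 4 // 2)
```

Any binary tree with five leaves has four internal nodes, so this arrangement needs four binary classifiers in total, and a clip is labeled after at most three decisions in the layout assumed here. A one-vs-one scheme for five classes would instead train and evaluate ten pairwise classifiers.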

Notes

  1. http://www.cctv.com.

  2. http://marsyas.sness.net/.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (60802085), the Program for New Century Excellent Talents in University (2008) supported by the Ministry of Education (MOE) of China, the Research Fund for the Doctoral Program of Higher Education in China (20070699015), the Natural Science Basic Research Plan of Shaanxi Province (2007F15) and the NPU Foundation for Fundamental Research (W018103).

Author information

Corresponding author

Correspondence to Lei Xie.

Additional information

Communicated by T. Haenselmann.

About this article

Cite this article

Xie, L., Fu, ZH., Feng, W. et al. Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news. Multimedia Systems 17, 101–112 (2011). https://doi.org/10.1007/s00530-010-0205-x

  • DOI: https://doi.org/10.1007/s00530-010-0205-x
