Audio Partitioning and Transcription for Broadcast Data Indexation

Multimedia Tools and Applications

Abstract

This work addresses the automatic transcription of television and radio broadcasts in multiple languages. Transcribing such data is a major step in developing automatic tools for indexation and retrieval of the vast amounts of information generated on a daily basis. Radio and television broadcasts consist of a continuous data stream made up of segments of different linguistic and acoustic natures, which poses challenges for transcription. Prior to word recognition, the data is partitioned into homogeneous acoustic segments. Non-speech segments are identified and removed, and the speech segments are clustered and labeled according to bandwidth and gender. Word recognition is carried out with a speaker-independent, large-vocabulary continuous speech recognizer that uses n-gram statistics for language modeling and continuous density HMMs with Gaussian mixtures for acoustic modeling. This system has consistently obtained top-level performance in DARPA evaluations. Over 500 hours of unpartitioned, unrestricted American English broadcast data have been partitioned, transcribed and indexed, with an average word error rate of about 20%. With current IR technology, information retrieval performance on the automatic transcriptions of this data set shows essentially no degradation relative to manual transcriptions.
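To make the word error figure concrete, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The abstract does not describe the scoring procedure, so the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration; evaluation tooling typically also normalizes the text before scoring, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: tokenization is plain whitespace splitting
    and no text normalization is applied (both are assumptions).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("the" -> "a") and one deletion
# ("late") against a 10-word reference.
reference = "the senate voted on the new budget bill late today"
hypothesis = "the senate voted on a new budget bill today"
print(word_error_rate(reference, hypothesis))  # 0.2
```

In this made-up example, two errors against ten reference words give a rate of 20%, the same order as the average reported above. Because insertions count as errors, the rate can exceed 100% for very poor hypotheses.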

Author information

Corresponding author

Correspondence to J.L. Gauvain.

About this article

Cite this article

Gauvain, J., Lamel, L. & Adda, G. Audio Partitioning and Transcription for Broadcast Data Indexation. Multimedia Tools and Applications 14, 187–200 (2001). https://doi.org/10.1023/A:1011303401042

  • DOI: https://doi.org/10.1023/A:1011303401042
