Aligning Very Long Speech Signals to Bilingual Transcriptions of Parliamentary Sessions

  • Conference paper
Advances in Speech and Language Technologies for Iberian Languages

Abstract

In this paper, we describe and analyse the performance of a simple approach to aligning very long speech signals to acoustically inaccurate transcriptions, even when two different languages are used. The alignment algorithm operates on two phonetic sequences: the first is automatically extracted from the speech signal by means of a phone decoder, and the second is obtained from the reference text by means of a multilingual grapheme-to-phoneme transcriber. The proposed algorithm is compared to a widely known state-of-the-art alignment procedure based on word-level speech recognition. We present alignment accuracy results on two different datasets: (1) the 1997 English Hub4 database; and (2) a set of bilingual (Basque/Spanish) parliamentary sessions. In experiments on the Hub4 dataset, the proposed approach provided only slightly worse alignments than those reported for the state-of-the-art alignment procedure, but at a much lower computational cost and with far fewer resources. Moreover, if the resource to be aligned includes speech in two or more languages and speakers switch between them at any time, applying a speech recognizer becomes infeasible in practice, whereas our approach can still be applied with very competitive performance at no additional cost.
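
To make the idea of aligning two phonetic sequences concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard Needleman-Wunsch style global alignment with unit substitution and gap costs; the abstract does not state which costs or which alignment variant the paper uses. The names `align_phones` and `reference_times`, the cost parameters, and the toy phone strings are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One aligned position: (index into decoded, index into reference);
# None marks a gap on that side.
Pair = Tuple[Optional[int], Optional[int]]


def align_phones(decoded: List[str], reference: List[str],
                 sub_cost: int = 1, gap_cost: int = 1) -> Tuple[int, List[Pair]]:
    """Globally align two phone sequences with assumed unit costs."""
    n, m = len(decoded), len(reference)
    # cost[i][j] = minimal cost of aligning decoded[:i] with reference[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if decoded[i - 1] == reference[j - 1] else sub_cost
            cost[i][j] = min(cost[i - 1][j - 1] + step,   # match or substitution
                             cost[i - 1][j] + gap_cost,   # decoded phone unmatched
                             cost[i][j - 1] + gap_cost)   # reference phone unmatched

    # Trace back from the bottom-right cell to recover one optimal path.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = 0 if decoded[i - 1] == reference[j - 1] else sub_cost
            if cost[i][j] == cost[i - 1][j - 1] + step:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs


def reference_times(pairs: List[Pair],
                    decoded_starts: List[float]) -> Dict[int, float]:
    """Copy decoded start times onto the reference phones they align with."""
    return {j: decoded_starts[i]
            for i, j in pairs if i is not None and j is not None}


if __name__ == "__main__":
    decoded = ["k", "a", "t"]          # toy phone decoder output
    reference = ["k", "a", "r", "t"]   # toy grapheme-to-phoneme output
    total, pairs = align_phones(decoded, reference)
    print(total, pairs)
    print(reference_times(pairs, [0.0, 0.1, 0.25]))
```

This sketch stores the full cost table, so memory grows with the product of the two sequence lengths. For the very long signals the paper targets, a linear-space formulation such as Hirschberg's would be the more plausible choice.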

This work has been supported by the University of the Basque Country, under grant GIU10/18 and project US11/06, by the Government of the Basque Country, under program SAIOTEK (project S-PE11UN065), and the Spanish MICINN, under Plan Nacional de I+D+i (project TIN2009-07446, partially financed by FEDER funds).

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bordel, G., Penagarikano, M., Rodríguez-Fuentes, L.J., Fernández, M.A.V. (2012). Aligning Very Long Speech Signals to Bilingual Transcriptions of Parliamentary Sessions. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_8

  • DOI: https://doi.org/10.1007/978-3-642-35292-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35291-1

  • Online ISBN: 978-3-642-35292-8

  • eBook Packages: Computer Science, Computer Science (R0)
