Preliminary Results of Alignment of Text and Audio in News and Songs

  • Conference paper
Advances in Speech and Language Technologies for Iberian Languages

Abstract

This paper addresses the problem of forced alignment in news and songs, with the goal of obtaining the times at which each word of the transcription begins and ends. Two methods are used for this purpose. The first is a conventional forced alignment of the audio and text based on pre-existing models. The second is a model-free method in which new models are trained on the audio to be aligned, producing the aligned text and audio as a result. For the analysis of songs, we consider two versions of each song: an a cappella version (voice only, with no music) and the full song (with instrumental music included). Three songs were selected, from different singers and in different styles. For news, we analyze four speakers (two female and two male). Across all results, we observe that news is aligned better than songs, as expected. The two methods perform similarly on both a cappella songs and news, but for songs that include the instrumental part, the model-free method is much better.
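To make the notion of forced alignment concrete, the following is a minimal sketch of the underlying idea, not the method used in the paper. It assumes that some acoustic model has already produced a matrix of per-frame log-likelihoods, one column per transcript unit (word or phone) in transcript order. The function names `forced_align` and `frames_to_seconds`, the toy scores, and the 10 ms frame shift are all illustrative assumptions.

```python
from typing import List, Tuple


def forced_align(loglik: List[List[float]]) -> List[Tuple[int, int]]:
    """Monotonic Viterbi alignment of T frames to N transcript units.

    loglik[t][n] is the (assumed, precomputed) log-likelihood of frame t
    under unit n. Units must be visited in order and each unit must cover
    at least one frame. Returns inclusive (start_frame, end_frame) pairs.
    """
    n_frames = len(loglik)
    n_units = len(loglik[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per transcript unit")

    neg_inf = float("-inf")
    score = [[neg_inf] * n_units for _ in range(n_frames)]
    # back[t][n] is 1 if the best path entered unit n at frame t, else 0.
    back = [[0] * n_units for _ in range(n_frames)]
    score[0][0] = loglik[0][0]

    for t in range(1, n_frames):
        for n in range(n_units):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else neg_inf
            if move > stay:
                score[t][n] = move + loglik[t][n]
                back[t][n] = 1
            else:
                score[t][n] = stay + loglik[t][n]

    # Trace back from the last frame of the last unit.
    starts = [0] * n_units
    ends = [0] * n_units
    n = n_units - 1
    ends[n] = n_frames - 1
    for t in range(n_frames - 1, 0, -1):
        if back[t][n] == 1:
            starts[n] = t
            n -= 1
            ends[n] = t - 1
    return list(zip(starts, ends))


def frames_to_seconds(
    segments: List[Tuple[int, int]], frame_shift: float = 0.01
) -> List[Tuple[float, float]]:
    """Convert inclusive frame ranges to (begin, end) times in seconds.

    The 10 ms default frame shift is an assumption for illustration.
    """
    return [(s * frame_shift, (e + 1) * frame_shift) for s, e in segments]


# Toy scores: frames 0-1 favor unit 0, frames 2-4 unit 1, frame 5 unit 2.
toy = [
    [0.0, -5.0, -5.0],
    [0.0, -5.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, 0.0, -5.0],
    [-5.0, -5.0, 0.0],
]
segments = forced_align(toy)
times = frames_to_seconds(segments)
```

With the toy scores above, `segments` holds one frame range per transcript unit and `times` holds the corresponding begin and end times. A real system would obtain the scores from trained acoustic models and would typically align phone-level states rather than whole words.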

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Córdova Lucero, D.P., Toledano, D.T. (2012). Preliminary Results of Alignment of Text and Audio in News and Songs. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_7

  • DOI: https://doi.org/10.1007/978-3-642-35292-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35291-1

  • Online ISBN: 978-3-642-35292-8

  • eBook Packages: Computer Science, Computer Science (R0)
