Preliminary Results of Alignment of Text and Audio in News and Songs

Córdova Lucero, Darwin Patricio; Toledano, Doroteo Torre

doi:10.1007/978-3-642-35292-8_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This paper addresses the problem of forced alignment in news and songs, with the goal of obtaining the times at which each word of the transcription begins and ends. Two methods are used. The first is a forced alignment of the audio and text based on pre-existing models. The second is a model-free method in which new models are trained on the very audio to be aligned, producing the aligned text and audio as a result. For songs, we consider two versions of each song: an a cappella version (voice only, no music) and the full version (with instrumental music included). Three songs by different singers and in different styles were selected. For news, we analyze four speakers (two female and two male). Across all results, news is aligned better than songs, as expected. The two methods perform similarly on both a cappella songs and news, but for songs that include the instrumental part the model-free method is much better.
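
As an illustration of what forced alignment computes, the following is a minimal Python sketch, not the system used in the paper. It assumes a hypothetical table `log_lik[t][n]` giving the log-likelihood of audio frame `t` under word `n`, and finds the best monotonic assignment of frames to words by dynamic programming (a Viterbi-style search in which every word receives at least one frame). The function name `forced_align` and the single-unit-per-word simplification are assumptions made for this example; a real aligner would typically model each word as a sequence of phone models.

```python
import math


def forced_align(log_lik):
    """Return (start_frame, end_frame) per word; end_frame is exclusive.

    log_lik[t][n] is an assumed log-likelihood of frame t under word n.
    Words must appear in order and each word gets at least one frame.
    """
    num_frames = len(log_lik)
    num_words = len(log_lik[0])
    if num_frames < num_words:
        raise ValueError("need at least one frame per word")

    neg_inf = -math.inf
    score = [[neg_inf] * num_words for _ in range(num_frames)]
    # advanced[t][n] is True when the best path enters word n at frame t.
    advanced = [[False] * num_words for _ in range(num_frames)]
    score[0][0] = log_lik[0][0]

    for t in range(1, num_frames):
        for n in range(num_words):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else neg_inf
            if move > stay:
                score[t][n] = move + log_lik[t][n]
                advanced[t][n] = True
            else:
                score[t][n] = stay + log_lik[t][n]

    # Backtrace from the last frame of the last word to recover boundaries.
    starts = [0] * num_words
    n = num_words - 1
    for t in range(num_frames - 1, 0, -1):
        if advanced[t][n]:
            starts[n] = t
            n -= 1
    ends = starts[1:] + [num_frames]
    return list(zip(starts, ends))


# Toy input: frames 0-1 match word 0, frames 2-5 match word 1.
toy = [[-0.1, -5.0]] * 2 + [[-5.0, -0.1]] * 4
print(forced_align(toy))
```

Multiplying the returned frame indices by the frame shift of the front end (for example 10 ms, an assumed value) gives the word begin and end times in seconds.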

Author information

Authors and Affiliations

ATVS, Escuela Politécnica Superior, Universidad Autónoma de Madrid, Spain
Darwin Patricio Córdova Lucero & Doroteo Torre Toledano

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Doroteo Torre Toledano
Centro Politécnico Superior, Edificio Ada Byron, C/ María de Luna nº 1, 50018, Zaragoza, Spain
Alfonso Ortega Giménez
Universidade de Aveiro, Campus Universitário Aveiro, 3810-193, Aveiro, Portugal
António Teixeira
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Joaquín González Rodríguez
E.T.S.I.Telecomunicacion, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain
Luis Hernández Gómez & Rubén San Segundo Hernández
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Daniel Ramos Castro

About this paper

Cite this paper

Córdova Lucero, D.P., Toledano, D.T. (2012). Preliminary Results of Alignment of Text and Audio in News and Songs. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_7

DOI: https://doi.org/10.1007/978-3-642-35292-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science, Computer Science (R0)
