A tool for automatic transcription of intonation: Eti_ToBI a ToBI transcriber for Spanish and Catalan

  • Original Paper
Language Resources and Evaluation

Abstract

This article presents Eti_ToBI, a tool that automatically labels intonational events in Spanish and Catalan utterances according to the current Sp_ToBI and Cat_ToBI conventions. The system consists of a Praat script that assigns ToBI labels to pitch movements on the basis of lexical data entered by the researcher and acoustic data that it extracts from sound files. The first part of the article explains the methodological approach that has made the automatisation possible and describes the algorithms the script uses to perform the analysis. The second part presents the reliability results for both the Catalan and the Spanish corpora, which show a level of agreement equal to that reported in the literature for human transcribers.

Notes

  1. Although Catalan and Spanish have the same number of deep pitch accents, their inventories differ: the tritonal accent L + H* + L never appears in Catalan, whereas ¡H + L* does not appear in Spanish.

  2. The tritonal pitch accent L + H* + L has been attested in Argentinian Spanish.

  3. Although the script is universal in principle, it is optimised for recognising the movements of the Spanish and Catalan systems, since the alignment of pitch targets varies from language to language.

  4. In order to label the perceivable F0 movements, we follow the convention that a movement must exceed a threshold of 1.5 semitones to be considered significant. The 1.5 semitone threshold has proved to be an effective operative threshold for the perception of intonation in Spanish (Pamies et al. 2002) and in other intonational languages (Rietveld and Gussenhoven 1985). The 1.5 semitone threshold is also adopted in Roseano and Fernández Planas (2013).

  5. If the pitch accent is not the first in the IP and a high target has occurred before it, the script labels the accent as high provided that there has been no declination (i.e., a fall greater than 1.5 semitones) since the last target.

  6. The label will be L* + H if the researcher has specified in the form that he/she does not want that label in the transcription.

  7. The 6 semitone threshold has recently been suggested for Spanish and Catalan on the basis of the perception of phonological contrasts between a high and an extra-high level (Borràs-Comes et al. 2014; Vanrell 2011).

  8. The boundary tone L¡H % is not included in current ToBI systems. However, the script can distinguish between two nuclear configurations only by using this label. In the script, the extra-high tone contrasts with an LH % that has usually been identified with LM % (Vanrell 2011). In the standardized tier, L + H* LH % becomes L + H* L!H % and L + H* L¡H % becomes L + H* LH %, as the conventions state.

  9. In the initial form, the script lets the researcher choose whether or not the tritonal accent of Argentinian Spanish is used in the second tier.

  10. Declination is the natural tendency of pitch to fall from the beginning of an intonational phrase to its end.

  11. This conversion from H + L* in the first tier to L* in the second (Fig. 2) is possible in Spanish and Catalan because real H + L* tones consist of a fall within the accented syllable (Estebas-Vilaplana and Prieto 2010; Prieto 2014). For the script, this implies that a phonological H + L* must have a fall greater than 1.5 semitones between the start point and the end point of the stressed syllable.

  12. In Spanish and Catalan intonational phonology the mid level is not very productive: it is restricted to the transcription of the nuclear configurations of the vocative chant (L + H* !H %) and the emphatic obviousness statement (L + H* L!H %). These configurations have specific durations and, in the case of the vocative chant, many other characteristics that make it recognisable (Ladd 2008: 135). For this reason, further parameters such as duration were included to help recognise these contours.

  13. Recent work shows that they are phonologically different, but the change has not yet been integrated (Roseano, Fernández Planas, Elvira-García, and Martínez Celdrán 2015).

  14. Algherese Catalan has an H* + L L % nuclear configuration.
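
Notes 4, 5, 7 and 11 express pitch movements as intervals in semitones, and note 8 describes a relabelling between tiers. As a rough illustration only, the Python sketch below (it is not the Praat script itself) shows one way such thresholds and relabellings could be applied. The function names, the category names, the inclusive treatment of the 6 semitone boundary and the exact label strings are assumptions made for this sketch.

```python
import math

# Thresholds taken from notes 4 and 7; everything else is hypothetical.
PERCEPTUAL_THRESHOLD_ST = 1.5  # movements must exceed this to count
EXTRA_HIGH_THRESHOLD_ST = 6.0  # high vs. extra-high contrast


def semitones(f0_start_hz, f0_end_hz):
    """Signed interval in semitones from the first F0 value to the second."""
    return 12.0 * math.log2(f0_end_hz / f0_start_hz)


def classify_movement(f0_start_hz, f0_end_hz):
    """Classify an F0 movement as level, fall, rise or extra-high rise."""
    interval = semitones(f0_start_hz, f0_end_hz)
    # A movement of 1.5 semitones or less is treated as not significant.
    if abs(interval) <= PERCEPTUAL_THRESHOLD_ST:
        return "level"
    if interval < 0:
        return "fall"
    # Assumption: a rise of exactly 6 semitones already counts as extra high.
    if interval >= EXTRA_HIGH_THRESHOLD_ST:
        return "extra-high rise"
    return "rise"


# Relabelling of the two nuclear configurations mentioned in note 8.
STANDARDISED_NUCLEAR = {
    "L + H* LH %": "L + H* L!H %",
    "L + H* L¡H %": "L + H* LH %",
}


def standardise(label):
    """Return the standardised-tier label, or the label itself if unchanged."""
    return STANDARDISED_NUCLEAR.get(label, label)


if __name__ == "__main__":
    for start, end in [(100, 105), (200, 180), (100, 120), (100, 150)]:
        print(start, end, round(semitones(start, end), 2),
              classify_movement(start, end))
    print(standardise("L + H* L¡H %"))
```

Working in semitones rather than hertz keeps the thresholds comparable across speakers with different pitch ranges, which is why the interval is computed on a logarithmic scale before any comparison is made.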

References

  • Alessandro, C., & Mertens, P. (1995). Automatic pitch contour stylization using a model of tonal perception. Computer Speech and Language, 9(3), 257–288.

  • Beckman, M., Díaz-Campos, M., McGory, J. T., & Morgan, T. A. (2002). Intonation across Spanish, in the tones and break indices framework. Probus, 14, 9–36. doi:10.1515/prbs.2002.008.

  • Beckman, M., & Elam, G. A. (1997). Guidelines for ToBI Labelling. The Ohio State University Research Foundation.

  • Black, A. W., & Hunt, A. J. (1996). Generating F0 contours from ToBI labels using linear regression. In ICSLP 96. Fourth International Conference on Spoken Language Proceedings (pp. 1385–1388). Philadelphia: IEEE. doi:10.1109/ICSLP.1996.607872.

  • Blum-Kulka, S. (1982). Learning to Say What You Mean in a Second Language: A Study of the Speech Act Performance of Learners of Hebrew as a Second Language. Applied Linguistics, 3(1), 29–59. http://applij.oxfordjournals.org/content/III/1/29.short. Accessed January 21 2015.

  • Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. In IFA Proceedings 17 (pp. 97–110). http://www.fon.hum.uva.nl/paul/papers/Proceedings_1993.pdf.

  • Boersma, P., & Weenink, D. (2015). Praat: doing phonetics by computer. http://www.praat.org/.

  • Borràs-Comes, J., Vanrell, M. del M., & Prieto, P. (2014). The role of pitch range in establishing intonational contrasts. Journal of the International Phonetic Association, 44(01), 1–20. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=9212002&fileId=S0025100313000303. Accessed April 7 2014.

  • Breen, M., Dilley, L. C., Kraemer, J., & Gibson, E. (2012). Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory, 8(2), 277–312. http://www.isca-speech.org/archive_open/int_97/inta_259.html. Accessed November 17 2014.

  • Campbell, N. (1996). Autolabelling Japanese ToBI. In ICSLP 96. Fourth International Conference on Spoken Language Processing Proceedings (Vol. 4, pp. 2399–2402). Philadelphia: IEEE. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=607292. Accessed September 3 2014.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

  • Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220. http://psycnet.apa.org/journals/bul/70/4/213/. Accessed July 18 2014.

  • Cohen, M. A., Grossberg, S., & Wyse, L. L. (1995). A spectral network model of pitch perception. The Journal of the Acoustical Society of America, 98(2 Pt 1), 862–79. http://www.ncbi.nlm.nih.gov/pubmed/7642825. Accessed July 1 2015.

  • De Looze, C. (2010). Analyse et interprétation de l’empan temporel des variations prosodiques en français et en anglais. Aix-en-Provence. Retrieved from http://halshs.archives-ouvertes.fr/tel-00470641/.

  • Dorta, J. (Ed.). (2013). Estudio comparativo preliminar de la entonación de Canarias, Cuba y Venezuela. Madrid-Sta Cruz de Tenerife: La Página ediciones.

  • Escudero, D., Aguilar, L., Vanrell, M. del M., & Prieto, P. (2012). Analysis of inter-transcriber consistency in the Cat_ToBI prosodic labeling system. Speech Communication, 54(4), 566–582. http://www.sciencedirect.com/science/article/pii/S0167639311001749. Accessed April 7 2014.

  • Escudero-Mancebo, D., González-Ferreras, C., Vivaracho-Pascual, C., & Cardeñoso-Payo, V. (2014). A fuzzy classifier to deal with similarity between labels on automatic prosodic labeling. In Computer Speech & Language (Vol. 28, pp. 326–341). doi:10.1016/j.csl.2013.08.001.

  • Estebas-Vilaplana, E., & Prieto, P. (2010). Castilian Spanish intonation. In P. Prieto & P. Roseano (Eds.), Transcription of Intonation of the Spanish Language (pp. 17–48). München: Lincom Europa.

  • Face, T., & Prieto, P. (2007). Rising accents in Castilian Spanish: a revision of Sp-ToBI. Journal of Portuguese Linguistics, 6(1), 117.

  • Fernández Planas, A. M., & Martínez Celdrán, E. (2003). El tono fundamental y la duración: dos aspectos de la taxonomía prosódica en dos modalidades de habla (enunciativa e interrogativa) del español. Estudios de fonética experimental, 12, 166–200. http://www.raco.cat/index.php/EFE/article/viewArticle/140007/0. Accessed April 7 2014.

  • Fernández Planas, A. M., Martínez Celdrán, E., Salcioli Guidi, V., Toledo, G., & Castellví Vives, J. (2002). Taxonomía autosegmental en la entonación del español peninsular. In Actas del II Congreso de Fonética Experimental (pp. 180–186). Sevilla.

  • Fleiss, J. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382. doi:10.1037/h0031619.

  • Frid, J. (1999). An environment for testing prosodic and phonetic transcriptions. In Proceedings of ICPhS 99 (pp. 2319–2322). San Francisco. http://lup.lub.lu.se/record/529087/file/1624474.pdf. Accessed September 3 2014.

  • Garrido Almiñana, J. M. (2008, April 28). Modelling Spanish Intonation for Text-to-Speech Applications. Universitat Autònoma de Barcelona. http://www.tdx.cat/handle/10803/4885. Accessed July 3 2014.

  • GraphPad. (2014). QuickCalcs. http://graphpad.com/quickcalcs/kappa1/. Accessed January 6 2014.

  • Hart, J. t’, & Collier, R. (1975). Integrating Different Levels of Intonation Analysis. Journal of Phonetics, 3(4), 235–255. http://eric.ed.gov/?id=EJ127873. Accessed September 2 2014.

  • Hermes, D. (1988). Measurement of pitch by subharmonic summation. The Journal of the Acoustical Society of America, 83(1), 257–264. http://scitation.aip.org/content/asa/journal/jasa/83/1/10.1121/1.396427. Accessed July 16 2015.

  • Hirst, D. (2011). The analysis by synthesis of speech melody: from data to models. Journal of Speech Sciences, 1(1), 55–83. http://www.journalofspeechsciences.org/index.php/journalofspeechsciences/article/viewArticle/21.

  • Hirst, D., Di Cristo, A., & Espesser, R. (2000). Levels of representation and levels of analysis for the description of intonation systems. Prosody: theory and experiment (pp. 51–88). Dordrecht: Kluwer.

  • Hirst, D., & Espesser, R. (1993). Automatic Modelling of Fundamental Frequency Using a Quadratic Spline Function. Travaux de l’Institut de Phonétique d’Aix-en-Provence, 75–85.

  • Hualde, J. I. (2003). El modelo métrico y autosegmental. In P. Prieto (Ed.), Teorías de la entonación (pp. 155–181). Barcelona: Ariel.

  • Jeng, F., Hu, J., Dickman, B., & Lin, C. (2011). Evaluation of two algorithms for detecting human frequency-following responses to voice pitch. International Journal of Audiology, 50(1), 14–26. http://www.tandfonline.com/doi/abs/10.3109/14992027.2010.515620. Accessed September 16 2015.

  • Jun, S.-A., Lee, S., Kim, K., & Lee, Y. (2010). Labeler agreement in transcribing Korean intonation with K-ToBI. In Interspeech’10 (pp. 211–214). http://www.linguistics.ucla.edu/people/jun/ICSLP-KtobiAgree.pdf. Accessed December 6 2014.

  • Kim, B., Lee, J., & Lee, G. (2002). Corpus-based Pitch Prediction based on K-ToBI Representation. In ACM Transactions on Asian Language Information Processing (TALIP) (Vol. 1, pp. 207–224). ACM New York, NY, USA. doi:10.1145/772755.772757.

  • Kotnik, B., Höge, H., & Kačič, Z. (2009). Noise robust F0 determination and epoch-marking algorithms. Signal Processing, 89(12), 2555–2569. doi:10.1016/j.sigpro.2009.04.017.

  • Ladd, D. R. (2008). Intonational phonology (2nd ed., Vol. 2). New York: Cambridge University Press.

  • Lea, W. (1980). Prosodic aids to speech recognition. In W. Lea (Ed.), Trends in Speech Recognition (pp. 166–205). Englewood: Prentice-Hall.

  • Lee, J., Kim, B., & Lee, G. (2002). Automatic corpus-based tone and break-index prediction using K-ToBI representation. ACM Transactions on Asian Language Information Processing (TALIP), 1(3), 207–224. doi:10.1145/772755.772757.

  • Liu, M., Xu, B., Hunng, T., Deng, Y., & Li, C. (2000). Mandarin accent adaptation based on context-independent/context-dependent pronunciation modeling. In Proceedings of Acoustics, Speech, and Signal Processing, ICASSP 2000 (pp. 1025–1028). Washington, DC.

  • Martínez Celdrán, E., & Fernández Planas, A. M. (2003). Taxonomía de las estructuras entonativas de las modalidades declarativa e interrogativa del español estándar peninsular según el modelo AM en habla de laboratorio. In E. Herrera & P. Martín (Eds.), La tonía: dimensiones fonéticas y fonológicas (pp. 267–294). México D.F.: El Colegio de México.

  • Noguchi, H., & Kiriyama, K. (1999). Automatic labeling of Japanese prosody using J-ToBI style description. In EUROSPEECH’99. Sixth European Conference on Speech Communication and Technology (pp. 2259–2262). http://20.210-193-52.unknown.qala.com.sg/archive/archive_papers/eurospeech_1999/e99_2259.pdf. Accessed September 3 2014.

  • Nolan, F., & Grabe, E. (1997). Can “ToBI” Transcribe Intonational Variation in British English? In Intonation: Theory, Models and Applications (pp. 259–262). Athens, Greece. http://www.isca-speech.org/archive_open/int_97/inta_259.html. Accessed November 17 2014.

  • Pamies, A., Fernández Planas, A. M., Martínez Celdrán, E., Ortega-Escandell, A., & Amorós Cespedes, M. C. (2002). Umbrales tonales en español peninsular. In Actas del II Congreso de Fonética Experimental (pp. 272–278). Sevilla.

  • Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation. Cambridge, Massachusetts: MIT.

  • Pierrehumbert, J. (1983). Automatic recognition of intonation patterns. In Proceedings of the 21st annual meeting on Association for Computational Linguistics (pp. 85–90). http://dl.acm.org/citation.cfm?id=981328. Accessed December 1 2014.

  • Pierrehumbert, J. (2000). The phonetic grounding of phonology. Bulletin de la communication parlée, 5, 7–23.

  • Pierrehumbert, J., Beckman, M. E., & Ladd, D. R. (2000). Conceptual foundations of phonology as a laboratory science. In Phonological knowledge: Conceptual and empirical issues (pp. 273–304).

  • Pitrelli, J. F., Beckman, M. E., & Hirschberg, J. (1994). Evaluation of prosodic transcription labeling reliability in the ToBI framework. ICSLP. http://20.210-193-52.unknown.qala.com.sg/archive/archive_papers/icslp_1994/i94_0123.pdf. Accessed July 13 2014.

  • Prieto, P. (2009). Tonal alignment patterns in Catalan nuclear falls. Lingua, 119(6), 865–880.

  • Prieto, P. (2014). The intonational phonology of Catalan. In S.-A. Jun (Ed.), Prosodic typology (Vol. 2, pp. 43–80). Oxford: Oxford University Press. http://www.elebilab.com/documentos/archivos/publicaciones/3_GGT-08-04.pdf. Accessed August 26 2014.

  • Prieto, P., & Cabré, T. (Eds.). (2013). L’entonació dels dialectes catalans. Rubí: Publicacions de l’Abadia de Montserrat.

  • Prieto, P., & Hualde, J. I. (n.d.). Towards an international phonetic alphabet. Laboratory Phonology. (in press)

  • Prieto, P., & Roseano, P. (Eds.). (2010). Transcription of Intonation of the Spanish Language. München: Lincom Europa.

  • Prieto, P., van Santen, J., & Hirschberg, J. (1995). Tonal alignment patterns in Spanish. Journal of Phonetics, 23(4), 429–451.

  • Randolph, J. J. (2008). Online Kappa Calculator. http://justus.randolph.name/kappa.

  • Rietveld, A. C. M. (1984). Syllaben, klemtonen en de automatische detectie van beklemtoonde syllaben in het Nederlands. Université de Nijmegen.

  • Rietveld, T., & Gussenhoven, C. (1985). On the relation between pitch excursion size and prominence. Journal of Phonetics, 13, 299–308.

  • Roseano, P., & Fernández Planas, A. M. (2013). Transcripció fonètica i fonològica de l’entonació: una proposta d’etiquetatge automàtic. Estudios de fonética experimental, XXII, 275–332. http://www.raco.cat/index.php/EFE/article/view/275413. Accessed July 18 2014.

  • Roseano, P., Fernández Planas, A. M., Elvira-García, W., & Martínez Celdrán, E. (2015). Els tons de continuació en parla espontània: Descripció i transcripció. Barcelona: VII Workshop sobre la prosòdia del català.

  • Rosenberg, A. (2010). AuToBI - a tool for automatic ToBI annotation. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association (pp. 146–149). Mihama, Japan. http://eniac.cs.qc.cuny.edu/andrew/papers/autobi-is10.pdf. Accessed August 26 2014.

  • Roseano, P., Fernández Planas, A. M., Elvira-García, W., Cerdà Massó, R., & Martínez Celdrán, E. (accepted). Caracterització acústica dels accents prenuclears de les interrogatives absolutes i les declaratives neutres en català central. Estudios de Fonética Experimental, XXV.

  • Ross, K., & Ostendorf, M. (1996). Prediction of abstract prosodic labels for speech synthesis. Computer Speech & Language, 10(3), 155–185. http://www.sciencedirect.com/science/article/pii/S0885230896900108. Accessed October 29 2014.

  • Savino, M., Refice, M., & Daleno, D. (2002). Methods and Tools for Prosodic Analysis of a Spoken Italian Corpus. In Proceedings of the I International Conference on Language Resources and Evaluation (pp. 307–312). http://lrec-conf.org/proceedings/lrec2002/pdf/101.pdf. Accessed September 8 2014.

  • Shriberg, E., Stolcke, A., Hakkani-Tür, D., & Tür, G. (2000). Prosody-based automatic segmentation of speech into sentences and topics. Speech Communication, 32(1), 127–154.

  • Siebenhaar, B., & Leemann, A. (2012). Methodological reflections on the phonetic-phonological continuum, illustrated on the prosody of Swiss German dialects. In A. Ender, A. Leemann, & B. Wälchli (Eds.), Methods in Contemporary Linguistics (Vol. 247, pp. 21–44). Berlin: Walter de Gruyter. http://books.google.es/books?hl=es&lr=&id=cf8YDeYvBuQC&oi=fnd&pg=PA21&dq=This+system+has+been+formalized+in+the+ToBI+transcription+sys-+tem.+…+phonetic–+phonological+continuum,+illustrated+on+the+prosody+of+Swiss+German+dialects&ots=cIfe-1AYbo&sig=M9W96TM_PcPLCC49gwaKEGURcg0. Accessed November 17 2014.

  • Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., et al. (1992). ToBI: A Standard for Labeling English Prosody. In J. J. Ohala, T. M. Nearey, B. L. Derwing, M. M. Hodge, & G. E. Wiebe (Eds.), ICSLP 92 Proceedings 1992 International Conference on Spoken Language Processing. Volume 2 (pp. 867–870). Department of Linguistics, University of Alberta.

  • Sridhar, V. (2008). Exploiting acoustic and syntactic features for automatic prosody labeling in a maximum entropy framework. IEEE Transactions on Audio, Speech, and Language Processing, 16(4), 797–811. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4453862. Accessed April 7 2014.

  • Syrdal, A. K., Hirschberg, J., McGory, J., & Beckman, M. (2001). Automatic ToBI prediction and alignment to speed manual labeling of prosody. Speech Communication, 33(1), 135–151. http://www.sciencedirect.com/science/article/pii/S016763930000073X. Accessed April 7 2014.

  • Syrdal, A. K., & McGory, J. T. (2000). Inter-transcriber reliability of ToBI prosodic labeling. INTERSPEECH, 2000, 235–238.

  • Tatham, M., & Morton, K. (2005). Developments in Speech Synthesis. John Wiley & Sons. http://books.google.com/books?id=6mPk1Dkt_V0C&pgis=1. Accessed November 17 2014.

  • The Ohio State University Department of Linguistics. (1999). ToBI. http://www.ling.ohio-state.edu/~tobi/. Accessed August 9 2014.

  • Tür, G., Hakkani-Tür, D., Stolcke, A., & Shriberg, E. (2001). Integrating prosodic and lexical cues for automatic topic segmentation. Computational Linguistics, 27(1), 31–57.

  • Vanrell, M. del M. (2011). The phonological relevance of tonal scaling in the intonational grammar of Catalan. Universitat Autònoma de Barcelona.

  • Wagner, A. (2008). Automatic labeling of prosody. In Proceedings of the 2nd ISCA Workshop on Experimental Linguistics, ExLing 2008 (pp. 25–27). Athens, Greece. http://isca-speech.org/archive_open/archive_papers/exling2008/exl8_221.pdf. Accessed September 3 2014.

  • Wasserblat, M., Gainza, M., Dorran, D., & Domb, Y. (2008). Pitch tracking and voiced/unvoiced detection in noisy environment using optimal sequence estimation. In IET Irish Signals and Systems Conference (pp. 43–48). Galway, Ireland.

  • Wightman, C., & Ostendorf, M. (1994). Automatic labeling of prosodic patterns. In IEEE Transactions on Audio, Speech, and Language Processing (Vol. 2, pp. 469–481). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=326607. Accessed November 17 2014.

Acknowledgments

This work was funded by grant FFI2012-35998, awarded by the Spanish government for the AMPER-CAT project, and by the predoctoral grant APIF-2012 of the University of Barcelona.

Author information

Corresponding author

Correspondence to Wendy Elvira-García.

Appendix

See Figs. 9, 10 and 11.

Fig. 9

Prenuclear pitch accents detected by the script in the first tier (left) and their equivalents in the second tier (right). Where two choices are given in the second column, both transcriptions are possible, depending on the options and the language that the researcher has chosen in the form of the script

Fig. 10

Schematic representation and labelling of the possible nuclear pitch accents detectable by the script and their equivalents in the second tier

Fig. 11

Schematic representation and labelling of the possible boundary tones detectable by the script and their equivalents in the second tier

Cite this article

Elvira-García, W., Roseano, P., Fernández-Planas, A.M. et al. A tool for automatic transcription of intonation: Eti_ToBI a ToBI transcriber for Spanish and Catalan. Lang Resources & Evaluation 50, 767–792 (2016). https://doi.org/10.1007/s10579-015-9320-9

  • DOI: https://doi.org/10.1007/s10579-015-9320-9
