Abstract
Pitch tracking is one of the most important research topics in speech recognition and identification. This study examines how the choice of pitch tracking technique affects the accuracy and speed of automatic dialect identification, using the TIMIT database. Three pitch tracking procedures are compared: the Boersma algorithm, the iterative adaptive inverse filtering approach, and the summation of residual harmonics method. All else being equal, the summation of residual harmonics method was both the most accurate and the fastest of the three.
Acknowledgements
The authors would like to thank the reviewers for the detailed and valuable feedback which helped improve this manuscript.
Cite this article
Etman, A., Beex, A.A. The effect of pitch tracking on automatic dialect identification. Int J Speech Technol 20, 629–634 (2017). https://doi.org/10.1007/s10772-017-9434-0
DOI: https://doi.org/10.1007/s10772-017-9434-0