The effect of pitch tracking on automatic dialect identification

International Journal of Speech Technology

Abstract

Pitch tracking is an important research topic in speech recognition and identification. This study examines how the choice of pitch tracking technique affects the accuracy and speed of automatic dialect identification. Experiments were carried out on the TIMIT database. The pitch tracking procedures investigated are the Boersma algorithm, the iterative adaptive inverse filtering (IAIF) approach, and the summation of residual harmonics (SRH) method. With all other components of the system held equal, the summation of residual harmonics method provided both the highest accuracy and the fastest performance of the three.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

References

  • Alku, P. (1992). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11(2–3), 109–118.

  • Auvinen, H., Raitio, T., Siltanen, S., Story, B., & Alku, P. (2014). Automatic glottal inverse filtering with Markov chain Monte Carlo method. Computer Speech and Language, 28(5), 1139–1155.

  • Boersma, P. (2002). Praat: A system for doing phonetics by computer. Glot International, 5, 341–345.

  • Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proceedings of the Institute of Phonetic Sciences, University of Amsterdam, Vol. 17, pp. 97–110.

  • Camacho, A. (2007). SWIPE: A Sawtooth Waveform Inspired Pitch estimator. PhD dissertation, University of Florida, pp. 12–20.

  • Castro, L., & Moraes, J. A. (2008). The temporal structure of professional speaking styles in Brazilian Portuguese. Proceedings of ISCA tutorial and research workshop on experimental linguistics, Athens, pp. 53–56.

  • Castro, L., Serridge, B., Moraes, J., & Freit, M. (2009). Characterizing variation in fundamental frequency contours of professional speaking styles. Proceedings of Allen Institute for artificial intelligence. http://speechprosody2010.illinois.edu/papers/100440.pdf.

  • Clopper, C., & Smiljanic, R. (2011). Effects of gender and regional dialect on prosodic patterns in American English. Journal of Phonetics, 39(2), 237–245. doi:10.1016/j.wocn.2011.02.006.

  • Drugman, T. (2011). Advances in glottal analysis and its applications. PhD thesis, University of Mons, Belgium.

  • Drugman, T., & Alwan, A. (2011). Joint robust voicing detection and pitch estimation based on residual harmonics. Proceedings of Interspeech, Firenze, Italy.

  • Etman, A., & Beex, A. A. L. (2015). American dialect identification using phonotactic and prosodic features. SAI Intelligent Systems Conference – IntelliSys, UK, pp. 963–970.

  • Gerhard, D. (2003). Pitch extraction and fundamental frequency: History and current techniques. Department of Computer Science, University of Regina, Canada, pp. 1–22.

  • Murray, I. R., & Arnott, J. L. (1993). Toward the simulation of emotion in synthetic speech: A review of the literature of human vocal emotion. Journal of the Acoustical Society of America, 93(2), 1097–1108.

  • Shimamura, T., & Kobayashi, H. (2001). Weighted autocorrelation for pitch extraction of noisy speech. IEEE Transactions on Speech and Audio Processing, 9(7), 727–730.

  • Talkin, D. (1995). A robust algorithm for pitch tracking. In W. B. Kleijn & K. K. Paliwal (Eds.), Speech coding and synthesis. Amsterdam: Elsevier.

  • Tamburini, F. (2002). Automatic detection of prosodic prominence in continuous speech. Proceedings of the Third International Conference on Language Resources and Evaluation – LREC, Spain, pp. 301–306.

  • van Santen, J. P. H. (1994). Assignment of segmental duration in text-to-speech synthesis. Computer Speech and Language, 8(2), 95–128.

  • Wang, M., & Lin, M. (2004). An analysis of Pitch in Chinese spontaneous speech. International Symposium on Tonal Aspects of Tone Languages, Beijing, China.

Acknowledgements

The authors would like to thank the reviewers for the detailed and valuable feedback which helped improve this manuscript.

Author information

Correspondence to A. Etman.

About this article

Cite this article

Etman, A., Beex, A.A. The effect of pitch tracking on automatic dialect identification. Int J Speech Technol 20, 629–634 (2017). https://doi.org/10.1007/s10772-017-9434-0

  • DOI: https://doi.org/10.1007/s10772-017-9434-0
