DOI: 10.1145/2016039.2016108
research-article

A comparative study of the classification techniques in isolated Mandarin syllable tone recognition

Published: 24 March 2011

Abstract

Tonal languages, such as Chinese, use systematic variations of pitch to distinguish lexical or grammatical meaning, so tone recognition is essential for processing them. Typically, tone recognition for isolated syllables involves three major steps: fundamental frequency (F0) detection, feature extraction, and classification. This work compares different techniques for these three steps to answer two questions for Mandarin Chinese syllables: which combination of F0 detection and feature extraction methods best prepares data for classification, and which classification method is most effective for tone recognition? Three F0 detection methods (autocorrelation, cross-correlation, and cepstrum), two feature extraction schemes (sampled F0; and average F0, slope, and energy from three subsegments), four normalization methods (slope only, 0-100 scaling, z-score, and T1 shift), and two classification methods (Support Vector Machine (SVM) and Multilayer Perceptron (MLP)) were studied experimentally using 700 collected data samples.
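The first two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 75-500 Hz search range, and the five-point sampling are assumptions chosen for illustration. It shows autocorrelation-based F0 estimation for one voiced frame, a sampled-F0 feature vector, and two of the four normalizations (0-100 scaling and z-score). The slope-only and T1-shift normalizations and the SVM/MLP classifiers are left out because the abstract does not define them in enough detail.

```python
import math
from statistics import mean, pstdev


def autocorr_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak.

    The f0_min/f0_max search range is an assumed default, not a value
    taken from the paper.
    """
    n = len(frame)
    avg = mean(frame)
    x = [s - avg for s in frame]  # remove DC offset before correlating
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    # Return 0.0 when no positive correlation peak was found (unvoiced frame).
    return sample_rate / best_lag if best_lag else 0.0


def sample_contour(f0_track, num_points=5):
    """Pick num_points evenly spaced values from an F0 track."""
    last = len(f0_track) - 1
    return [f0_track[round(i * last / (num_points - 1))]
            for i in range(num_points)]


def scale_0_100(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Synthetic 200 Hz tone, 50 ms at 8 kHz, standing in for a voiced frame.
    frame = [math.sin(2 * math.pi * 200.0 * i / 8000.0) for i in range(400)]
    print(autocorr_f0(frame, 8000))

    # A made-up rising contour, loosely resembling a rising tone.
    contour = [100, 110, 120, 130, 140, 150, 160, 170, 180]
    features = sample_contour(contour)
    print(scale_0_100(features))
    print(z_score(features))
```

In a full pipeline, `autocorr_f0` would be applied to successive frames of a syllable to build the F0 track, and the normalized feature vector would then be passed to a classifier.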


Cited By

  • (2023) A Mandarin Tone Recognition Algorithm Based on Random Forest and Features Fusion. Proceedings of the 7th International Conference on Control Engineering and Artificial Intelligence, pp. 168-172. DOI: 10.1145/3580219.3580249. Online publication date: 28-Jan-2023
  • (2015) Acoustic Features for Hidden Conditional Random Fields-Based Thai Tone Classification. ACM Transactions on Asian and Low-Resource Language Information Processing 15:2, pp. 1-26. DOI: 10.1145/2833088. Online publication date: 11-Dec-2015

Published In

ACMSE '11: Proceedings of the 49th annual ACM Southeast Conference
March 2011
399 pages
ISBN:9781450306867
DOI:10.1145/2016039

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MLP
  2. SVM
  3. classification
  4. tone recognition

Qualifiers

  • Research-article

Conference

ACM SE '11
ACM SE '11: ACM Southeast Regional Conference
March 24 - 26, 2011
Kennesaw, Georgia

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%
