research-article

A comparative study of the classification techniques in isolated Mandarin syllable tone recognition

Author: Cen Li

ACMSE '11: Proceedings of the 49th annual ACM Southeast Conference

Pages 263 - 269

https://doi.org/10.1145/2016039.2016108

Published: 24 March 2011

Abstract

Tonal languages, such as Chinese, use systematic variations of pitch to distinguish lexical or grammatical meaning, so tone recognition is essential for processing them. Tone recognition for isolated syllables typically involves three major steps: fundamental frequency (F₀) detection, feature extraction, and classification. This work compares different techniques for these three steps to answer two questions for Mandarin Chinese syllables: which combination of F₀ detection and feature extraction methods best prepares data for classification, and which classification method is most effective for tone recognition? Three F₀ detection methods (autocorrelation, cross-correlation, and cepstrum), two feature extraction schemes (sampled F₀; and average F₀, slope, and energy from three subsegments), four normalization methods (slope only, 0-100 scaled, z-score, and T1 shift), and two classification methods (Support Vector Machine (SVM) and Multilayer Perceptron (MLP)) were studied experimentally using 700 collected data samples.
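To make the first two steps concrete, the following is a minimal sketch of autocorrelation-based F₀ detection, subsegment features, and z-score normalization. It is not the paper's implementation: the function names (`autocorr_f0`, `subsegment_features`, `zscore`), the 75-500 Hz search range, the equal-length split into three subsegments, and normalizing within a single contour are all assumptions made for illustration, and the energy feature is omitted because it needs the raw waveform per subsegment.

```python
import math


def autocorr_f0(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak.

    The fmin/fmax search range is an assumed default, not taken from the paper.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(frame) - 1, int(sample_rate / fmin))
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def subsegment_features(f0_contour, n_segments=3):
    """Return (average F0, least-squares slope) for each equal-length subsegment."""
    n = len(f0_contour)
    features = []
    for i in range(n_segments):
        seg = f0_contour[i * n // n_segments:(i + 1) * n // n_segments]
        mean_y = sum(seg) / len(seg)
        mean_x = (len(seg) - 1) / 2
        num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(seg))
        den = sum((x - mean_x) ** 2 for x in range(len(seg)))
        features.append((mean_y, num / den if den else 0.0))
    return features


def zscore(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std else 0.0 for v in values]


# Synthetic 200 Hz tone, 50 ms at 8 kHz, standing in for a voiced frame.
frame = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
estimated_f0 = autocorr_f0(frame, 8000)

# A steadily rising contour, loosely resembling a rising tone.
rising = [100 + 10 * i for i in range(9)]
rising_features = subsegment_features(rising)
rising_normalized = zscore(rising)
```

The per-subsegment averages and slopes, after normalization, would form the feature vector passed to a classifier such as an SVM or MLP; the classifier itself is left out of this sketch.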

Cited By

Yan, J., Tian, L., Wang, X., Liu, J., and Li, M. (2023). A Mandarin Tone Recognition Algorithm Based on Random Forest and Features Fusion. Proceedings of the 7th International Conference on Control Engineering and Artificial Intelligence, pp. 168-172. Online publication date: 28-Jan-2023.
https://dl.acm.org/doi/10.1145/3580219.3580249
Kertkeidkachorn, N., Punyabukkana, P., and Suchato, A. (2015). Acoustic Features for Hidden Conditional Random Fields-Based Thai Tone Classification. ACM Transactions on Asian and Low-Resource Language Information Processing, 15:2, pp. 1-26. Online publication date: 11-Dec-2015.
https://dl.acm.org/doi/10.1145/2833088

Published In

ACMSE '11: Proceedings of the 49th annual ACM Southeast Conference

March 2011

399 pages

ISBN:9781450306867

DOI:10.1145/2016039

Conference Chair: Victor Clincy (Kennesaw State University)

General Chairs: Ken Hoganson (Kennesaw State University), Jose Garrido (Kennesaw State University)

Program Chair: Venu Dasigi (Southern Polytechnic State University)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ACM SE '11: ACM Southeast Regional Conference

Sponsor: ACM

March 24 - 26, 2011

Kennesaw, Georgia

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%
