Phoneme sequence recognition via DTW-based classification

Hamooni, Hossein; Mueen, Abdullah; Neel, Amy

doi:10.1007/s10115-015-0885-9

Regular Paper
Published: 19 October 2015

Volume 48, pages 253–275, (2016)

Knowledge and Information Systems

Hossein Hamooni¹,
Abdullah Mueen¹ &
Amy Neel²

Abstract

Phonemes are the smallest units of sound produced by a human being. Automatic classification of phonemes is a well-researched topic in linguistics due to its potential for robust speech recognition. With the recent advancement of phonetic segmentation algorithms, it is now possible to generate datasets of millions of phonemes automatically. Phoneme classification on such datasets is a challenging data mining task because of the large number of classes (over a hundred) and the complexity of existing methods. In this paper, we introduce the phoneme classification problem as a data mining task. We propose a dual-domain (time and frequency) hierarchical classification algorithm. Our method uses a dynamic time warping (DTW)-based classifier in the top layers and time–frequency features in the lower layer. We cross-validate our method on phonemes from three online dictionaries and achieve up to 35% improvement in classification compared with existing techniques. We further modify our vowel classifier by adopting DTW distance over time–frequency coefficients and gain an additional 3% improvement. We provide case studies on classifying accented phonemes and speaker-invariant phoneme classification. Finally, we demonstrate how phoneme classification can be used to recognize speech.
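As a rough illustration of the DTW-based classification mentioned in the abstract, the sketch below computes the classic DTW distance between two one-dimensional sequences and uses it in a nearest-neighbor classifier. The abstract does not specify these details, so the absolute-difference cost, the optional warping window, the 1-nearest-neighbor rule, and the names `dtw_distance` and `classify_1nn` are all assumptions made for illustration, not the authors' implementation.

```python
from math import inf


def dtw_distance(a, b, window=None):
    """Classic dynamic time warping distance between two 1-D sequences.

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    `window` is an optional Sakoe-Chiba band half-width; None means no band.
    """
    n, m = len(a), len(b)
    # A band narrower than |n - m| would make the final cell unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def classify_1nn(query, labeled_examples, window=None):
    """Return the label of the training series closest to `query` under DTW.

    `labeled_examples` is an iterable of (series, label) pairs (hypothetical format).
    """
    best_label, best_dist = None, inf
    for series, label in labeled_examples:
        d = dtw_distance(query, series, window)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Toy data standing in for phoneme waveforms (not from the paper).
training = [([0, 1, 2, 1, 0], "A"), ([2, 1, 0, 1, 2], "B")]
predicted = classify_1nn([0, 0, 1, 2, 2, 1, 0], training)
```

The query in the toy example is a time-stretched copy of the "A" series, so DTW can align it at zero cost even though the lengths differ, which is the property that makes DTW attractive for phonemes spoken at different rates.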

Notes

Color figures are available in the online version of the paper.

Author information

Authors and Affiliations

Department of Computer Science, University of New Mexico, Albuquerque, NM, USA
Hossein Hamooni & Abdullah Mueen
Department of Speech and Hearing Sciences, University of New Mexico, Albuquerque, NM, USA
Amy Neel

Corresponding author

Correspondence to Hossein Hamooni.

About this article

Cite this article

Hamooni, H., Mueen, A. & Neel, A. Phoneme sequence recognition via DTW-based classification. Knowl Inf Syst 48, 253–275 (2016). https://doi.org/10.1007/s10115-015-0885-9

Received: 25 November 2014
Revised: 20 August 2015
Accepted: 05 October 2015
Published: 19 October 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s10115-015-0885-9
