Abstract
This paper proposes a new discrete speech recognition method that investigates the capability of graphical models based on tree distributions, which are widely used in many optimization areas. A novel spanning tree structure that exploits the temporal nature of the speech signal is proposed. This structure significantly reduces complexity because it captures only a few essential relationships rather than considering all possible tree structures. The model is evaluated on several isolated-word databases. Experiments show that, compared with the conventional discrete hidden Markov model (DHMM), the proposed approaches reduce error rates by 2.54 %–12 % and speed up recognition at least threefold. A substantial reduction in training time is also observed. The overall recognition accuracy ranged from 93.09 % to 95.34 %, confirming the effectiveness of the proposed methods.
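The abstract does not spell out the model, so the Python sketch below only illustrates the general idea of a tree-distribution word classifier under stated assumptions: each utterance is assumed to be vector-quantized into a fixed number T of codebook symbols in {0, ..., K-1}, one tree distribution is fitted per word, and a test utterance is assigned to the word with the highest log-likelihood. It contrasts the standard Chow-Liu procedure (a maximum-weight spanning tree over all pairwise mutual-information values) with a hypothetical fixed temporal chain in which each frame depends only on the previous one. The chain is an illustrative assumption, not necessarily the authors' structure, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def mutual_information(a, b, k, alpha=1.0):
    """Empirical mutual information (nats) between two symbol columns in [0, k), Laplace-smoothed."""
    joint = np.full((k, k), alpha)
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    return float(np.sum(joint * np.log(joint / (pa * pb))))


def chow_liu_edges(X, k):
    """Chow-Liu: maximum-weight spanning tree over all pairwise MI values (Prim's algorithm, root 0)."""
    T = X.shape[1]
    w = np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1, T):
            w[i, j] = w[j, i] = mutual_information(X[:, i], X[:, j], k)
    in_tree = np.zeros(T, dtype=bool)
    in_tree[0] = True
    best, parent = w[0].copy(), np.zeros(T, dtype=int)
    edges = []
    for _ in range(T - 1):
        j = int(np.argmax(np.where(in_tree, -np.inf, best)))  # strongest link into the current tree
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        better = w[j] > best
        best = np.where(better, w[j], best)
        parent = np.where(better, j, parent)
    return edges


def temporal_chain_edges(T):
    """Hypothetical fixed temporal tree: frame t-1 is the parent of frame t; no structure search."""
    return [(t - 1, t) for t in range(1, T)]


def fit_tree(X, edges, k, alpha=1.0):
    """Root marginal P(x_0) and one conditional table P(x_c | x_p) per directed edge, as logs."""
    root = np.full(k, alpha)
    np.add.at(root, X[:, 0], 1.0)
    log_cpts = {}
    for p, c in edges:
        table = np.full((k, k), alpha)
        np.add.at(table, (X[:, p], X[:, c]), 1.0)
        log_cpts[(p, c)] = np.log(table / table.sum(axis=1, keepdims=True))
    return np.log(root / root.sum()), log_cpts


def log_likelihood(x, model):
    """log P(x) = log P(x_0) + sum over edges of log P(x_c | x_p)."""
    log_root, log_cpts = model
    return log_root[x[0]] + sum(lp[x[p], x[c]] for (p, c), lp in log_cpts.items())


def train(X_by_word, k, structure="chain"):
    """Fit one tree distribution per word class, using either the chain or the Chow-Liu tree."""
    models = {}
    for word, X in X_by_word.items():
        edges = temporal_chain_edges(X.shape[1]) if structure == "chain" else chow_liu_edges(X, k)
        models[word] = fit_tree(X, edges, k)
    return models


def classify(x, models):
    """Maximum-likelihood decision over the word models."""
    return max(models, key=lambda word: log_likelihood(x, models[word]))


# Synthetic usage: two hypothetical "words" over K codebook symbols and T frames.
rng = np.random.default_rng(0)
K, T, N = 4, 6, 200


def make_word(step):
    start = rng.integers(0, K, size=(N, 1))
    X = (start + step * np.arange(T)) % K
    noise = rng.random((N, T)) < 0.1  # resample about 10% of symbols at random
    X[noise] = rng.integers(0, K, size=int(noise.sum()))
    return X


data = {"rising": make_word(1), "flat": make_word(0)}
chain_models = train(data, K, "chain")
chow_liu_models = train(data, K, "chow-liu")
print(classify(np.array([0, 1, 2, 3, 0, 1]), chain_models))  # prints: rising
```

The chain variant needs no mutual-information matrix or spanning-tree search, so its training cost grows linearly in T instead of quadratically. This is one plausible reading of how a fixed temporal structure reduces complexity; the sketch does not reproduce the paper's reported figures.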
Cite this article
Hammami, N., Bedda, M. & Farah, N. Tree distributions approximation model for robust discrete speech recognition. Int J Speech Technol 15, 455–462 (2012). https://doi.org/10.1007/s10772-012-9141-9