Tree distributions approximation model for robust discrete speech recognition

International Journal of Speech Technology

Abstract

This paper proposes a new discrete speech recognition method that investigates the capability of graphical models based on tree distributions, which are widely used in many optimization areas. A novel spanning tree structure that exploits the temporal nature of the speech signal is proposed. The proposed structure significantly reduces complexity because it captures only a few essential relationships rather than considering all possible tree structures. The application of this model is illustrated on different isolated-word databases. Experiments show that, compared with the conventional discrete hidden Markov model (DHMM), the proposed approaches reduce error rates by 2.54 %–12 % and improve recognition speed at least 3-fold. In addition, a marked gain in learning time is observed. The overall recognition accuracy was 93.09 %–95.34 %, confirming the effectiveness of the proposed methods.
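
For readers unfamiliar with tree distributions, the sketch below illustrates the classical dependence-tree idea of Chow and Liu: estimate the mutual information between every pair of discrete variables and keep the maximum-weight spanning tree. It is not the spanning tree structure proposed in this paper, whose details are not given in the abstract. The function names, the use of Prim's algorithm, and the assumption that each column holds a symbol index in the range 0 to k-1 (for example, vector-quantization codebook indices) are illustrative choices only.

```python
import numpy as np


def mutual_information(x, y, k):
    """Empirical mutual information (in nats) between two discrete
    sequences whose symbols are integers in [0, k)."""
    joint = np.zeros((k, k))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (k, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, k)
    mask = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


def chow_liu_edges(data, k):
    """Return the edges of a maximum-mutual-information spanning tree.

    data: integer array of shape (n_samples, n_vars), symbols in [0, k).
    Hypothetical helper for illustration; uses a simple O(n_vars^3) Prim loop.
    """
    n_vars = data.shape[1]
    mi = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j], k)

    in_tree = [0]          # grow the tree from variable 0
    edges = []
    while len(in_tree) < n_vars:
        best = None
        for u in in_tree:
            for v in range(n_vars):
                if v in in_tree:
                    continue
                if best is None or mi[u, v] > best[0]:
                    best = (mi[u, v], u, v)
        edges.append((best[1], best[2]))
        in_tree.append(best[2])
    return edges
```

Calling `chow_liu_edges(data, k)` on an array of quantized feature indices returns a list of `(parent, child)` pairs. A fixed, temporally motivated structure such as the one the abstract describes would avoid this search over candidate edges, which is one plausible source of the reported complexity savings.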

Author information

Corresponding author

Correspondence to Nacereddine Hammami.

About this article

Cite this article

Hammami, N., Bedda, M. & Farah, N. Tree distributions approximation model for robust discrete speech recognition. Int J Speech Technol 15, 455–462 (2012). https://doi.org/10.1007/s10772-012-9141-9

  • DOI: https://doi.org/10.1007/s10772-012-9141-9
