On Computational Working Memory for Speech Analysis

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7015)

Abstract

This paper proposes a scheme for analysing speech data that is inspired by the concept of working memory and uses wavelet analysis and unsupervised learning models. The scheme relies on splitting a sound stream into arbitrary chunks and producing feature streams by sequentially analysing each chunk with time-frequency methods. The purpose is to precisely detect the times of transitions as well as the lengths of the stable acoustic units that occur between them. The procedure uses two feature extraction stages to analyse the audio chunk and two types of unsupervised machine learning models: hierarchical clustering and Self-Organising Maps (SOMs). The first pass examines the whole chunk piece by piece, looking for speech and silence parts. This stage takes the root mean square, the arithmetic mean and the standard deviation of the samples of each piece and classifies these features into speech and non-speech clusters using hierarchical clustering. The second pass looks for stable patterns and transitions at the locations inferred from the results of the first pass. This step uses Harmonic and Daubechies wavelets for coefficient extraction. After the analysis procedures have been completed, the chunk advances 2 seconds, the transient and stable feature vectors are saved within SOMs, and a new cycle begins on a new chunk.
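
As an illustration of the first pass only, the following is a minimal sketch in Python, not the author's implementation. It computes the root mean square, arithmetic mean and standard deviation of each piece and groups the pieces into two clusters with a naive agglomerative (hierarchical) procedure. The piece length, the average-linkage criterion, the Euclidean distance, the rule that the cluster with the higher mean RMS is the speech cluster, and all function names are assumptions made for this example; the abstract does not specify them.

```python
import math
import random


def split_into_pieces(chunk, piece_len):
    """Split a chunk into consecutive, non-overlapping pieces.

    A trailing remainder shorter than piece_len is dropped.
    """
    return [chunk[i:i + piece_len]
            for i in range(0, len(chunk) - piece_len + 1, piece_len)]


def piece_features(piece):
    """Return (RMS, arithmetic mean, standard deviation) of one piece."""
    n = len(piece)
    mean = sum(piece) / n
    rms = math.sqrt(sum(x * x for x in piece) / n)
    std = math.sqrt(sum((x - mean) ** 2 for x in piece) / n)
    return (rms, mean, std)


def agglomerate(points, n_clusters=2):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until n_clusters remain. Returns one label per point.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average Euclidean distance between all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def first_pass(chunk, piece_len):
    """Flag each piece of the chunk as speech (True) or non-speech (False)."""
    features = [piece_features(p) for p in split_into_pieces(chunk, piece_len)]
    labels = agglomerate(features, n_clusters=2)

    def mean_rms(label):
        values = [f[0] for f, l in zip(features, labels) if l == label]
        return sum(values) / len(values)

    # Assumption: the cluster with the higher average RMS is speech.
    speech_label = max((0, 1), key=mean_rms)
    return [l == speech_label for l in labels]


# Synthetic chunk: quiet noise, a 200 Hz tone standing in for speech, quiet noise.
random.seed(0)
rate = 8000  # hypothetical sampling rate in Hz
quiet_a = [random.gauss(0, 0.001) for _ in range(rate // 2)]
tone = [0.5 * math.sin(2 * math.pi * 200 * t / rate) for t in range(rate // 2)]
quiet_b = [random.gauss(0, 0.001) for _ in range(rate // 2)]
is_speech = first_pass(quiet_a + tone + quiet_b, piece_len=400)
```

The boundaries between runs of True and False in `is_speech` give candidate locations for the second pass. The wavelet coefficient extraction and the SOM storage are not sketched here.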

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Topoleanu, T.S. (2011). On Computational Working Memory for Speech Analysis. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_6

  • DOI: https://doi.org/10.1007/978-3-642-25020-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25019-4

  • Online ISBN: 978-3-642-25020-0

  • eBook Packages: Computer Science, Computer Science (R0)
