Unsupervised Segmentation of Speech Signals Using Kernel-Gram Matrices

  • Conference paper
  • Computer Vision, Pattern Recognition, Image Processing, and Graphics (NCVPRIPG 2017)

Abstract

The objective of this paper is to develop an unsupervised method for segmenting speech signals into phoneme-like units. The proposed algorithm is based on the observation that feature vectors from the same segment exhibit a higher degree of similarity than feature vectors from different segments. The kernel-Gram matrix of an utterance is formed by computing the similarity between every pair of feature vectors in the Gaussian kernel space. The kernel-Gram matrix consists of square patches along the principal diagonal, corresponding to different phoneme-like segments in the speech signal. The proposed algorithm detects both the number of segments and their boundaries automatically. It does not assume any prior information about the input utterances, such as the exact distribution of segment lengths or the correct number of segments in an utterance. The proposed method outperforms state-of-the-art blind segmentation algorithms on the Zero Resource 2015 and TIMIT databases.
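
As a rough illustration of the kernel-Gram matrix described in the abstract, the following minimal sketch computes Gaussian-kernel similarities between every pair of frames. It is not the authors' implementation: the function name `gaussian_kernel_gram`, the bandwidth `sigma`, and the synthetic two-segment "utterance" are assumptions made for illustration, and the paper's actual features and boundary-detection step are not reproduced here.

```python
import numpy as np

def gaussian_kernel_gram(features, sigma=1.0):
    """Return the N x N matrix K with K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    features = np.asarray(features, dtype=float)
    sq_norms = np.sum(features ** 2, axis=1)
    # Squared Euclidean distance between every pair of frames.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Hypothetical toy "utterance": 5 frames near one point, then 5 frames near another.
rng = np.random.default_rng(0)
segment_a = rng.normal(loc=0.0, scale=0.1, size=(5, 3))
segment_b = rng.normal(loc=3.0, scale=0.1, size=(5, 3))
frames = np.vstack([segment_a, segment_b])

K = gaussian_kernel_gram(frames, sigma=1.0)
within = K[:5, :5].mean()   # average similarity inside the first diagonal patch
across = K[:5, 5:].mean()   # average similarity between the two segments
print(f"within-segment: {within:.3f}, across-segment: {across:.3f}")
```

In this toy setting the two 5 x 5 blocks on the principal diagonal of `K` hold high similarities while the off-diagonal blocks are close to zero, which is the square-patch structure the abstract refers to. With real speech features the patches would be less sharply separated, and the choice of `sigma` would likely matter.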

Author information

Corresponding author

Correspondence to Saurabhchand Bhati.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bhati, S., Nayak, S., Sri Rama Murty, K. (2018). Unsupervised Segmentation of Speech Signals Using Kernel-Gram Matrices. In: Rameshan, R., Arora, C., Dutta Roy, S. (eds) Computer Vision, Pattern Recognition, Image Processing, and Graphics. NCVPRIPG 2017. Communications in Computer and Information Science, vol 841. Springer, Singapore. https://doi.org/10.1007/978-981-13-0020-2_13

  • DOI: https://doi.org/10.1007/978-981-13-0020-2_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0019-6

  • Online ISBN: 978-981-13-0020-2

  • eBook Packages: Computer Science (R0)
