Unsupervised Segmentation of Speech Signals Using Kernel-Gram Matrices

Bhati, Saurabhchand; Nayak, Shekhar; Sri Rama Murty, K.

doi:10.1007/978-981-13-0020-2_13

Saurabhchand Bhati, Shekhar Nayak & K. Sri Rama Murty

Part of the book series: Communications in Computer and Information Science (CCIS, volume 841)

Included in the following conference series:

National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics

Abstract

The objective of this paper is to develop an unsupervised method for segmenting speech signals into phoneme-like units. The proposed algorithm is based on the observation that feature vectors from the same segment exhibit a higher degree of similarity than feature vectors from different segments. The kernel-Gram matrix of an utterance is formed by computing the similarity between every pair of feature vectors in the Gaussian kernel space. This matrix consists of square patches along the principal diagonal, each corresponding to a phoneme-like segment in the speech signal. The proposed method detects the number of segments, as well as their boundaries, automatically. It does not assume any prior information about the input utterances, such as the exact distribution of segment lengths or the correct number of segments in an utterance. The proposed method outperforms state-of-the-art blind segmentation algorithms on the Zero Resource 2015 databases and the TIMIT database.
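The abstract does not state how boundaries are read off the kernel-Gram matrix, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It builds the Gaussian kernel-Gram matrix K[i][j] = exp(-||x_i - x_j||² / (2σ²)) from per-frame feature vectors and then, as a swapped-in boundary detector, slides a checkerboard-style novelty score (in the spirit of Foote's audio-novelty method) along the principal diagonal: a boundary is likely where two adjacent square patches are each internally similar but dissimilar to each other. The function names `gaussian_kernel_gram`, `novelty_curve`, `pick_boundaries` and `segment`, and the parameters `sigma`, `half_width` and `threshold`, are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def gaussian_kernel_gram(frames: Sequence[Sequence[float]], sigma: float) -> Matrix:
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(frames)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # K is symmetric, so fill both halves at once
            sq_dist = sum((a - b) ** 2 for a, b in zip(frames[i], frames[j]))
            gram[i][j] = gram[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return gram


def novelty_curve(gram: Matrix, half_width: int) -> List[float]:
    """Score each frame t as a candidate boundary (a segment starting at t).

    Compares similarity inside the two diagonal patches (frames t-w..t-1
    and t..t+w-1) with similarity across them. The score is near 1 when
    both patches are internally similar but dissimilar to each other, and
    near 0 inside a homogeneous patch. (Hypothetical detector, see lead-in.)
    """
    n, w = len(gram), half_width
    scores = [0.0] * n
    for t in range(w, n - w + 1):
        past, future = range(t - w, t), range(t, t + w)
        within = sum(gram[i][j] for i in past for j in past)
        within += sum(gram[i][j] for i in future for j in future)
        across = 2.0 * sum(gram[i][j] for i in past for j in future)
        scores[t] = (within - across) / (2.0 * w * w)
    return scores


def pick_boundaries(scores: Sequence[float], threshold: float) -> List[int]:
    """Keep local maxima of the novelty curve that reach the threshold."""
    return [
        t
        for t in range(1, len(scores) - 1)
        if scores[t] >= threshold and scores[t] >= scores[t - 1] and scores[t] > scores[t + 1]
    ]


def segment(
    frames: Sequence[Sequence[float]],
    sigma: float = 1.0,
    half_width: int = 2,
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges; segment count = peaks + 1."""
    gram = gaussian_kernel_gram(frames, sigma)
    bounds = pick_boundaries(novelty_curve(gram, half_width), threshold)
    edges = [0] + bounds + [len(frames)]
    return [(edges[k], edges[k + 1]) for k in range(len(edges) - 1)]


if __name__ == "__main__":
    # Toy "utterance": three stationary stretches of five 2-D feature frames.
    toy = [[0.0, 0.0]] * 5 + [[3.0, 3.0]] * 5 + [[0.0, 3.0]] * 5
    print(segment(toy))  # [(0, 5), (5, 10), (10, 15)]
```

On the toy input the three diagonal patches are recovered as (0, 5), (5, 10) and (10, 15). The number of segments follows from the number of novelty peaks, but the fixed kernel width, window and threshold are tuning assumptions of this sketch; according to the abstract, the authors' method determines both the segment count and the boundaries automatically.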

Author information

Authors and Affiliations

Department of Electrical Engineering, IIT Hyderabad, Hyderabad, India
Saurabhchand Bhati, Shekhar Nayak & K. Sri Rama Murty

Corresponding author

Correspondence to Saurabhchand Bhati.

Editor information

Editors and Affiliations

Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Renu Rameshan
Indraprastha Institute of Information Technology, New Delhi, India
Chetan Arora
Indian Institute of Technology, New Delhi, India
Sumantra Dutta Roy

About this paper

Cite this paper

Bhati, S., Nayak, S., Sri Rama Murty, K. (2018). Unsupervised Segmentation of Speech Signals Using Kernel-Gram Matrices. In: Rameshan, R., Arora, C., Dutta Roy, S. (eds) Computer Vision, Pattern Recognition, Image Processing, and Graphics. NCVPRIPG 2017. Communications in Computer and Information Science, vol 841. Springer, Singapore. https://doi.org/10.1007/978-981-13-0020-2_13

DOI: https://doi.org/10.1007/978-981-13-0020-2_13
Published: 26 April 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0019-6
Online ISBN: 978-981-13-0020-2
eBook Packages: Computer Science, Computer Science (R0)