
Incomplete-Data-Driven Speaker Segmentation for Diarization Application; A Help-Training Approach

Published in: Circuits, Systems, and Signal Processing

Abstract

This paper presents a new segmentation method for the diarization application. The method is built on a support vector regression (SVR)-based discriminative engine that carries the main burden of estimating the most probable change points. This engine is aided by a generative classifier in a help-training approach. Since no pre-labeled training samples are available in a segmentation task, the proposed model-based segmentation method offers a solution to overcome this obstacle. The introduced iterative method assumes that the initial frames of a given segment belong to the associated speaker. This hypothesis allows the SVR engine to be initialized in the first iteration. In the following iterations, the discriminative regression block, in conjunction with the generative classifier, tags the remaining frames with advantageous (positive) and disadvantageous (negative) labels. These newly labeled frames form the working set used to update the associated speaker model. In addition to the proposed segmentation method, a new strategy is introduced to estimate inserted and deleted change points. In the evaluation section, beyond the common experimental assessment, an attempt is made to provide a comprehensive insight into the statistical aspects of choosing training samples. Finally, a comparison of the proposed segmentation and diarization system with a similar method shows approximately a 22.95% improvement in performance.
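To make the iterative labeling idea concrete, the following is a minimal Python sketch of a help-training loop in the spirit of the abstract: the first frames of a segment are taken as positive seeds, a generative model of the speaker proposes confidently positive and negative frames, and an SVR retrained on the growing working set confirms later proposals. It is an illustration only, not the authors' formulation: the function name, the single-Gaussian speaker model, the percentile-based confidence thresholds, the bootstrapping of negative labels from the generative model alone in the first pass, and the synthetic data are all assumptions.

    # Illustrative help-training sketch (assumptions noted above; not the paper's exact method).
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVR


    def help_training_labels(frames, n_seed=50, n_iter=5, margin=0.5):
        """Tag the frames of one segment with +1 (speaker) / -1 (non-speaker) labels.

        frames : (n_frames, n_features) acoustic feature matrix for the segment
        n_seed : number of initial frames assumed to belong to the segment's speaker
        """
        labels = np.zeros(len(frames))        # 0 = still unlabeled
        labels[:n_seed] = +1                  # working hypothesis: initial frames are the speaker

        scores = np.full(len(frames), np.nan)
        for _ in range(n_iter):
            pos = labels == +1
            # Generative helper: a diagonal Gaussian model of the currently positive frames.
            gmm = GaussianMixture(n_components=1, covariance_type="diag").fit(frames[pos])
            loglik = gmm.score_samples(frames)
            lo, hi = np.percentile(loglik[pos], [10, 50])   # crude confidence bands (assumption)

            labeled = labels != 0
            if (labels == -1).any():
                # Discriminative engine: SVR regresses the +/-1 labels of the working set.
                svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
                svr.fit(frames[labeled], labels[labeled])
                scores = svr.predict(frames)

            # Help-training step: tag unlabeled frames only when the generative model is
            # confident and (once available) the SVR score agrees with it.
            for i in np.flatnonzero(labels == 0):
                if loglik[i] >= hi and (np.isnan(scores[i]) or scores[i] > margin):
                    labels[i] = +1            # advantageous (positive) frame
                elif loglik[i] < lo and (np.isnan(scores[i]) or scores[i] < -margin):
                    labels[i] = -1            # disadvantageous (negative) frame

        return labels, scores


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Synthetic segment: 200 frames of "speaker A" followed by 200 frames of "speaker B".
        seg = np.vstack([rng.normal(0.0, 1.0, (200, 12)),
                         rng.normal(2.5, 1.0, (200, 12))])
        labels, scores = help_training_labels(seg)
        print("first frame scored as non-speaker:", int(np.argmax(scores < 0)))

In this toy run the SVR score drops below zero near the synthetic speaker change, which is the kind of signal the paper's engine uses to locate change points; the real system additionally handles inserted and deleted change points, which this sketch does not attempt.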

Author information

Correspondence to Farbod Razzazi.

About this article

Cite this article

Teimoori, F., Razzazi, F. Incomplete-Data-Driven Speaker Segmentation for Diarization Application; A Help-Training Approach. Circuits Syst Signal Process 38, 2489–2522 (2019). https://doi.org/10.1007/s00034-018-0974-6
