Low-latency speaker diarization based on Bayesian information criterion with multiple phoneme classes | IEEE Conference Publication | IEEE Xplore


Abstract:

Low-latency speaker diarization is desirable for online speaker adaptation in real-time speech recognition. In spontaneous conversations in particular, several speakers tend to speak alternately and continuously, with no silence between utterances. We therefore propose a speaker diarization method that detects speaker-change points and determines the speaker with a fixed low latency, on the basis of a Bayesian information criterion (BIC) computed from acoustic features classified into multiple phoneme classes. To improve diarization accuracy under the low-latency condition, the speaker decision is made continuously at each phoneme boundary. In an experiment on conversational broadcast news programs, our method reduced the speaker diarization error rate by a relative 20.0% compared with the conventional BIC using a single phoneme class. Online speaker adaptation applied in a speech-recognition experiment reduced the word error rate at speaker-change points by a relative 7.8%.
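For readers unfamiliar with BIC-based change detection, the following is a minimal sketch of the conventional single-class test that the abstract uses as its baseline. It is not the authors' multiple-phoneme-class method. It assumes each segment is modeled by one full-covariance Gaussian, and the names `delta_bic` and `penalty_weight` are hypothetical, chosen only for illustration. A positive value favors two separate models, which suggests a speaker change between the segments.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between modeling segments x and y with two Gaussians vs. one.

    x, y: arrays of shape (n_frames, n_dims) holding acoustic feature vectors.
    penalty_weight: tuning factor for the model-complexity penalty (assumed).
    """
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(frames):
        # Maximum-likelihood (biased) covariance estimate of the segment.
        cov = np.cov(frames, rowvar=False, bias=True)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Extra free parameters of the two-model hypothesis: one more mean
    # vector (d) and one more symmetric covariance matrix (d(d+1)/2).
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n_z)

    gain = 0.5 * (n_z * logdet_cov(z)
                  - n_x * logdet_cov(x)
                  - n_y * logdet_cov(y))
    return gain - penalty


# Usage with synthetic features (illustrative data, not from the paper).
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(200, 4))
speaker_b = rng.normal(3.0, 1.0, size=(200, 4))
print("change suspected:", delta_bic(speaker_a, speaker_b) > 0)
```

In a low-latency setting the segments on either side of a candidate boundary are short, so the covariance estimates become noisy. That is one plausible reason to pool statistics per phoneme class as the abstract describes, though the sketch above does not attempt it.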
Date of Conference: 25-30 March 2012
Date Added to IEEE Xplore: 30 August 2012
Conference Location: Kyoto, Japan
