Abstract:
This paper investigates the impact of automatic sentence segmentation on speech summarization using the ICSI meeting corpus. We use a hidden Markov model (HMM) for sentence segmentation that integrates an N-gram language model with pause information, and an extractive summarization method based on maximum marginal relevance (MMR). The system-generated summaries are compared against multiple human summaries using ROUGE scores. The decision threshold of the segmentation system is varied to examine how different segmentations affect summarization. We find that (1) using system-generated utterance segments degrades summarization performance compared to using human-annotated sentences; (2) segmentation needs to be optimized for summarization rather than for the segmentation task itself, although the patterns differ slightly from those reported in prior work on other tasks such as parsing; and (3) summarization results are also affected by the choice of evaluation metric and by speech recognition errors.
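The interaction between the segmentation threshold and MMR selection can be illustrated with a small sketch. It assumes the segmenter has already produced a boundary posterior after each word (the HMM that combines the N-gram language model and pause cues is not implemented here), cuts the word stream wherever that posterior meets a threshold, and then runs a greedy MMR selection over the resulting segments under a word budget. The function names, the `lam` weight, the word-count budget, and the cosine similarity over raw term counts are all illustrative assumptions, not the paper's exact configuration.

```python
import math
from collections import Counter


def segment(words, boundary_probs, threshold):
    """Cut the word stream after any word whose (assumed, precomputed)
    boundary posterior is >= threshold."""
    segments, current = [], []
    for word, p in zip(words, boundary_probs):
        current.append(word)
        if p >= threshold:
            segments.append(current)
            current = []
    if current:  # keep any trailing words without a final boundary
        segments.append(current)
    return segments


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_summarize(segments, max_words, lam=0.7):
    """Greedy MMR: repeatedly take the segment that best balances relevance
    to the whole document against redundancy with already-selected segments,
    skipping any segment that would exceed the word budget."""
    vectors = [Counter(s) for s in segments]
    doc = Counter()
    for v in vectors:
        doc.update(v)

    selected, length = [], 0
    candidates = set(range(len(segments)))
    while candidates:
        def score(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * cosine(vectors[i], doc) - (1 - lam) * redundancy

        # Break score ties in favor of the earlier segment.
        best = max(candidates, key=lambda i: (score(i), -i))
        candidates.remove(best)
        if length + len(segments[best]) <= max_words:
            selected.append(best)
            length += len(segments[best])
    # Return the summary in original temporal order.
    return [segments[i] for i in sorted(selected)]


if __name__ == "__main__":
    words = "we should fix the budget the budget is due friday ok".split()
    probs = [0.1, 0.2, 0.1, 0.3, 0.9, 0.1, 0.2, 0.4, 0.1, 0.6, 0.8]
    for threshold in (0.5, 0.7):
        segs = segment(words, probs, threshold)
        print(threshold, segs, mmr_summarize(segs, max_words=6))
```

Raising the threshold yields fewer, longer segments, which changes the candidate units that MMR can choose from and how well they fit the length budget. This is one simple way to see why a segmentation tuned for its own boundary accuracy need not be the one that gives the best summaries.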
Date of Conference: 31 March 2008 - 04 April 2008
Date Added to IEEE Xplore: 12 May 2008