ABSTRACT
Although speech is a potentially rich information source, a major barrier to exploiting speech archives is the lack of useful tools for efficiently accessing lengthy speech recordings. This paper develops and evaluates techniques for temporal compression, that is, reducing the time people take to listen to a recording while still extracting critical information. We first describe an exploratory study that identifies novel excision techniques, which remove unimportant words or utterances from the recording. We then develop a new method for evaluating how well temporal compression supports users in forming a general understanding of a recording. Applying this method, we demonstrate that excision techniques are generally more effective than standard compression techniques that simply speed up the entire recording.
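The two families of temporal compression contrasted in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the `Utterance` type, the `importance` scores, and the threshold are hypothetical stand-ins for whatever measure of importance a real system would use. The sketch only compares the resulting listening time when a whole recording is sped up uniformly against the time when low-importance utterances are excised.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    duration_s: float  # playback time at normal speed, in seconds
    importance: float  # hypothetical importance score in [0, 1]


def speedup_duration(utterances, rate):
    """Listening time when the entire recording is played at `rate` x normal speed."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return sum(u.duration_s for u in utterances) / rate


def excise(utterances, threshold):
    """Keep only utterances whose (assumed) importance reaches the threshold."""
    return [u for u in utterances if u.importance >= threshold]


def excision_duration(utterances, threshold):
    """Listening time at normal speed after unimportant utterances are removed."""
    return sum(u.duration_s for u in excise(utterances, threshold))


# Toy recording; texts, durations, and scores are invented for illustration.
recording = [
    Utterance("um, so, anyway", 2.0, 0.1),
    Utterance("the deadline moves to Friday", 3.0, 0.9),
    Utterance("yeah, okay", 1.0, 0.2),
    Utterance("Anna will send the revised budget", 4.0, 0.8),
]

print("uniform 2x speed-up:", speedup_duration(recording, 2.0), "s")
print("excision at 0.5:    ", excision_duration(recording, 0.5), "s")
print("kept:", [u.text for u in excise(recording, 0.5)])
```

Listening time alone says nothing about which approach better supports understanding; that is the question the evaluation method described in the abstract addresses.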
Time is of the essence: an evaluation of temporal compression algorithms