Article

Time is of the essence: an evaluation of temporal compression algorithms

Authors:
Simon Tucker (Sheffield University, Sheffield, UK)
Steve Whittaker (Sheffield University, Sheffield, UK)

CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 2006, Pages 329–338
https://doi.org/10.1145/1124772.1124822

Published: 22 April 2006

ABSTRACT

Although speech is a potentially rich information source, a major barrier to exploiting speech archives is the lack of useful tools for efficiently accessing lengthy speech recordings. This paper develops and evaluates techniques for temporal compression - reducing the time people take to listen to a recording while still extracting critical information. We first describe an exploratory study that identifies novel excision techniques that remove unimportant words or utterances from the recording. We then develop a new method for evaluating how well temporal compression supports users in forming a general understanding of a recording. Applying this method, we demonstrate that excision techniques are generally more effective than standard compression techniques that simply speed up the entire recording.
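
As an illustration only, the contrast between the two families of techniques named in the abstract can be sketched as follows. This is a minimal sketch, not the authors' algorithms: the `Word` type, the per-word `importance` scores, and the greedy selection rule are all hypothetical, introduced here just to show that speed-up keeps every word and shortens playback uniformly, while excision reaches the same time budget by dropping low-importance words.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    duration_ms: int   # length of the spoken word in the recording
    importance: float  # hypothetical score; higher means more important


def speed_up_duration(words, factor):
    """Uniform speed-up: every word is kept and played `factor` times faster."""
    return sum(w.duration_ms for w in words) / factor


def excise(words, budget_ms):
    """Excision (illustrative greedy rule): keep the most important words
    that fit within the time budget, then restore the original order."""
    ranked = sorted(range(len(words)),
                    key=lambda i: words[i].importance,
                    reverse=True)
    kept, used = set(), 0
    for i in ranked:
        if used + words[i].duration_ms <= budget_ms:
            kept.add(i)
            used += words[i].duration_ms
    return [words[i] for i in sorted(kept)]


# Made-up example utterance: 2000 ms of speech in total.
words = [
    Word("so", 200, 0.1),
    Word("the", 100, 0.1),
    Word("budget", 500, 0.9),
    Word("is", 200, 0.2),
    Word("um", 500, 0.0),
    Word("approved", 500, 0.8),
]

half_time = speed_up_duration(words, 2)       # listening time at 2x speed
excised = excise(words, budget_ms=1000)       # same budget via excision
excised_time = sum(w.duration_ms for w in excised)
```

Both approaches here halve the listening time; they differ in what the listener hears: the whole utterance played faster, or only the highest-scoring words at normal speed.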

Index Terms

Time is of the essence: an evaluation of temporal compression algorithms
1. Human-centered computing
  1. Human computer interaction (HCI)
2. Information systems
  1. Information systems applications
    1. Multimedia information systems

Published in
CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2006
1353 pages
ISBN: 1595933727
DOI: 10.1145/1124772
Editors: Rebecca Grinter (Georgia Institute of Technology, USA), Thomas Rodden (University of Nottingham, UK), Paul Aoki (Intel, USA), Ed Cutrell (Microsoft, USA), Robin Jeffries (Google, USA), Gary Olson (University of Michigan, USA)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
audio interfaces
evaluation methods
excision
meetings interfaces
speech manipulation
speech summary
speech-as-data
speed-up
summarization
temporal compression
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
