ABSTRACT
Word clouds are extensively used to present a summary of the prominent words in a document on the World Wide Web. Such clouds give the user an idea about the content of the document. In this paper we present a mechanism to create and render an audio cloud for audio content. Such audio clouds are expected to provide a similar summary of the audio documents. They have wide applicability in various domains, especially for low-literate users who currently do not use the Internet but interact with audio-based systems.
Detecting words in audio content is challenging, especially when the audio is in a language for which no speech recognition system exists. We present a language-independent mechanism to detect frequently occurring words within an audio document. We then present four ways to render these words as an audio cloud. The four rendering prototypes are based on varying the amplitude, the voice quality, the echo, and the repetition of the audio words. An evaluation study with 32 users suggests that both literate and low-literate users easily understand the concept of an audio cloud.
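As a rough illustration of three of the rendering ideas named above (amplitude, repetition, and echo), the following minimal sketch maps word frequencies to playback parameters. The abstract does not specify how the prototypes compute these values, so the linear mappings, the function names, the parameter defaults, and the sample word counts are all assumptions made for illustration. Voice quality is not sketched.

```python
import math
from typing import Dict, List


def amplitude_gains(counts: Dict[str, int],
                    min_gain: float = 0.3,
                    max_gain: float = 1.0) -> Dict[str, float]:
    """Map each word's frequency to a linear playback gain (assumed mapping)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    gains = {}
    for word, count in counts.items():
        # If every word is equally frequent, play them all at full gain.
        frac = (count - lo) / span if span else 1.0
        gains[word] = min_gain + frac * (max_gain - min_gain)
    return gains


def repetitions(counts: Dict[str, int], max_repeats: int = 3) -> Dict[str, int]:
    """Repeat more frequent words more often, at least once each (assumed mapping)."""
    hi = max(counts.values())
    return {w: max(1, math.ceil(max_repeats * c / hi)) for w, c in counts.items()}


def apply_gain(samples: List[float], gain: float) -> List[float]:
    """Scale raw audio samples by a gain factor."""
    return [s * gain for s in samples]


def add_echo(samples: List[float], delay: int, decay: float) -> List[float]:
    """Mix a single delayed, attenuated copy of the signal into itself."""
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += s * decay
    return out


# Hypothetical counts of detected audio words in one recording.
word_counts = {"water": 8, "crop": 4, "rain": 2}
gains = amplitude_gains(word_counts)
repeats = repetitions(word_counts)
```

Here the most frequent word is played loudest and repeated most often, which mirrors how a visual word cloud gives prominent words a larger font.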
Index Terms
- Audio cloud: creation and rendering