VisPod: Content-Based Audio Visual Navigation

ABSTRACT
Current audio player interfaces typically display only brief metadata, such as title and duration, and support basic playback controls. These features are insufficient for tasks such as quickly relocating a previously visited position or browsing the main topics covered in the audio content. We present VisPod, a visual audio player that displays the main topics and keywords extracted from the transcript. VisPod supports (1) audio content browsing, (2) topic-based and keyword-based navigation, (3) real-time display of transcript and speaker information, and (4) content-based queries. VisPod encodes the audio as a donut chart composed of topic segments: text-processing algorithms segment the transcript into independent topics, and a deep learning model generates human-readable topic names. An informal study suggests that users prefer VisPod over traditional audio playback interfaces, particularly for its benefits in audio browsing and navigation.
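The topic segmentation the abstract describes can be built on lexical-cohesion methods in the style of TextTiling, which place topic boundaries where the vocabulary of adjacent sentence blocks diverges. The sketch below is only an illustration of that idea, not VisPod's actual pipeline; the function names, block size, and similarity threshold are our own assumptions.

```python
# Minimal lexical-cohesion segmenter (TextTiling-style sketch).
# Assumption: a topic boundary falls wherever two adjacent blocks
# of sentences share little vocabulary (low cosine similarity).
from collections import Counter
import math


def block_vector(sentences):
    """Bag-of-words term counts for a block of sentences."""
    counts = Counter()
    for s in sentences:
        counts.update(s.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def segment(sentences, block=2, threshold=0.1):
    """Return sentence indices where a new topic likely begins."""
    boundaries = []
    for i in range(block, len(sentences) - block + 1):
        left = block_vector(sentences[i - block:i])
        right = block_vector(sentences[i:i + block])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries
```

For example, four sentences that switch vocabulary halfway through yield a single boundary at the switch point; a real system would smooth the similarity curve and pick local minima rather than use a fixed threshold.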