VisPod: Content-Based Audio Visual Navigation

ABSTRACT
Current audio player interfaces typically display only brief metadata, such as title and duration, and support basic playback controls. These features are insufficient for tasks such as quickly relocating a previously visited position or browsing the main topics covered in the audio content. We present VisPod, a visual audio player that displays the main topics and keywords extracted from the transcript. VisPod supports (1) audio content browsing, (2) topic-based and keyword-based navigation, (3) real-time display of transcript and speaker information, and (4) content-based queries. VisPod encodes the audio as a donut chart composed of topic segments: text-processing algorithms segment the transcript into independent topics, and a deep learning model generates human-readable topic names. An informal study suggests that users prefer VisPod over traditional audio playback interfaces, particularly for its benefits in audio browsing and navigation.
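The topic segmentation the abstract describes can be built on lexical-cohesion methods in the style of TextTiling, which place topic boundaries where the vocabulary of adjacent sentence blocks diverges. The sketch below is only an illustration of that idea, not VisPod's actual pipeline; the function names, block size, and similarity threshold are our own assumptions.

```python
# Minimal lexical-cohesion segmenter (TextTiling-style sketch).
# Assumption: a topic boundary falls wherever two adjacent blocks
# of sentences share little vocabulary (low cosine similarity).
from collections import Counter
import math


def block_vector(sentences):
    """Bag-of-words term counts for a block of sentences."""
    counts = Counter()
    for s in sentences:
        counts.update(s.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def segment(sentences, block=2, threshold=0.1):
    """Return sentence indices where a new topic likely begins."""
    boundaries = []
    for i in range(block, len(sentences) - block + 1):
        left = block_vector(sentences[i - block:i])
        right = block_vector(sentences[i:i + block])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries
```

For example, four sentences that switch vocabulary halfway through yield a single boundary at the switch point; a real system would smooth the similarity curve and pick local minima rather than use a fixed threshold.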