ABSTRACT
The institutional memory of enterprises is increasingly comprised of digital multimedia content, such as online lecture videos and presentations, archived meetings or conference calls, and voicemail. A key technology for efficiently managing such content is keyword search into the spoken audio content using automatic speech recognition (ASR).
A key lesson from deploying ASR-based indexing in enterprises is that multimedia content is often not stored in a centralized hosting application, but in a "long tail" of small teams' intranet sites, often built by technology enthusiasts who like to tinker and make creative use of technology. This calls for an indexing platform rather than a standalone app, with audio indexing as one feature among others, easy to deploy with limited IT skills in a "do-it-yourself" manner, and integrated with the existing information-management infrastructure.
We will present approaches to three enterprise-characteristic challenges arising from these requirements: (1) probabilistic indexing of word lattices instead of speech-to-text transcripts, to address the limited recognition accuracy (often in the 50% range due to a lack of matching acoustic/domain corpora); (2) phonetic search and vocabulary adaptation for indexing person names, domain terminology, and code names missing from a standard recognizer; and (3) approximations to implement probabilistic lattice indexing on top of existing industry-strength full-text search engines, for maximal reuse and integration with existing tools and deployments to reduce cost, and to enable non-speech experts to manage and operate the indexing/search system and to build or mash up line-of-business applications around it.
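As a rough illustration of challenge (1), the sketch below indexes word hypotheses together with their posterior probabilities rather than a single best transcript. It is a toy example under stated assumptions, not the method of the work summarized here: the function names `build_index` and `search`, the flat list-of-hypotheses input format, the sample data, and the product-of-expected-counts ranking are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Build a posterior-weighted inverted index.

    `docs` maps a document id to a list of (word, posterior) hypotheses,
    a hypothetical flattened stand-in for a recognizer's word lattice.
    """
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, hypotheses in docs.items():
        for word, posterior in hypotheses:
            # Expected term count: sum of posteriors over all hypotheses of the word.
            index[word.lower()][doc_id] += posterior
    return index


def search(index, query):
    """Rank documents by the product of expected counts of all query words."""
    words = query.lower().split()
    if not words:
        return []
    scores = {}
    # Only documents containing the first query word can match all words.
    for doc_id in index.get(words[0], {}):
        score = 1.0
        for word in words:
            score *= index.get(word, {}).get(doc_id, 0.0)
        if score > 0.0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: in "meeting1" the recognizer's top hypothesis is the
# misrecognition "falken", but the alternative "falcon" keeps some probability.
docs = {
    "meeting1": [("project", 0.9), ("falken", 0.5), ("falcon", 0.4)],
    "lecture2": [("project", 0.6), ("falcon", 0.1)],
}
index = build_index(docs)
results = search(index, "project falcon")
```

With this sample data, "meeting1" ranks above "lecture2" for the query "project falcon", even though a one-best transcript of "meeting1" would contain "falken" and miss the query term entirely. That is the intuition behind indexing lattice alternatives when recognition accuracy is low.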
Index Terms
- Multimedia retrieval through indexing speech: an enterprise perspective