AI-enabled Audio and Chat Collaboration Services | IEEE Conference Publication

Abstract:

In this paper we investigate an approach to improve audio services for use at the tactical edge where networks can be characterized as Disconnected, Intermittent and Limited (DIL). We look at using newer artificial intelligence speech recognition systems, namely Vosk and OpenAI’s Whisper, to bring transcription functionality to the services. Allowing services to convert voice audio to text will reduce the strain on the network, which is an important aspect to consider in DIL environments.

To demonstrate our approach to improve audio services, we introduce a speech-to-text (STT) application that implements both Vosk and Whisper as transcriber modules. The application builds on a technology stack with three parts that includes transcription, messaging and Voice over IP. In addition to having STT functionality, we also implement the reverse: a text-to-speech module that translates a text message back to audio for the recipient.

The paper discusses the design and architecture of the application, detailing how the technology stack is built using a set of technologies that benefit audio services that are used in DIL networks. The application needs to work at the tactical edge where resources are sparse, and we therefore evaluate the implemented transcribers with regard to resource use. Finally, we investigate the accuracy of both transcribers to assess the quality they deliver.
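
The idea of interchangeable transcriber modules, and of text being a much lighter payload than voice audio, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Transcriber` interface, the `FixedTranscriber` stand-in (which returns a canned string and loads no Vosk or Whisper model), the `payload_sizes` helper, and the sample message. The audio is uncompressed 16 kHz, 16-bit mono PCM; a real Voice over IP codec would compress it considerably, so the gap shown here is an upper bound rather than a measured result.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


class Transcriber(Protocol):
    """Hypothetical interface a Vosk- or Whisper-backed module could implement."""

    name: str

    def transcribe(self, pcm: bytes, sample_rate: int) -> str:
        ...


@dataclass
class FixedTranscriber:
    """Stand-in backend: returns a canned transcript, no real model is loaded."""

    name: str
    canned_text: str

    def transcribe(self, pcm: bytes, sample_rate: int) -> str:
        return self.canned_text


def payload_sizes(transcriber: Transcriber, pcm: bytes, sample_rate: int) -> Tuple[int, int]:
    """Return (raw audio bytes, UTF-8 transcript bytes) for one utterance."""
    text = transcriber.transcribe(pcm, sample_rate)
    return len(pcm), len(text.encode("utf-8"))


# Placeholder audio: 3 seconds of silence, 16 kHz, 16-bit (2 bytes) mono PCM.
sample_rate = 16_000
pcm = bytes(3 * sample_rate * 2)

# Any object with the same shape could be swapped in here.
backend = FixedTranscriber(name="stub", canned_text="moving to checkpoint two")
audio_bytes, text_bytes = payload_sizes(backend, pcm, sample_rate)
print(f"{backend.name}: audio={audio_bytes} B, text={text_bytes} B")
```

Because the calling code depends only on the `transcribe` method, a module wrapping either recognizer could replace the stub without changes to the messaging side.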
Date of Conference: 28 October 2024 - 01 November 2024
Date Added to IEEE Xplore: 06 December 2024
Conference Location: Washington, DC, USA
