Abstract:
Generating descriptions that are both diverse and accurate is an essential goal in the audio captioning task. Traditional methods mainly focus on improving the accuracy of the generated captions but ignore their diversity. In contrast, recent methods have considered generating diverse captions for a given audio clip, but at a potential cost in caption accuracy. In this work, we propose a new diverse audio captioning method based on a variational autoencoder structure, dubbed AC-VAE, aiming to achieve a better trade-off between the diversity and accuracy of the generated captions. To improve diversity, AC-VAE learns the latent word distribution at each position in the caption from contextual information. To uphold accuracy, AC-VAE incorporates an autoregressive prior module and a global constraint module, which enable precise modeling of the word distribution and encourage semantic consistency of captions at the sentence level. We evaluate the proposed AC-VAE on the Clotho dataset. Experimental results show that AC-VAE achieves a better trade-off between diversity and accuracy than state-of-the-art methods.
Published in: IEEE Signal Processing Letters (Volume: 31)
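For readers who want a concrete picture of the mechanisms named in the abstract, below is a minimal PyTorch sketch of a caption decoder with a per-position latent variable whose prior is autoregressive (conditioned on the decoding context rather than fixed at N(0, I)), plus a sentence-level consistency term standing in for the global constraint module. This is not the authors' implementation: every module name, dimensionality, loss weight, and the specific form of the global constraint (here, cosine similarity between the pooled caption state and the audio embedding) is an illustrative assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ACVAESketch(nn.Module):
    # Caption decoder with a per-position latent variable z_t. The prior over
    # z_t is autoregressive (computed from the decoding context h_t) rather
    # than a fixed N(0, I); a sentence-level term stands in for the global
    # constraint module. All names and sizes are illustrative assumptions.
    def __init__(self, vocab_size=5000, audio_dim=128, d_model=256, d_latent=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.rnn = nn.GRUCell(d_model + d_latent, d_model)
        self.posterior = nn.Linear(2 * d_model, 2 * d_latent)  # q(z_t | h_t, w_t)
        self.prior = nn.Linear(d_model, 2 * d_latent)          # p(z_t | h_t)
        self.out = nn.Linear(d_model + d_latent, vocab_size)
        self.sent_proj = nn.Linear(d_model, d_model)

    @staticmethod
    def _sample(stats):
        # Reparameterized sample from a diagonal Gaussian given (mu, logvar).
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), mu, logvar

    def forward(self, audio_feat, tokens):
        # audio_feat: (B, audio_dim) pooled clip embedding;
        # tokens: (B, T) teacher-forcing target caption.
        T = tokens.shape[1]
        h = torch.tanh(self.audio_proj(audio_feat))  # context initialized from audio
        recon, kl = 0.0, 0.0
        for t in range(T):
            p_mu, p_logvar = self.prior(h).chunk(2, dim=-1)   # autoregressive prior
            w = self.embed(tokens[:, t])
            z, q_mu, q_logvar = self._sample(
                self.posterior(torch.cat([h, w], dim=-1)))    # training-time posterior
            # KL(q || p) between two diagonal Gaussians, per position.
            kl = kl + 0.5 * (p_logvar - q_logvar - 1.0
                             + (q_logvar.exp() + (q_mu - p_mu) ** 2)
                             / p_logvar.exp()).sum(-1).mean()
            recon = recon + F.cross_entropy(
                self.out(torch.cat([h, z], dim=-1)), tokens[:, t])
            h = self.rnn(torch.cat([w, z], dim=-1), h)        # advance the context
        # Illustrative sentence-level constraint: pull the pooled caption state
        # toward the audio embedding to encourage semantic consistency.
        global_loss = 1.0 - F.cosine_similarity(
            self.sent_proj(h), torch.tanh(self.audio_proj(audio_feat))).mean()
        return recon / T + kl / T + 0.1 * global_loss

At inference time, z_t would instead be drawn from the autoregressive prior at each step; sampling different z_t sequences is what yields diverse captions, while the context-conditioned prior and the sentence-level term are the knobs that, in the spirit of the abstract, keep those samples accurate.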