
Design and Implementation of Speech Generation and Demonstration Research Based on Deep Learning

  • Conference paper
Data Science (ICPCSEE 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1879)

Abstract

Because complex and changing factors such as the speech topic and the setting make it difficult for a speaker to prepare a speech text on short notice, this paper proposes a speech generation and demonstration system based on deep learning. The system is built on the PyTorch deep learning framework and trained on the basis of GPT-2 and an open-source pretrained model to generate multiple candidate speeches from topics given by users; it then produces the final speech and the corresponding spoken demonstration audio through text revision, speech synthesis, and related techniques, helping users quickly obtain the target document and audio. Experiments show that the text generated by the model is fluent and practical, which shortens speakers' preparation time and improves the confidence of impromptu speakers. In addition, the paper explores the application prospects of text generation and provides a useful reference.
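The abstract describes generating several candidate speeches from a user-supplied topic with an open-source pretrained GPT-2 model on top of PyTorch. The snippet below is a minimal illustrative sketch (not the authors' code) of such topic-conditioned generation using the Hugging Face Transformers library; the checkpoint name, prompt format, and sampling parameters are assumptions chosen for illustration.

```python
# Minimal sketch: generate multiple speech drafts for a topic with a
# pretrained GPT-2 model (assumed checkpoint and prompt template).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_NAME = "gpt2"  # assumed open-source checkpoint; the paper does not name one

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
model.eval()

def generate_speech_drafts(topic: str, num_drafts: int = 3, max_new_tokens: int = 200):
    """Return several candidate speech drafts for the given topic."""
    prompt = f"Topic: {topic}\nSpeech:"  # assumed prompt format
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,            # sampling yields distinct drafts
            top_p=0.9,
            temperature=0.8,
            max_new_tokens=max_new_tokens,
            num_return_sequences=num_drafts,
            pad_token_id=tokenizer.eos_token_id,
        )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

if __name__ == "__main__":
    for i, draft in enumerate(generate_speech_drafts("the value of lifelong learning"), 1):
        print(f"--- Draft {i} ---\n{draft}\n")
```

In the system described by the paper, drafts of this kind would subsequently pass through text revision and a speech-synthesis stage to produce the final speech and its demonstration audio.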



Author information

Corresponding author

Correspondence to Yanqing Wang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Luo, W., Wang, Y., Liu, Y., Xu, Y. (2023). Design and Implementation of Speech Generation and Demonstration Research Based on Deep Learning. In: Yu, Z., et al. Data Science. ICPCSEE 2023. Communications in Computer and Information Science, vol 1879. Springer, Singapore. https://doi.org/10.1007/978-981-99-5968-6_33


  • DOI: https://doi.org/10.1007/978-981-99-5968-6_33

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-5967-9

  • Online ISBN: 978-981-99-5968-6

  • eBook Packages: Computer Science, Computer Science (R0)
