
Automatic Bidirectional Conversion of Audio and Text: A Review from Past Research

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2022)

Abstract

Speech is the most natural and direct way for people to communicate, both with each other and with machines. People without disabilities can converse in spoken natural language, whereas people who are deaf or unable to speak must rely on text messaging and sign language, and sign language works only when the other person is nearby. Speech recognition is a branch of computer science that enables a computer to recognize spoken language and convert it into text, giving machines the ability to identify and respond to spoken commands. Information can also be recorded as audio and sent to a recipient. Spoken or played-back audio consists of signals, and these signals serve as the medium of communication between humans and machines. Current systems are limited to speech-to-text conversion. The proposed system goes further by converting in both directions: audio to text and text to speech. It also translates between languages, which helps people who cannot read.

Corresponding author

Correspondence to Pooja Panapana.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Panapana, P., Pothala, E.R., Nagireddy, S.S.L., Mattaparthi, H.P., Meesala, N. (2023). Automatic Bidirectional Conversion of Audio and Text: A Review from Past Research. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-031-35501-1_30
