Automatic Bidirectional Conversion of Audio and Text: A Review from Past Research

Panapana, Pooja; Pothala, Eswara Rao; Nagireddy, Sai Sri Lakshman; Mattaparthi, Hemendra Praneeth; Meesala, Niranjani

doi:10.1007/978-3-031-35501-1_30

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 716)

Included in the following conference series:

International Conference on Intelligent Systems Design and Applications

Abstract

Speech is the most natural and basic means of communication for humans, and increasingly between humans and machines. People without disabilities can converse with each other in natural language, whereas people who are deaf or unable to speak must rely on text and sign language, and sign language works only when the other person is nearby. Speech recognition is a branch of computer science that enables a computer to recognize spoken language and translate it into text, giving machines the ability to identify and respond to spoken commands. Information can also be shared by recording audio and sending it to the recipient. Any audio that is spoken or played consists of signals, and these signals are what make communication between humans and machines possible. Current systems are limited to speech-to-text conversion. The proposed system aims to go further by converting audio to text as well as text to speech, which makes it more useful. It also translates between languages, which can help people who cannot read.

Author information

Authors and Affiliations

GMR Institute of Technology, GMR Nagar, Rajam, 532127, India
Pooja Panapana, Eswara Rao Pothala, Sai Sri Lakshman Nagireddy, Hemendra Praneeth Mattaparthi & Niranjani Meesala

Corresponding author

Correspondence to Pooja Panapana.

Editor information

Editors and Affiliations

Faculty of Computing and Data Science, FLAME University, Pune, Maharashtra, India
Ajith Abraham
Center for Smart Computing Continuum, Burgenland, Austria
Sabri Pllana
University of Bari, Bari, Italy
Gabriella Casalino
University of Jinan, Jinan, Shandong, China
Kun Ma
Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
Anu Bajaj

About this paper

Cite this paper

Panapana, P., Pothala, E.R., Nagireddy, S.S.L., Mattaparthi, H.P., Meesala, N. (2023). Automatic Bidirectional Conversion of Audio and Text: A Review from Past Research. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-031-35501-1_30

DOI: https://doi.org/10.1007/978-3-031-35501-1_30
Published: 03 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35500-4
Online ISBN: 978-3-031-35501-1
eBook Packages: Intelligent Technologies and Robotics (R0)
