research-article

Build A Module for Improvement Real Time Speech enhancement using Long Short-term Memory Approach: Improvement Real Time Speech enhancement using Long Short-term Memory

Authors:
Van Vo, Software Engineering Department, FPT University, Vietnam (ORCID: 0000-0002-5254-2800)
Bach Son Le, Information Technology Specialized Department, FPT University, Vietnam (ORCID: 0009-0002-6713-4903)
Huy Phuc Vo, Information Technology Specialized Department, FPT University, Vietnam (ORCID: 0009-0001-9728-7030)
Huong Thi Cam Nguyen, Software Engineering Department, FPT University, Vietnam (ORCID: 0009-0009-1275-740X)
Phuong Huu Khanh Lam, Software Engineering Department, FPT University, Vietnam (ORCID: 0009-0002-4763-1777)

ICIIT '23: Proceedings of the 2023 8th International Conference on Intelligent Information Technology, February 2023, Pages 259–264. https://doi.org/10.1145/3591569.3591614

Published: 13 July 2023

ABSTRACT

A good customer experience is essential for every business today, and customer support as a service brings the right people and processes together. When designing a system for audio communication and transmission, the influence of noise must be carefully considered. Improving the quality of phone calls in a smart virtual call center is essential for more effective customer care. This paper proposes a module for real-time speech enhancement of phone calls using long short-term memory (LSTM), a recurrent artificial neural network architecture used in artificial intelligence and deep learning. LSTMs are designed to avoid the long-term dependency problem; remembering information for long periods is essentially their default behavior. The data set used for this approach contains both English and Vietnamese speech, and the results show improvements on evaluation metrics such as PESQ, SI-SDR, and STOI.
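
To make two ideas from the abstract concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' module: the paper's network architecture, weights, and training setup are not given on this page. `lstm_step` is a hypothetical single-unit LSTM cell with scalar weights, included only to show how the forget gate lets the cell state carry information across many time steps. `si_sdr` follows the commonly used scale-invariant signal-to-distortion ratio definition, 10 log10(||αs||² / ||αs − ŝ||²) with α = ⟨ŝ, s⟩ / ||s||². All function names, weight values, and signals below are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell (scalar input and state).

    w maps a gate name to (input weight, recurrent weight, bias).
    These are illustrative weights, not values from the paper.
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))   # how much of the old cell state to keep
    i = sigmoid(pre("input"))    # how much new information to write
    o = sigmoid(pre("output"))   # how much of the cell state to expose
    g = math.tanh(pre("cand"))   # candidate value to write
    c = f * c_prev + i * g       # cell state: the long-term memory path
    h = o * math.tanh(c)         # hidden state: the per-step output
    return h, c


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB between an estimate and a clean target."""
    energy = sum(t * t for t in target)
    if energy == 0.0:
        raise ValueError("target signal must have non-zero energy")
    # Optimal scaling of the target so that rescaling the estimate
    # does not change the score.
    alpha = sum(e * t for e, t in zip(estimate, target)) / energy
    scaled = [alpha * t for t in target]
    num = sum(s * s for s in scaled)
    den = sum((e - s) ** 2 for e, s in zip(estimate, scaled))
    if den == 0.0:
        return math.inf  # estimate is an exact scaled copy of the target
    return 10.0 * math.log10(num / den)


if __name__ == "__main__":
    # Forget gate held near 1 and input gate near 0: the cell state
    # written at the start is still almost intact after 50 steps.
    weights = {
        "forget": (0.0, 0.0, 10.0),
        "input": (0.0, 0.0, -10.0),
        "output": (0.0, 0.0, 10.0),
        "cand": (1.0, 0.0, 0.0),
    }
    h, c = 0.0, 1.0
    for _ in range(50):
        h, c = lstm_step(0.0, h, c, weights)
    print("cell state after 50 steps:", c)

    clean = [1.0, 0.0, -1.0, 0.0]
    enhanced = [1.0, 0.1, -1.0, -0.1]
    print("SI-SDR (dB):", si_sdr(enhanced, clean))  # about 20 dB
```

In this toy setting the residual error has 1/100 of the energy of the scaled target, which gives roughly 20 dB, and doubling the amplitude of `enhanced` leaves the score unchanged. A practical enhancement module would use a vector-valued LSTM from a deep learning framework and would report PESQ and STOI with dedicated implementations.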

Published in

ICIIT '23: Proceedings of the 2023 8th International Conference on Intelligent Information Technology
February 2023
310 pages
ISBN: 9781450399616
DOI: 10.1145/3591569

Copyright © 2023 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited