ABSTRACT
A good customer experience is essential for every business today, and customer support as a service brings the right people and processes together. When designing a system that transmits audio, the influence of noise must be carefully considered. Improving the quality of phone calls in a smart virtual call center is essential for more effective customer care. This paper proposes a module for real-time speech enhancement of phone calls using long short-term memory (LSTM), a recurrent neural network architecture used in artificial intelligence and deep learning. LSTMs are designed to address the long-term dependency problem; remembering information for long periods is their default behavior. The data set used for this approach contains both English and Vietnamese speech, and the results show improvement on evaluation metrics such as PESQ, SI-SDR, and STOI.
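Of the three metrics named above, SI-SDR (scale-invariant signal-to-distortion ratio) is simple enough to illustrate directly. The following is a minimal sketch of the standard definition, not the evaluation code used in the paper. The function name `si_sdr`, the `eps` guard, the mean-removal step (a common convention), and the synthetic test signals are all assumptions made for illustration.

```python
import math


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length signals (higher is better)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")

    # Remove the mean of each signal (assumed convention) so a DC offset
    # does not influence the score.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Project the estimate onto the reference: alpha = <est, ref> / <ref, ref>.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-reference part and a residual.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Hypothetical signals: a clean tone plus a weaker interfering tone.
N = 800
clean = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
interference = [0.1 * math.sin(2 * math.pi * 37 * n / N) for n in range(N)]
noisy = [c + i for c, i in zip(clean, interference)]
louder_noisy = [2.0 * x for x in noisy]  # same content, different gain

score = si_sdr(clean, noisy)
score_after_gain = si_sdr(clean, louder_noisy)
```

Because the reference is rescaled before the residual is measured, `score` and `score_after_gain` agree up to floating-point error. This gain independence is the reason the metric is called scale-invariant, and it is useful for telephone audio, where levels vary from call to call.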
Index Terms
- Build A Module for Improvement Real Time Speech enhancement using Long Short-term Memory Approach: Improvement Real Time Speech enhancement using Long Short-term Memory