ABSTRACT
The volume of audio content created and shared online keeps growing, and at that scale it is difficult to make sense of the data and extract useful information. Audio analytics tools can help organizations interpret the large quantities of audio data available to them and gain insight into customer behavior, market trends, and business operations. This paper presents an audio analytics tool for the healthcare industry that assesses patients’ mental health.
Index Terms
- Audio Examination – Mental Health Diagnosis in Healthcare through Audio Analytics