Abstract
Post-traumatic stress disorder (PTSD) is a traumatic-stressor-related disorder developed by exposure to a traumatic or adverse environmental event that caused serious harm or injury. Structured interview is the only widely accepted clinical practice for PTSD diagnosis but suffers from several limitations including the stigma associated with the disease. Diagnosis of PTSD patients by analyzing speech signals has been investigated as an alternative since recent years, where speech signals are processed to extract frequency features and these features are then fed into a classification model for PTSD diagnosis. In this paper, we developed a deep belief network (DBN) model combined with a transfer learning (TL) strategy for PTSD diagnosis. We computed three categories of speech features and utilized the DBN model to fuse these features. The TL strategy was utilized to transfer knowledge learned from a large speech recognition database, TIMIT, for PTSD detection where PTSD patient data are difficult to collect. We evaluated the proposed methods on two PTSD speech databases, each of which consists of audio recordings from 26 patients. We compared the proposed methods with other popular methods and showed that the state-of-the-art support vector machine (SVM) classifier only achieved an accuracy of 57.68%, and TL strategy boosted the performance of the DBN from 61.53 to 74.99%. Altogether, our method provides a pragmatic and promising tool for PTSD diagnosis. Preliminary results of this study were presented in Banerjee (in: 2017 IEEE international conference on data mining (ICDM), IEEE, 2017).








Similar content being viewed by others
References
Banerjee D, Islam K, Mei G, Xiao L, Zhang G, Xu R, Ji S, Li J (2017) A deep transfer learning approach for improved post-traumatic stress disorder diagnosis. In: 2017 IEEE international conference on data mining (ICDM), IEEE, pp 11–20
Bengio Y (2009) Learning deep architectures for AI. Found Trends® Mach Learn 2(1):1–127
Bijleveld H-A (2015) Post-traumatic stress disorder and stuttering: a diagnostic challenge in a case study. Proc Soc Behav Sci 193:37–43
Brown SM, Webb A, Mangoubi R, Dy JG (2015) A sparse combined regression-classification formulation for learning a physiological alternative to clinical post-traumatic stress disorder scores. In: AAAI, pp 1700–1706
Calvo RA, D’Mello S (2010) Affect detection: an interdisciplinary review of models, methods, and their applications. IEEE Trans Affect Comput 1(1):18–37
Deng L, Li J, Huang J-T, Yao K, Yu D, Seide F, Seltzer M, Zweig G, He X, Williams J, et al (2013) Recent advances in deep learning for speech research at Microsoft. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 8604–8608
Dieleman S, Schrauwen B (2014) End-to-end learning for music audio. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 6964–6968
Edwards AL (1948) Note on the correction for continuity in testing the significance of the difference between correlated proportions. Psychometrika 13(3):185–187
Farrús M, Hernando J, Ejarque P (2007) Jitter and shimmer measurements for speaker recognition. In: Eighth annual conference of the international speech communication association
Foa EB, Steketee G, Rothbaum BO (1989) Behavioral/cognitive conceptualizations of post-traumatic stress disorder. Behav Ther 20(2):155–176
Friedman MJ (2007) PTSD history and overview. United States Department of Veterans Affairs
Galatzer-Levy IR, Ma S, Statnikov A, Yehuda R, Shalev AY (2017) Utilization of machine learning for prediction of post-traumatic stress: a re-examination of cortisol in the prediction and pathways to non-remitting ptsd. Transl Psychiatr 7(3):e1070
Galatzer-Levy IR, Karstoft KI, Statnikov A, Shalev AY (2014) Quantitative forecasting of ptsd from early trauma responses: a machine learning application. J Psychiatr Res 59:68–76
Garofolo John S, Lamel Lori F, Fisher William M, Fiscus Jonathan G, Pallett David S, Dahlgren Nancy L, Victor Z (1993) TIMIT acoustic-phonetic continuous speech corpus, 1993. Linguistic Data Consortium, Philadelphia
Grinage BD (2003) Diagnosis and management of post-traumatic stress disorder. Am Fam Phys 68(12):2401–2408
Gulzar T, Singh A, Sharma S (2014) Comparative analysis of IPCC, MFCC and BFCC for the recognition of Hindi words using artificial neural networks. Int J Comput Appl 101(12):22–27
How common is ptsd (2018) https://www.ptsd.va.gov/public/ptsd-overview/basics/how-common-is-ptsd.asp. Accessed 20 June 2018
Hansen JHL, Kim W, Rahurkar M, Ruzanski E, Meyerhoff J (2011) Robust emotional stressed speech detection using weighted frequency subbands. EURASIP J Adv Signal Process 2011(1):906789
Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Hovens JE, Van der Ploeg HM, Klaarenbeek MTA, Bramsen I, Schreuder JN, Rivero VV (1994) The assessment of posttraumatic stress disorder: with the clinician administered ptsd scale: Dutch results. J Clin Psychol 50(3):325–340
Kamishima T, Hamasaki M, Akaho S (2009) Trbagg: a simple transfer learning method and its application to personalization in collaborative tagging. In: Ninth IEEE international conference on data mining, 2009, ICDM’09, IEEE, pp 219–228
Karen-Inge K, Galatzer-Levy Isaac R, Alexander S, Zhiguo L, Shalev Arieh Y (2015) Bridging a translational gap: using machine learning to improve the prediction of ptsd. BMC Psychiatr 15(1):30
Kessler RC, Rose S, Koenen KC, Karam EG, Stang PE, Stein DJ, Heeringa SG, Hill ED, Liberzon I, McLaughlin KA (2014) How well can post-traumatic stress disorder be predicted from pre-trauma risk factors? An exploratory study in the who world mental health surveys. World Psychiatr 13(3):265–274
Kim J-H, Woodland PC (2001) The use of prosody in a combined system for punctuation generation and speech recognition. In: Seventh European conference on speech communication and technology
Knoth B, Vergyri D, Shriberg E, Mitra V, Mclaren V, Kathol A, Richey C, Graciarena M (2018) Systems for speech-based assessment of a patient’s state-of-mind. US Patent WO2016028495 A1
Krothapalli SR, Koolagudi SG (2013) Characterization and recognition of emotions from speech using excitation source information. Int J Speech Technol 16(2):181–201
Kumaraswamy R, Odom P, Kersting K, Leake D, Natarajan S (2015) Transfer learning via relational type matching. In: 2015 IEEE international conference on data mining (ICDM), IEEE, pp 811–816
Kunze J, Kirsch L, Kurenkov I, Krug A, Johannsmeier J, Stober S (2017) Transfer learning for speech recognition on a budget. ArXiv preprint arXiv:1706.00290
Li X, Tao J, Johnson MT, Soltis J, Savage A, Leong KM, Newman JD (2007) Stress and emotion classification using jitter and shimmer features. In: IEEE international conference on acoustics, speech and signal processing, 2007, ICASSP 2007, vol 4. IEEE, pp IV–1081
Litman DJ, Hirschberg JB, Swerts M (2000) Predicting automatic speech recognition performance using prosodic cues. In: Proceedings of the 1st North American chapter of the association for computational linguistics conference. Association for Computational Linguistics, pp 218–225
Marinić I, Supek F, Kovačić Z, Rukavina L, Jendričko T, Kozarić-Kovačić D (2007) Posttraumatic stress disorder: diagnostic data analysis by data mining methodology. Croat Med J 48(2):185–197
Muda L, Begam M, Elamvazuthi I (2010) Voice recognition algorithms using mel frequency cepstral coefficient (mfcc) and dynamic time warping (dtw) techniques. ArXiv preprint arXiv:1003.4083
Omurca S, Ekinci E (2015) An alternative evaluation of post traumatic stress disorder with machine learning methods. In: 2015 International symposium on innovations in intelligent systems and applications (INISTA), IEEE, pp 1–7
Ooi KEBrian, Low LSA, Lech M, Allen N (2012) Early prediction of major depression in adolescents using glottal wave characteristics and Teager energy parameters. In: 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 4613–4616
Ptsd and dsm-5 (2016) http://www.ptsd.va.gov/professional/PTSD-overview/dsm_criteria_ptsd.asp. Accessed 10 July 2016
Ptsd and symptoms (2018) https://www.ptsd.va.gov/public/ptsd-overview/basics/symptoms_of_ptsd.asp. Accessed 20 June 2018
Pan SJ, Yang Q (2010) A survey on transfer learning. EEE Trans Knowl Data Eng 22(10):1345–1359
Pitman RK (1989) Post-traumatic stress disorder, hormones, and memory. Biol Psychiatr 26(3):221–223
Pratt LY (1993) Discriminability-based transfer between neural networks. In: Advances in neural information processing systems, pp 204–211
Ramaswamy S, Madaan V, Qadri F, Heaney CJ, North TC, Padala PR, Sattar SP, Petty F (2005) A primary care perspective of posttraumatic stress disorder for the department of veterans affairs. Prim Care Compan J Clin Psychiatr 7(4):180
Rozgic V, Vazquez-Reina A, Crystal M, Srivastava A, Tan V, Berka C (2014) Multi-modal prediction of ptsd and stress indicators. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 3636–3640
Scherer S, Lucas GM, Gratch J, Rizzo AS, Morency L-P (2016) Self-reported symptoms of depression and ptsd are associated with reduced vowel space in screening interviews. IEEE Trans Affect Comput 7(1):59–73
Scherer S, Stratou G, Gratch J, Morency L-P (2013) Investigating voice quality as a speaker-independent indicator of depression and ptsd. In: Interspeech, pp 847–851
Razavian AS, Azizpour H, Sullivan J, Carlsson S (2014) CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 806–813
Sparr LF, Bremner JD (2005) Post-traumatic stress disorder and memory prescient medicolegal testimony at the international war crimes tribunal? J Am Acad Psychiatr Law Online 33(1):71–78
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
van den Broek EL, van der Sluis F, Dijkstra T (2010) Telling the story and re-living the past: how speech analysis can reveal emotions in post-traumatic stress disorder (ptsd) patients. In: Sensing emotions, Springer, pp 153–180
Vergyri D, Knoth B, Shriberg E, Mitra V, McLaren M, Ferrer L, Garcia P, Marmar C (2015) Speech-based assessment of ptsd in a military population using diverse feature classes. In: Sixteenth annual conference of the international speech communication association
Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83
Young A (1997) The harmony of illusions: inventing post-traumatic stress disorder. Princeton University Press, Princeton
Zhang Q, Wu Q, Zhu H, He L, Huang H, Zhang J, Zhang W (2016) Multimodal MRI-based classification of trauma survivors with and without post-traumatic stress disorder. Front Neurosci 10:292
Zhang W, Li R, Zeng T, Sun Q, Kumar S, Ye J, Ji S (2016) Deep model based transfer and multi-task learning for biological image analysis. In: IEEE transactions on big data
Zhuang X, Rozgić V, Crystal M, Marx BP (2014) Improving speech-based ptsd detection via multi-view learning. In: Spoken language technology workshop (SLT), 2014 IEEE, pp 260–265
Acknowledgements
This research is partially supported by DOD under grant W81XWH-15-C-0099. The authors would like to thank UHCMC for providing the Ohio dataset. The support of NVIDIA Corporation for the donation of the TESLA K40 GPU used in this research is gratefully acknowledged.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendices
1.1 Youtube dataset
Following are Youtube links to the 26 subjects in “Youtube Data” utilized in our study (the last 8 links were not accessible as of November 17, 2018):
- (1)
- (2)
-
(3)
http://www.youtube.com/user/veteransPTSD#p/a/76FE97306FBC00C3/1/vPPiFrwCrSI
- (4)
- (5)
- (6)
- (7)
- (8)
- (9)
- (10)
- (11)
-
(12)
http://www.youtube.com/user/millerusaf#p/search/6/27dEAmlyDVw
- (13)
-
(14)
http://www.youtube.com/user/deathcoreairsoft#p/search/11/jCZpcvRZvQk
-
(15)
http://www.youtube.com/user/spartan765#p/search/2/82YT5kHmwo8
-
(16)
http://www.youtube.com/user/VulcanMarine#p/search/0/WIrXu4q4hCU
- (17)
- (18)
- (19)
- (20)
-
(21)
http://video.google.com/videoplay?docid=6545321893646982640#
-
(22)
http://video.google.com/videoplay?docid=6545321893646982640#docid365676523829472362
-
(23)
http://www.youtube.com/user/TheAirsoftReviewer1#p/search/0/aWxNfIZQj84
- (24)
- (25)
- (26)
Rights and permissions
About this article
Cite this article
Banerjee, D., Islam, K., Xue, K. et al. A deep transfer learning approach for improved post-traumatic stress disorder diagnosis. Knowl Inf Syst 60, 1693–1724 (2019). https://doi.org/10.1007/s10115-019-01337-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-019-01337-2