Abstract
Verbal Autopsy (VA) is the instrument used to collect Causes of Death (CoD) in places where health services are out of reach. It consists of a questionnaire addressed to the caregiver of the deceased and involves closed questions (CQ) about signs and symptoms prior to death. There is a global effort to reduce the questionnaire to the minimum information essential to ascertain a CoD. To this end we took two courses of action. On the one hand, we considered the relation between the responses and the CoD by means of entropy, in a supervised feature subset selection (FSS) approach. On the other hand, we inspected the questions themselves, which led to an unsupervised FSS approach based on semantic similarity (SFSS). To assess quantitatively the impact of reducing the questionnaire, we measured the effect of these FSS approaches on the CoD predictive capability of a classifier. Experimental results showed that the unsupervised SFSS approach was competitive at identifying similar questions. Nevertheless, as expected, supervised FSS based on the entropy of the responses performed better for CoD prediction. To sum up, the need to review the VA questionnaire is supported by quantitative evidence.
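The two selection strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that entropy and semantic similarity were used, so information gain is assumed here as the entropy-based criterion, and a bag-of-words cosine similarity stands in for whatever question representation the paper actually uses. The questions, responses and causes below are invented toy data, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(responses, causes):
    """Drop in CoD entropy once the response to one question is known.

    Assumed criterion: the abstract mentions entropy but not this exact formula.
    """
    total = len(causes)
    conditional = 0.0
    for value in set(responses):
        subset = [c for r, c in zip(responses, causes) if r == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(causes) - conditional


def bag_of_words(text, vocabulary):
    """Toy question vector: token counts over a fixed vocabulary."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Supervised side: rank questions by how much their responses tell us about the CoD.
causes = ["malaria", "malaria", "stroke", "stroke"]  # invented gold-standard labels
responses = {
    "fever": ["yes", "yes", "no", "no"],   # perfectly separates the two causes
    "cough": ["yes", "no", "yes", "no"],   # carries no information about the cause
}
ranked = sorted(responses, key=lambda q: information_gain(responses[q], causes), reverse=True)

# Unsupervised side: compare the question texts themselves, ignoring responses and CoD.
questions = [
    "did the deceased have a fever",
    "did the deceased have a high fever",
    "was there any swelling of the legs",
]
vocabulary = sorted({token for q in questions for token in q.lower().split()})
vectors = [bag_of_words(q, vocabulary) for q in questions]
sim_close = cosine_similarity(vectors[0], vectors[1])  # near-duplicate questions
sim_far = cosine_similarity(vectors[0], vectors[2])    # unrelated questions
```

In this toy setting the supervised ranking places `fever` ahead of `cough`, and the two fever questions score as more similar to each other than to the swelling question, which is the kind of redundancy a similarity-based reduction would flag. The supervised criterion needs labelled deaths, whereas the similarity check needs only the questionnaire text.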
Notes
1. Tariff, Insilico and InterVA are available through: https://www.who.int/healthinfo/statistics/verbalautopsystandards/en/.
Acknowledgements
This work was partially funded by the Spanish Ministry of Science and Innovation (DOTT-HEALTH/PAT-MED PID2019-106942RB-C31), European Commission (FEDER) and the Basque Government (IXA IT-1343-19).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cejudo, A., Trigueros, O., Pérez, A., Casillas, A., Cobos, D. (2021). Verbal Autopsy: First Steps Towards Questionnaire Reduction. In: Ekštein, K., Pártl, F., Konopík, M. (eds) Text, Speech, and Dialogue. TSD 2021. Lecture Notes in Computer Science, vol 12848. Springer, Cham. https://doi.org/10.1007/978-3-030-83527-9_10
DOI: https://doi.org/10.1007/978-3-030-83527-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83526-2
Online ISBN: 978-3-030-83527-9
eBook Packages: Computer Science, Computer Science (R0)