Abstract
Using electronic health records of children evaluated for Autism Spectrum Disorders, we are developing a decision support system for automated diagnostic criteria extraction and case classification. We manually created 92 lexicons, tested them as features for classification, and compared them with features created automatically using word embedding. The expert annotations used for manual lexicon creation provided seed terms that were expanded with the 15 most similar terms found with Word2Vec. The resulting 2,200 terms were grouped into 92 clusters, paralleling the manually created lexicons. We compared both sets of features for classifying case status with a feedforward backpropagation (FF/BP) neural network (NN) and a C5.0 decision tree. For the manually created lexicons, classification accuracy was 76.92% for the NN and 84.60% for C5.0. For the automatically created lexicons, accuracy was 79.78% for the NN and 86.81% for C5.0. Automated lexicon creation required much less development time and yielded comparably high classification accuracy.
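The seed expansion and lexicon-based feature construction summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors stand in for trained Word2Vec embeddings, the function names (`most_similar`, `expand_seeds`, `lexicon_features`) and the example lexicon names are hypothetical, expansion is assumed to be applied per seed term by cosine similarity, and the clustering of expanded terms into 92 groups is not shown.

```python
import math
from collections import Counter

# Toy 2-D vectors standing in for trained Word2Vec embeddings (hypothetical values).
EMBEDDINGS = {
    "rocking": (0.9, 0.1),
    "flapping": (0.8, 0.2),
    "spinning": (0.85, 0.15),
    "eye": (0.1, 0.9),
    "gaze": (0.2, 0.8),
    "contact": (0.15, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(term, embeddings, top_n):
    """Return the top_n vocabulary terms closest to `term`, most similar first."""
    scores = [
        (other, cosine(embeddings[term], vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scores[:top_n]]


def expand_seeds(seeds, embeddings, top_n=15):
    """Expand each in-vocabulary seed term with its top_n nearest neighbours."""
    expanded = set()
    for seed in seeds:
        if seed in embeddings:
            expanded.add(seed)
            expanded.update(most_similar(seed, embeddings, top_n))
    return expanded


def lexicon_features(text, lexicons):
    """Count, per lexicon, how many tokens of `text` belong to that lexicon."""
    tokens = Counter(text.lower().split())
    return {name: sum(tokens[t] for t in terms) for name, terms in lexicons.items()}


# Usage: build two small lexicons from seeds, then featurize one sentence.
lexicons = {
    "motor": expand_seeds(["rocking"], EMBEDDINGS, top_n=2),
    "social": expand_seeds(["eye"], EMBEDDINGS, top_n=2),
}
features = lexicon_features("Child avoids eye contact and shows rocking", lexicons)
```

Each record is thereby reduced to one count per lexicon, which is the kind of fixed-length feature vector a neural network or decision tree classifier can consume. With real data, the toy dictionary would be replaced by vectors trained on the EHR text.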
Acknowledgements
The EHR were collected by the Centers for Disease Control and Prevention Autism and Developmental Disabilities Monitoring (ADDM) Network. Pettygrove and Kurzius-Spencer received support from CDC Cooperative Agreement Number 5UR3/DD000680. The conclusions presented here are those of the authors and do not represent the official position of the Centers for Disease Control and Prevention.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Leroy, G., Gu, Y., Pettygrove, S., Kurzius-Spencer, M. (2017). Automated Lexicon and Feature Construction Using Word Embedding and Clustering for Classification of ASD Diagnoses Using EHR. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_4
DOI: https://doi.org/10.1007/978-3-319-59569-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59568-9
Online ISBN: 978-3-319-59569-6
eBook Packages: Computer Science, Computer Science (R0)