Automated Lexicon and Feature Construction Using Word Embedding and Clustering for Classification of ASD Diagnoses Using EHR

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10260)

Abstract

Using electronic health records (EHR) of children evaluated for Autism Spectrum Disorders (ASD), we are developing a decision support system for automated diagnostic criteria extraction and case classification. We manually created 92 lexicons, tested them as features for classification, and compared them with features created automatically using word embedding. The expert annotations used for manual lexicon creation provided seed terms that were expanded with the 15 most similar terms (Word2Vec). The resulting 2,200 terms were grouped into 92 clusters, paralleling the manually created lexicons. We compared both feature sets for classifying case status with an FF/BP (feedforward backpropagation) neural network (NN) and a C5.0 decision tree. For the manually created lexicons, classification accuracy was 76.92% for the NN and 84.60% for C5.0. For the automatically created lexicons, accuracy was 79.78% for the NN and 86.81% for C5.0. Automated lexicon creation required much less development time and yielded similarly high-quality outcomes.
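The automated pipeline summarized in the abstract (seed terms, nearest-neighbor expansion, clustering, cluster-based features) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy 2-D vectors stand in for trained Word2Vec embeddings, k-means is assumed as the clustering method (the abstract does not name one), and the per-cluster term counts used as features are likewise an assumption. All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Toy 2-D vectors standing in for trained Word2Vec embeddings (hypothetical values).
EMBEDDINGS = {
    "rocking": (0.9, 0.1),
    "flapping": (0.85, 0.2),
    "spinning": (0.8, 0.15),
    "eye": (0.1, 0.9),
    "contact": (0.15, 0.95),
    "gaze": (0.2, 0.85),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def expand(seed, k):
    """Return the k terms most similar to the seed (the abstract reports k = 15)."""
    others = [w for w in EMBEDDINGS if w != seed]
    others.sort(key=lambda w: cosine(EMBEDDINGS[seed], EMBEDDINGS[w]), reverse=True)
    return others[:k]

def kmeans(terms, centroids, iterations=10):
    """Plain k-means over term vectors (assumed method); returns {term: cluster index}."""
    centroids = list(centroids)  # work on a copy so the caller's list is untouched
    assignment = {}
    for _ in range(iterations):
        # Assign each term to its nearest centroid.
        for t in terms:
            dists = [math.dist(EMBEDDINGS[t], c) for c in centroids]
            assignment[t] = dists.index(min(dists))
        # Move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [EMBEDDINGS[t] for t in terms if assignment[t] == i]
            if members:
                centroids[i] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment

def features(text, assignment, n_clusters):
    """Count how many tokens of the text fall into each cluster (assumed feature design)."""
    counts = Counter(assignment[w] for w in text.lower().split() if w in assignment)
    return [counts.get(i, 0) for i in range(n_clusters)]

# Usage: two seed terms, each expanded with its 2 nearest neighbors, then 2 clusters.
seeds = ["rocking", "eye"]
terms = sorted(set(seeds) | {t for s in seeds for t in expand(s, k=2)})
assignment = kmeans(terms, [EMBEDDINGS[s] for s in seeds])
vec = features("Child avoids eye contact and shows rocking", assignment, 2)
```

In this toy setting each cluster plays the role of one automatically built lexicon, and `vec` is the feature vector that a classifier such as a neural network or decision tree would consume. In the paper's setting the same shape would hold with 92 clusters over roughly 2,200 terms.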


References

  1. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space. CoRR, vol. abs/1301.3781 (2013)

  2. Zheng, T., Xie, W., Xu, L., He, X., Zhang, Y., You, M., Yang, G., Chen, Y.: A machine learning-based framework to identify type 2 diabetes through electronic health records. Int. J. Med. Inform. 97, 120–127 (2017)

  3. Wang, J., Zhang, J., An, Y., Lin, H., Yang, Z., Zhang, Y., Sun, Y.: Biomedical event trigger detection by dependency-based word embedding. BMC Med. Genomics 9, 123–133 (2016)

  4. Fergadiotis, G., Gorman, K., Bedrick, S.: Algorithmic classification of five characteristic types of paraphasias. Am. J. Speech Lang. Pathol. 25, S776–S787 (2016)

  5. Minarro-Gimenez, J.A., Marín-Alonso, O., Samwald, M.: Exploring the application of deep learning techniques on medical text corpora. Stud. Health Technol. Inform. 205, 584–588 (2014)

Acknowledgements

The EHR were collected by the Centers for Disease Control and Prevention Autism and Developmental Disabilities Monitoring (ADDM) Network. Pettygrove and Kurzius-Spencer received support from CDC Cooperative Agreement Number 5UR3/DD000680. The conclusions presented here are those of the authors and do not represent the official position of the Centers for Disease Control and Prevention.

Author information

Corresponding author

Correspondence to Gondy Leroy.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Leroy, G., Gu, Y., Pettygrove, S., Kurzius-Spencer, M. (2017). Automated Lexicon and Feature Construction Using Word Embedding and Clustering for Classification of ASD Diagnoses Using EHR. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_4

  • DOI: https://doi.org/10.1007/978-3-319-59569-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59568-9

  • Online ISBN: 978-3-319-59569-6

  • eBook Packages: Computer Science, Computer Science (R0)
