
Constraining Word Embeddings by Prior Knowledge – Application to Medical Information Retrieval

  • Conference paper
  • In: Information Retrieval Technology (AIRS 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9994)


Abstract

Word embeddings have been used in many NLP tasks and have shown some ability to capture semantic features. They have also been used in several recent IR studies. However, word embeddings trained in an unsupervised manner may fail to capture some of the semantic relations in a specific domain (e.g., healthcare). In this paper, we leverage existing knowledge (word relations) in the medical domain to constrain word embeddings, following the principle that related words should have similar embeddings. The resulting constrained word embeddings are used to rerank documents and prove more effective than unsupervised word embeddings.
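The abstract does not say how the constraint "related words should have similar embeddings" is imposed. As an illustration only, the sketch below swaps in retrofitting (Faruqui et al., NAACL 2015), which pulls each word vector toward the vectors of its related words while anchoring it to its original vector, and then reranks documents by interpolating an initial retrieval score with the cosine similarity between query and document centroids. This is not the authors' method; the function names, the toy vectors, the relation set, and the interpolation weight `lam` are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Set

Vector = List[float]


def retrofit(
    embeddings: Dict[str, Vector],
    relations: Dict[str, Set[str]],
    iterations: int = 10,
    alpha: float = 1.0,
) -> Dict[str, Vector]:
    """Retrofitting-style update (after Faruqui et al., 2015), NOT the paper's method.

    Each related word is repeatedly moved toward its neighbours' current
    vectors while being anchored to its original (unsupervised) vector with
    weight `alpha`. Words without relations keep their original vectors.
    """
    original = {w: list(v) for w, v in embeddings.items()}
    updated = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, related in relations.items():
            # Only use neighbours that actually have a vector.
            nbrs = [n for n in related if n in updated and n != word]
            if word not in updated or not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # equal weight per neighbour, summing to 1
            updated[word] = [
                (alpha * original[word][k] + sum(beta * updated[n][k] for n in nbrs))
                / (alpha + beta * len(nbrs))
                for k in range(len(original[word]))
            ]
    return updated


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(words: List[str], embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of in-vocabulary words; None if there are none."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(len(vecs[0]))]


def rerank(
    query: List[str],
    docs: Dict[str, List[str]],
    initial_scores: Dict[str, float],
    embeddings: Dict[str, Vector],
    lam: float = 0.5,  # hypothetical interpolation weight
) -> List[str]:
    """Interpolate an initial retrieval score with query-document embedding similarity."""
    q = centroid(query, embeddings)
    scored = []
    for doc_id, words in docs.items():
        d = centroid(words, embeddings)
        sim = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((lam * initial_scores[doc_id] + (1.0 - lam) * sim, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

For instance, with toy vectors heart = [1, 0] and cardiac = [0, 1] and a single heart/cardiac relation, retrofitting raises their cosine similarity from 0 to roughly 0.8 after convergence, so a document mentioning only "cardiac" moves up for the query "heart". Retrofitting is applied after training; a joint training objective with a relation penalty is another way to impose the same kind of constraint.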

Acknowledgement

This work is partly supported by an NSERC Discovery research grant.

Author information

Correspondence to Jian-Yun Nie.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Liu, X., Nie, JY., Sordoni, A. (2016). Constraining Word Embeddings by Prior Knowledge – Application to Medical Information Retrieval. In: Ma, S., et al. Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_12

  • DOI: https://doi.org/10.1007/978-3-319-48051-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48050-3

  • Online ISBN: 978-3-319-48051-0

  • eBook Packages: Computer Science, Computer Science (R0)
