Biomedical Domain-Oriented Word Embeddings via Small Background Texts for Biomedical Text Mining Tasks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Abstract

Most word embedding methods are general-purpose: they treat a word as the basic unit and learn embeddings from words' external contexts. In biomedical text mining, however, there are many biomedical entities and syntactic chunks that can enrich the semantic meaning of word embeddings. Furthermore, large-scale background texts for training word embeddings are not available in some scenarios. We therefore propose a novel biomedical domain-specific word embedding model based on a maximum-margin objective (BEMM) that trains word embeddings on a small set of background texts while incorporating biomedical domain information. Experimental results show that our word embeddings overall outperform general-purpose word embeddings on several biomedical text mining tasks.
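The paper's BEMM model itself is not reproduced on this page. As a rough illustration of the general idea of max-margin training for word embeddings, the sketch below scores an observed (context, center word) pair above a randomly corrupted pair by a fixed margin and updates the embeddings only when that margin is violated. The toy corpus, scoring function, dimensions, and hyperparameters are all illustrative assumptions, not the authors' actual model, which additionally incorporates biomedical entities and syntactic chunks.

# Illustrative sketch (NOT the paper's BEMM model): train toy word embeddings
# with a max-margin (hinge ranking) objective that scores an observed
# (context, center-word) pair above a randomly corrupted pair by a margin.
# All names, dimensions, and hyperparameters below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
corpus = [
    "the protein binds dna in the cell".split(),
    "the bacteria lives in warm soil".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
word2id = {w: i for i, w in enumerate(vocab)}

dim, window, margin, lr = 20, 2, 1.0, 0.05
E = 0.01 * rng.standard_normal((len(vocab), dim))  # word embedding matrix
w = 0.01 * rng.standard_normal(dim)                # scoring weights

def score(ctx_vec, center_vec):
    # Score of a (context, center) pair: weighted elementwise interaction.
    return float(w @ (ctx_vec * center_vec))

for epoch in range(100):
    for sent in corpus:
        ids = [word2id[t] for t in sent]
        for pos, center in enumerate(ids):
            left, right = max(0, pos - window), min(len(ids), pos + window + 1)
            ctx = [ids[j] for j in range(left, right) if j != pos]
            neg = int(rng.integers(len(vocab)))    # randomly corrupted center word
            if not ctx or neg == center:
                continue
            ctx_vec = E[ctx].mean(axis=0)
            pos_vec, neg_vec = E[center].copy(), E[neg].copy()
            loss = margin - score(ctx_vec, pos_vec) + score(ctx_vec, neg_vec)
            if loss > 0:                           # hinge: update only on margin violations
                E[center] += lr * (w * ctx_vec)
                E[neg] -= lr * (w * ctx_vec)
                grad_ctx = lr * (w * (pos_vec - neg_vec)) / len(ctx)
                for c in ctx:
                    E[c] += grad_ctx
                w += lr * ctx_vec * (pos_vec - neg_vec)

In the paper's setting, the corrupted examples and the scoring function would also draw on the biomedical domain information mentioned in the abstract; the snippet only captures the max-margin training style on a small corpus.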

Notes

  1. http://2016.bionlp-st.org/tasks/bb2.

Acknowledgment

The authors gratefully acknowledge the financial support provided by the National Natural Science Foundation of China under Grant No. 61672126.

Author information

Corresponding author

Correspondence to Lishuang Li.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Li, L., Wan, J., Huang, D. (2018). Biomedical Domain-Oriented Word Embeddings via Small Background Texts for Biomedical Text Mining Tasks. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol. 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_46

  • DOI: https://doi.org/10.1007/978-3-319-73618-1_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73617-4

  • Online ISBN: 978-3-319-73618-1

  • eBook Packages: Computer Science, Computer Science (R0)
