Biomedical Domain-Oriented Word Embeddings via Small Background Texts for Biomedical Text Mining Tasks

Li, Lishuang; Wan, Jia; Huang, Degen

doi:10.1007/978-3-319-73618-1_46

Lishuang Li, Jia Wan & Degen Huang

Conference paper
First Online: 05 January 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Abstract

Most word embedding methods are general-purpose: they take the word as the basic unit and learn embeddings from each word's external contexts. In biomedical text mining, however, there are many biomedical entities and syntactic chunks that can enrich the semantic meaning of word embeddings. Furthermore, large-scale background texts for training word embeddings are not available in some scenarios. We therefore propose a novel biomedical domain-specific word embedding model based on maximum-margin (BEMM), which incorporates biomedical domain information and trains word embeddings on a small set of background texts. Experimental results show that our word embeddings overall outperform general-purpose word embeddings on some biomedical text mining tasks.
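
The abstract names a maximum-margin objective but does not spell it out. As a rough illustration only, the following is a minimal sketch of a generic max-margin (hinge) ranking loss of the kind commonly used for embedding training. It is not the BEMM objective from the paper: the dot-product scoring function, the function names, and the margin of 1.0 are all assumptions made for this example.

```python
# Generic max-margin (hinge) ranking loss for embeddings.
# Illustrative sketch only; NOT the BEMM objective from the paper.
# The dot-product score and margin value are assumptions.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge_ranking_loss(observed_vec, context_vec, corrupt_vec, margin=1.0):
    """Return max(0, margin - score(observed) + score(corrupt)).

    The loss is zero once the observed word scores at least `margin`
    higher against the context than the corrupted (randomly swapped) word.
    """
    pos_score = dot(observed_vec, context_vec)
    neg_score = dot(corrupt_vec, context_vec)
    return max(0.0, margin - pos_score + neg_score)


# Toy 2-dimensional vectors (hypothetical values).
context = [1.0, 0.0]
observed = [2.0, 0.5]
corrupt = [0.5, 1.0]

satisfied = hinge_ranking_loss(observed, context, corrupt)
violated = hinge_ranking_loss(corrupt, context, observed)
print(satisfied, violated)
```

In the first call the observed word already beats the corrupted one by more than the margin, so the loss is zero and no update would be needed; in the second call the roles are swapped and the loss is positive.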

Notes

1. http://2016.bionlp-st.org/tasks/bb2

Acknowledgment

The authors gratefully acknowledge the financial support provided by the National Natural Science Foundation of China under No. 61672126.

Author information

Authors and Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, China
Lishuang Li, Jia Wan & Degen Huang

Corresponding author

Correspondence to Lishuang Li.

Editor information

Editors and Affiliations

Fudan University, Shanghai, China
Xuanjing Huang
Singapore Management University, Singapore, Singapore
Jing Jiang
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Yansong Feng
Soochow University, Suzhou, China
Yu Hong

Cite this paper

Li, L., Wan, J., Huang, D. (2018). Biomedical Domain-Oriented Word Embeddings via Small Background Texts for Biomedical Text Mining Tasks. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_46

DOI: https://doi.org/10.1007/978-3-319-73618-1_46
Published: 05 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)
