Abstract
Most word embedding methods are general-purpose: they take the word as the basic unit and learn embeddings from each word's external contexts. However, biomedical text contains many biomedical entities and syntactic chunks that can enrich the semantic content of word embeddings. Furthermore, large-scale background texts for training word embeddings are not available in some scenarios. We therefore propose a novel biomedical domain-specific word embedding model based on maximum margin (BEMM), which incorporates biomedical domain information and trains word embeddings from a small set of background texts. Experimental results show that our word embeddings overall outperform general-purpose word embeddings on some biomedical text mining tasks.
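The abstract does not spell out the BEMM objective, so the following is only a minimal sketch of a generic maximum-margin (hinge) ranking loss of the kind commonly used to train embeddings. It is not the authors' formulation. The function names, the dot-product scoring, the margin of 1.0, and the toy vectors are all assumptions made for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors, used here as a toy score."""
    return sum(a * b for a, b in zip(u, v))


def hinge_ranking_loss(score_observed, score_corrupted, margin=1.0):
    """Generic max-margin ranking loss (an assumption, not the BEMM objective).

    The loss is zero once the observed pair outscores the corrupted pair
    by at least `margin`; otherwise it grows linearly with the shortfall.
    """
    return max(0.0, margin - score_observed + score_corrupted)


# Hypothetical 3-dimensional vectors, for illustration only.
target = [0.5, -0.2, 0.1]
observed_context = [0.4, -0.1, 0.3]    # context seen with the target word
corrupted_context = [-0.3, 0.6, 0.0]   # randomly substituted context

loss = hinge_ranking_loss(
    dot(target, observed_context),
    dot(target, corrupted_context),
)
print(loss)
```

Under this kind of objective, training lowers the loss by pushing observed word-context pairs to score higher than corrupted ones. Domain information such as entity or chunk labels could in principle enter through additional scored pairs, but how BEMM does this is described in the full paper, not here.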
Acknowledgment
The authors gratefully acknowledge the financial support provided by the National Natural Science Foundation of China under Grant No. 61672126.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Li, L., Wan, J., Huang, D. (2018). Biomedical Domain-Oriented Word Embeddings via Small Background Texts for Biomedical Text Mining Tasks. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_46
DOI: https://doi.org/10.1007/978-3-319-73618-1_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)