Abstract
Because Web resources are complex, it is difficult to acquire knowledge directly from a search engine. In this paper we propose a method for constructing a semantic relation corpus for traditional Chinese medicine by combining multiple encyclopedias. For a known concept pair, we automatically construct search requests based on the URL characteristics of the encyclopedia search engines, retrieve the search results, and use regular expressions to extract meaningful text from them to form the semantic relation corpus. The experimental results show a precision of 92.1% and a recall of 65.3%.
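The two steps named in the abstract, building search requests from URL patterns and extracting text with regular expressions, can be illustrated with a minimal sketch. It is not the authors' implementation: the encyclopedia hosts, the `word` query parameter, and the function names `build_search_urls` and `extract_relation_sentences` are hypothetical, and the extraction rule (keep sentences that mention both concepts) is an assumed simplification.

```python
import re
from urllib.parse import urlencode

# Hypothetical search endpoints; real encyclopedia URL patterns will differ.
SEARCH_TEMPLATES = {
    "encyclopedia_a": "https://encyclopedia-a.example/search?",
    "encyclopedia_b": "https://encyclopedia-b.example/search?",
}


def build_search_urls(concept_a, concept_b):
    """Build one search URL per encyclopedia for a known concept pair."""
    query = f"{concept_a} {concept_b}"
    # "word" is an assumed parameter name, not taken from the paper.
    return {
        name: base + urlencode({"word": query})
        for name, base in SEARCH_TEMPLATES.items()
    }


def extract_relation_sentences(html, concept_a, concept_b):
    """Return the sentences of a result page that mention both concepts."""
    # Remove HTML tags so that only the visible text remains.
    text = re.sub(r"<[^>]+>", "", html)
    # Split into sentences on Chinese and Western terminators.
    sentences = re.findall(r"[^。！？!?\n]+[。！？!?]?", text)
    pattern_a = re.compile(re.escape(concept_a))
    pattern_b = re.compile(re.escape(concept_b))
    return [
        s.strip()
        for s in sentences
        if pattern_a.search(s) and pattern_b.search(s)
    ]


if __name__ == "__main__":
    page = "<p>人参可以补气。黄芪也常用。人参与黄芪配伍可增强补气作用。</p>"
    print(build_search_urls("人参", "黄芪"))
    print(extract_relation_sentences(page, "人参", "黄芪"))
```

In this sketch, only sentences in which the two concepts co-occur are kept as candidate corpus entries. A rule this strict would tend to favor precision over recall, although the paper's actual extraction patterns may be quite different.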
Acknowledgments
This work was supported by the National Key R&D Program of China (2016YFF0202806), the National Natural Science Foundation of China (81403281), and projects of the China National Institute of Standardization (712016Y-4941, 522016Y-4681).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chen, J. et al. (2017). A Construction Method for the Semantic Relation Corpus of Traditional Chinese Medicine. In: Huang, TC., Lau, R., Huang, YM., Spaniol, M., Yuen, CH. (eds) Emerging Technologies for Education. SETE 2017. Lecture Notes in Computer Science, vol 10676. Springer, Cham. https://doi.org/10.1007/978-3-319-71084-6_65
DOI: https://doi.org/10.1007/978-3-319-71084-6_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71083-9
Online ISBN: 978-3-319-71084-6
eBook Packages: Computer Science, Computer Science (R0)