Abstract
Extracting entity hyponymy from Chinese complex sentences is difficult. This paper proposes a novel hybrid approach that combines parsing with supervised and semi-supervised learning. First, a conditional random fields (CRF) model is employed to obtain candidate domain named entities. Pattern matching is then used to acquire candidate hyponymy. Next, predicate and symbol features, syntactic analysis, and semantic roles are introduced into the CRF feature template to identify the hyponymy entity pairs. Finally, both the parallel relationships of entities across sentences and the entity pairs in simple sentences are analyzed to obtain the hyponymy entity pairs in Chinese complex sentences. The experimental results show that the proposed method reduces the manual labeling work required for the CRF and outperforms the baseline methods overall.
Additional information
The article is published in the original.
About this article
Cite this article
Cheng, Y., Guo, J., Xian, Y. et al. A hybrid method for entity hyponymy acquisition in Chinese complex sentences. Aut. Control Comp. Sci. 50, 369–377 (2016). https://doi.org/10.3103/S0146411616050035
DOI: https://doi.org/10.3103/S0146411616050035