Abstract
A single-level back-of-the-book index cannot fully capture the semantic relations between index terms. To address this, this paper proposes a method for extracting hierarchical relations between index terms that combines lexical-syntactic analysis with text structure features. The method first organizes index terms according to text structure features and constructs index term pairs with hierarchical relations step by step. It then calculates the semantic similarity of each pair from word vectors to eliminate misidentified pairs. Finally, the remaining pairs are optimized in a directed graph to remove redundant and conflicting relations, and the hierarchical index system is built. Compared with other results, the method improves the precision rate and F value by 11.44% and 5.65%, respectively.
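The last two stages described above (similarity filtering and graph optimization) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional vectors, the 0.8 threshold, and the rule that the earlier of two conflicting pairs wins are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_pairs(pairs, vectors, threshold):
    """Keep (parent, child) pairs whose word vectors are similar enough."""
    return [
        (p, c)
        for p, c in pairs
        if p in vectors and c in vectors
        and cosine(vectors[p], vectors[c]) >= threshold
    ]


def reachable(edges, src, dst, skip=None):
    """Depth-first search: is dst reachable from src, ignoring the edge `skip`?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(c for p, c in edges if p == node and (p, c) != skip)
    return False


def prune(pairs):
    """Drop conflicting edges (those closing a cycle), then redundant ones."""
    edges = []
    for p, c in pairs:
        # Assumption: on conflict, the pair seen first is kept.
        if p != c and (p, c) not in edges and not reachable(edges, c, p):
            edges.append((p, c))
    # An edge is redundant if a longer path already links parent to child.
    return [e for e in edges if not reachable(edges, e[0], e[1], skip=e)]


def build_hierarchy(pairs, vectors, threshold):
    """Similarity filter followed by graph pruning."""
    return prune(filter_pairs(pairs, vectors, threshold))


# Hypothetical toy data: candidate (parent, child) pairs and 2-d "word vectors".
vectors = {
    "tree": [1.0, 0.1],
    "binary tree": [0.9, 0.2],
    "heap": [0.8, 0.3],
    "index": [0.0, 1.0],
}
pairs = [
    ("tree", "binary tree"),
    ("binary tree", "heap"),
    ("tree", "heap"),    # redundant: implied by the two pairs above
    ("heap", "tree"),    # conflicting: would close a cycle
    ("tree", "index"),   # dissimilar vectors: filtered out
]
print(build_hierarchy(pairs, vectors, threshold=0.8))
```

On this toy input, the filter drops the dissimilar pair, the cycle check rejects the reversed pair, and the redundancy check removes the shortcut edge, leaving a two-level chain. In practice the vectors would come from a trained word-embedding model rather than being written by hand.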
Acknowledgements
This work was supported by the National Natural Science Foundation of China (project "The Intelligent Analysis and Optimization Method for Re-flowable Documents", grant 61672105) and the National Key R&D Program of China (2018YFB1004100).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, N., Tian, M., Lv, S. (2020). Extracting Hierarchical Relations Between the Back-of-the-Book Index Terms. In: Hong, JF., Zhang, Y., Liu, P. (eds) Chinese Lexical Semantics. CLSW 2019. Lecture Notes in Computer Science, vol 11831. Springer, Cham. https://doi.org/10.1007/978-3-030-38189-9_45
DOI: https://doi.org/10.1007/978-3-030-38189-9_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38188-2
Online ISBN: 978-3-030-38189-9
eBook Packages: Computer Science, Computer Science (R0)