Abstract
In this paper we consider an approach to the automatic extraction of domain keywords from Kazakh text based on statistical methods of natural language processing. The proposed approach can be used to build domain dictionaries and thesauri without manual work by domain experts. Experiments on a corpus of texts from a Kazakh book and from websites demonstrate that applying latent semantic analysis to keyword extraction significantly decreases information noise and strengthens the relations between words.
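The abstract names latent semantic analysis (LSA) as the step that reduces noise, but this page does not describe the pipeline itself. The following is therefore only a minimal sketch of the general LSA idea, not the authors' implementation. It assumes a raw-count term-document matrix and obtains the rank-k approximation by power iteration with deflation, used here in place of a library SVD routine. All function names (`term_document_matrix`, `lsa_smooth`, `cosine`) and the English stand-in tokens are hypothetical and do not come from the paper or its corpus.

```python
import math


def term_document_matrix(docs):
    """Build a raw-count matrix: one row per term, one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    row = {tok: i for i, tok in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for tok in doc:
            matrix[row[tok]][j] += 1.0
    return vocab, matrix


def _top_singular_triplet(a, iters=300):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    n_rows, n_cols = len(a), len(a[0])
    v = [float(j + 1) for j in range(n_cols)]  # fixed, non-uniform start vector
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(a[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract from this matrix
            return 0.0, [0.0] * n_rows, [0.0] * n_cols
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma < 1e-12:
        return 0.0, [0.0] * n_rows, v
    return sigma, [x / sigma for x in u], v


def lsa_smooth(matrix, k):
    """Return (singular values, rank-k approximation) of the term-document matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    residual = [r[:] for r in matrix]
    approx = [[0.0] * n_cols for _ in range(n_rows)]
    sigmas = []
    for _ in range(k):
        sigma, u, v = _top_singular_triplet(residual)
        if sigma == 0.0:
            break
        sigmas.append(sigma)
        for i in range(n_rows):
            for j in range(n_cols):
                component = sigma * u[i] * v[j]
                approx[i][j] += component    # accumulate the kept components
                residual[i][j] -= component  # deflate before the next pass
    return sigmas, approx


def cosine(x, y):
    """Cosine similarity between two term rows (0.0 if either row is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


if __name__ == "__main__":
    # Toy documents with English stand-in tokens (illustrative only).
    docs = [["bank", "loan", "credit"], ["bank", "loan"], ["loan", "credit"],
            ["river", "water"], ["river", "water", "bank"]]
    vocab, counts = term_document_matrix(docs)
    _, smoothed = lsa_smooth(counts, k=2)
    i, j = vocab.index("bank"), vocab.index("credit")
    print("raw cosine:     ", cosine(counts[i], counts[j]))
    print("smoothed cosine:", cosine(smoothed[i], smoothed[j]))
```

Comparing term rows in the smoothed matrix rather than in the raw counts is what lets co-occurrence patterns shared across documents influence the similarity between two words. The choice of k and of any term weighting (for example TF-IDF) is left open here because the page does not state what the authors used.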
Acknowledgements
For tokenization and lemmatization of the texts we used a morphological analyzer kindly provided by our colleague Kairat Koibagarov of the Institute of Informational and Computational Technologies, Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan, to whom we express our gratitude. This research was carried out within the framework of project 5033/GF4, financed by the MES of RK.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Alimzhanov, Y., Mansurova, M. (2016). An Approach of Automatic Extraction of Domain Keywords from the Kazakh Text. In: Nguyen, N., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9876. Springer, Cham. https://doi.org/10.1007/978-3-319-45246-3_53
DOI: https://doi.org/10.1007/978-3-319-45246-3_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45245-6
Online ISBN: 978-3-319-45246-3
eBook Packages: Computer Science, Computer Science (R0)