An Approach of Automatic Extraction of Domain Keywords from the Kazakh Text

Alimzhanov, Yermek; Mansurova, Madina

doi:10.1007/978-3-319-45246-3_53

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9876)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

In this paper we consider an approach to the automatic extraction of domain keywords from Kazakh text, based on statistical methods of natural language processing. The proposed approach can be used to build domain dictionaries and thesauri without manual work by domain experts. Experiments on a corpus of texts drawn from a Kazakh book and from websites show that applying latent semantic analysis to keyword extraction significantly decreases information noise and strengthens the relations between words.
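The abstract names latent semantic analysis (LSA) but does not spell out the pipeline, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a small term-document count matrix, approximates the truncated SVD by power iteration with deflation (adequate for toy data only), and ranks terms by their smoothed weight in a chosen set of domain documents. The function names, the scoring rule, and the toy data are all hypothetical.

```python
import math
import random


def top_singular_triplets(matrix, k, iterations=300, seed=0):
    """Approximate the k largest singular triplets (sigma, u, v) of `matrix`
    by power iteration with deflation. Intended for toy-sized data only."""
    rows, cols = len(matrix), len(matrix[0])
    residual = [row[:] for row in matrix]
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    triplets = []
    for _ in range(min(k, rows, cols)):
        v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
        for _ in range(iterations):
            # One step of power iteration on residual^T * residual.
            u = [sum(row[j] * v[j] for j in range(cols)) for row in residual]
            w = [sum(residual[i][j] * u[i] for i in range(rows))
                 for j in range(cols)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        u = [sum(row[j] * v[j] for j in range(cols)) for row in residual]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0.0:
            break
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflate: remove the component just found from the residual.
        for i in range(rows):
            for j in range(cols):
                residual[i][j] -= sigma * u[i] * v[j]
    return triplets


def lsa_smooth(matrix, k):
    """Return the rank-k approximation of a term-document matrix."""
    rows, cols = len(matrix), len(matrix[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for sigma, u, v in top_singular_triplets(matrix, k):
        for i in range(rows):
            for j in range(cols):
                smoothed[i][j] += sigma * u[i] * v[j]
    return smoothed


def rank_terms(terms, matrix, k, domain_docs):
    """Rank terms by summed smoothed weight over the domain documents
    (an assumed scoring rule, chosen only for illustration)."""
    smoothed = lsa_smooth(matrix, k)
    scores = {term: sum(smoothed[i][j] for j in domain_docs)
              for i, term in enumerate(terms)}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: rows are lemmas, columns are documents.
terms = ["term_a", "term_b", "term_c", "term_d"]
counts = [
    [3.0, 2.0, 0.0],
    [2.0, 3.0, 0.0],
    [0.0, 0.0, 4.0],
    [1.0, 0.0, 1.0],
]
ranking = rank_terms(terms, counts, k=2, domain_docs=[0, 1])
print(ranking)
```

In practice the matrix would be built from lemmatized text and usually weighted (for example with TF-IDF) before decomposition, and a numerical library would replace the hand-written SVD. The raw counts here only keep the sketch short.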

Acknowledgements

For tokenization and lemmatization of texts in this work, we used a morphological analyzer kindly provided by our colleague Kairat Koibagarov of the Institute of Informational and Computational Technologies, Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan, to whom we express our gratitude. This research was carried out within the framework of project 5033/GF4, financed by the MES of RK.

Author information

Authors and Affiliations

Al-Farabi Kazakh National University, Al-Farabi av. 71, 050040, Almaty, Kazakhstan
Yermek Alimzhanov & Madina Mansurova

Corresponding author

Correspondence to Yermek Alimzhanov.

Editor information

Editors and Affiliations

Wroclaw University of Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Aristotle University of Thessaloniki, Thessaloniki, Greece
Lazaros Iliadis
Department of Forestry and Manageme, Democritus University of Thrace, Orestiada Thrace, Greece
Yannis Manolopoulos
Wrocław University of Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Alimzhanov, Y., Mansurova, M. (2016). An Approach of Automatic Extraction of Domain Keywords from the Kazakh Text. In: Nguyen, N., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9876. Springer, Cham. https://doi.org/10.1007/978-3-319-45246-3_53

DOI: https://doi.org/10.1007/978-3-319-45246-3_53
Published: 20 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45245-6
Online ISBN: 978-3-319-45246-3
eBook Packages: Computer Science, Computer Science (R0)
