An Approach of Automatic Extraction of Domain Keywords from the Kazakh Text

  • Conference paper
Computational Collective Intelligence (ICCCI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9876)

Abstract

In this paper we consider an approach to the automatic extraction of domain keywords from Kazakh text, based on statistical methods of natural language processing. The proposed approach can be used to build domain dictionaries and thesauri without the manual work of domain experts. Results of experiments on a corpus of texts from a Kazakh book and online websites demonstrate that applying latent semantic analysis to keyword extraction significantly reduces information noise and strengthens the relations between words.
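The abstract names latent semantic analysis (LSA) as the step that reduces noise and strengthens word relations. The Python sketch below illustrates a generic version of that idea; it is not the authors' implementation. It assumes the texts are already tokenized and lemmatized, applies log weighting to a term-document count matrix, keeps the top k singular components, scores each term by the length of its vector in that reduced space, and measures word relations by cosine similarity. The function names (`term_document_matrix`, `lsa_term_vectors`, `rank_keywords`, `term_similarity`), the log weighting, the norm-based score, and the toy Kazakh lemmas are all illustrative assumptions.

```python
import numpy as np

# Toy corpus of pre-lemmatized tokens (hypothetical; tokenization and
# lemmatization are assumed to have been done beforehand).
docs = [
    ["мектеп", "оқушы", "мұғалім", "сабақ"],  # school, pupil, teacher, lesson
    ["мектеп", "мұғалім", "сабақ"],
    ["дәрігер", "емхана", "ауру"],            # doctor, clinic, illness
    ["дәрігер", "ауру", "дәрі"],              # doctor, illness, medicine
]


def term_document_matrix(docs):
    """Build a terms x documents count matrix from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            counts[index[tok], j] += 1
    return vocab, counts


def lsa_term_vectors(counts, k):
    """Project terms into a rank-k latent space via truncated SVD.

    Row i of U_k * S_k is the latent vector of term i.
    """
    weighted = np.log1p(counts)  # assumed weighting: log(1 + count)
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    k = min(k, len(s))  # cannot keep more components than exist
    return u[:, :k] * s[:k]


def rank_keywords(vocab, term_vecs, top_n):
    """Rank terms by the length of their latent vector (assumed score)."""
    scores = np.linalg.norm(term_vecs, axis=1)
    order = np.argsort(-scores)
    return [(vocab[i], float(scores[i])) for i in order[:top_n]]


def term_similarity(term_vecs, i, j):
    """Cosine similarity of terms i and j in the latent space."""
    a, b = term_vecs[i], term_vecs[j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


vocab, counts = term_document_matrix(docs)
vecs = lsa_term_vectors(counts, k=2)
for word, score in rank_keywords(vocab, vecs, top_n=5):
    print(f"{word}\t{score:.3f}")
```

With k = 2 each of the two toy topics (school and medicine) keeps one latent dimension. Terms that always co-occur, such as мұғалім and сабақ, end up with identical vectors, terms from different topics become orthogonal, and a term seen in only one document (оқушы) scores lower. Discarding the components with small singular values is what plays the noise-reduction role in this sketch.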

Acknowledgements

For the tokenization and lemmatization of texts in this work, we used a morphological analyzer provided courtesy of our colleague Kairat Koibagarov of the Institute of Informational and Computational Technologies of the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan, to whom we express our gratitude. This research was also carried out within the framework of project 5033/GF4, funded by the MES of RK.

Author information

Corresponding author

Correspondence to Yermek Alimzhanov.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Alimzhanov, Y., Mansurova, M. (2016). An Approach of Automatic Extraction of Domain Keywords from the Kazakh Text. In: Nguyen, N., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9876. Springer, Cham. https://doi.org/10.1007/978-3-319-45246-3_53

  • DOI: https://doi.org/10.1007/978-3-319-45246-3_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45245-6

  • Online ISBN: 978-3-319-45246-3

  • eBook Packages: Computer Science, Computer Science (R0)
