
Approach to Extract Keywords and Keyphrases of Text Resources and Documents in the Kazakh Language

  • Conference paper
Computational Collective Intelligence (ICCCI 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12496)


Abstract

In this paper, the authors propose a hybrid approach for extracting keywords and keyphrases from text resources and documents in Kazakh. Applying the statistical tf-idf method directly is not an optimal way to extract keywords and phrases in Kazakh, because Kazakh is an agglutinative language: the same stem occurs with many different suffix chains, so frequency counts over surface forms are split across variants of one word. The authors developed a stemming algorithm that takes the grammatical features of Kazakh into account and used it during pre-processing. Extraction also takes into account the syntactic features of words and phrases, obtained with a morphological analyzer for Kazakh. The restrictions specified by the authors are applied during extraction as well, since not every word can be a keyword. When keywords or phrases are selected, their features are considered (for example, some numerals combined with a noun are selected). Extracting keywords and phrases specifically for Kazakh is a pressing task for classification, clustering, text summarization, and information retrieval. The results of the research indicate that the presented approach is the best solution for extracting keywords and phrases from texts in the Kazakh language.
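The role of stemming before tf-idf can be illustrated with a minimal sketch. It is not the authors' stemming algorithm or morphological analyzer: the suffix list, the `toy_stem` and `tfidf` functions, and the sample documents are all hypothetical and chosen only to show how inflected forms of one stem are merged before scoring.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny suffix list (plural and a few case endings).
# A real Kazakh stemmer needs far more rules; this is an assumption for the demo.
SUFFIXES = sorted(
    ["лар", "лер", "дар", "дер", "тар", "тер",
     "ды", "ді", "ты", "ті", "да", "де", "та", "те"],
    key=len, reverse=True,
)


def toy_stem(word, min_stem=3):
    """Repeatedly strip the longest known suffix while the stem stays long enough."""
    word = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


def tfidf(docs):
    """Return one {stem: tf-idf score} dict per whitespace-tokenized document."""
    stemmed = [[toy_stem(w) for w in doc.split()] for doc in docs]
    n_docs = len(stemmed)
    # Document frequency: in how many documents each stem occurs.
    df = Counter(stem for doc in stemmed for stem in set(doc))
    scores = []
    for doc in stemmed:
        tf = Counter(doc)
        scores.append(
            {s: (c / len(doc)) * math.log(n_docs / df[s]) for s, c in tf.items()}
        )
    return scores


# Hypothetical sample documents with inflected forms of the same stems.
docs = [
    "кітаптар кітаптарды мектепте",
    "мектеп мектептер қала",
    "қала қалада ауыл",
]
scores = tfidf(docs)
# Rank the stems of the first document by score, highest first.
ranking = sorted(scores[0], key=scores[0].get, reverse=True)
```

Without the stemming step, "кітаптар" and "кітаптарды" would be counted as two unrelated terms; with it, both contribute to the single stem "кітап", which then ranks first in the first document.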



Acknowledgments

The study was supported by the Ministry of Education and Science of the Republic of Kazakhstan within the framework of the AP05132950 and AP08052421 scientific projects.


Corresponding author

Correspondence to Diana Rakhimova.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Rakhimova, D., Turganbayeva, A. (2020). Approach to Extract Keywords and Keyphrases of Text Resources and Documents in the Kazakh Language. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_56


  • DOI: https://doi.org/10.1007/978-3-030-63007-2_56


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63006-5

  • Online ISBN: 978-3-030-63007-2

  • eBook Packages: Computer Science, Computer Science (R0)
