Hybrid Approach for the Semantic Analysis of Texts in the Kazakh Language

  • Conference paper
Recent Challenges in Intelligent Information and Database Systems (ACIIDS 2021)

Abstract

In this paper, the authors propose a hybrid approach to the semantic analysis of text resources and documents in the Kazakh language. An overview of the analysis of the Kazakh language and its difficulties is presented. The approach consists of two main parts: the first identifies keywords and key phrases in the text, and the second uses the data obtained to build an annotated summary of the text. To implement the first part, the TF-IDF algorithm was applied to extract keywords and phrases from texts. The cosine similarity between sentences in Kazakh was then calculated, and these similarity values were used to determine semantic links in the text. The second part, the abstracting of texts, is performed on the basis of the data obtained. The number of annotations depends directly on the size of the document. A linguistic corpus of the Kazakh language was collected for the experiments and calculations. Various existing approaches were studied alongside the hybrid approach for the semantic analysis of Kazakh. The practical part was implemented in Python. The article presents the results of the experimental calculations.
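The two parts named in the abstract (TF-IDF keyword extraction, then summarization driven by cosine similarity between sentences) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the smoothed IDF formula, the choice to treat each sentence as a document, the similarity-sum ranking, and the `ratio` parameter are all hypothetical. The example sentences are made up for the demonstration and are not from the paper's corpus.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase and keep runs of word characters; in Python 3, \w matches
    # Unicode letters, so Cyrillic-based Kazakh text is handled.
    return re.findall(r"\w+", sentence.lower())


def tfidf_vectors(sentences):
    """Return one {term: weight} dict per sentence.

    Assumption: each sentence is treated as a separate document, and a
    smoothed IDF, log((1 + n) / (1 + df)) + 1, is used.
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in doc.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_keywords(vectors, k=3):
    """Rank terms by their highest TF-IDF weight in any sentence."""
    best = Counter()
    for vec in vectors:
        for term, w in vec.items():
            best[term] = max(best[term], w)
    return [term for term, _ in best.most_common(k)]


def summarize(sentences, ratio=0.3):
    """Keep the sentences most similar to the rest of the text.

    The number of sentences kept grows with the document size
    (hypothetical rule: ceil(ratio * number_of_sentences), at least 1).
    """
    vectors = tfidf_vectors(sentences)
    scores = [
        sum(cosine(v, u) for j, u in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]
    n_keep = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_keep])]  # original order


# Illustrative input (made-up sentences).
sentences = [
    "Алматы үлкен қала.",
    "Алматы Қазақстанда орналасқан.",
    "Мен кітап оқимын.",
]
print(top_keywords(tfidf_vectors(sentences)))
print(summarize(sentences))
```

The sketch does no stemming or morphological analysis. Because Kazakh is agglutinative, inflected forms of the same word are counted as different terms here, so a practical system would likely need a normalization step before computing the weights.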

Acknowledgments

This research is funded by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. AP08052421, project title: «Research and development of the post-editing system of the Kazakh language in machine translation»).

Author information

Corresponding authors

Correspondence to Diana Rakhimova or Asem Turarbek.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rakhimova, D., Turarbek, A., Kopbosyn, L. (2021). Hybrid Approach for the Semantic Analysis of Texts in the Kazakh Language. In: Hong, TP., Wojtkiewicz, K., Chawuthai, R., Sitek, P. (eds) Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2021. Communications in Computer and Information Science, vol 1371. Springer, Singapore. https://doi.org/10.1007/978-981-16-1685-3_12

  • DOI: https://doi.org/10.1007/978-981-16-1685-3_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1684-6

  • Online ISBN: 978-981-16-1685-3

  • eBook Packages: Computer Science, Computer Science (R0)
