Hybrid Approach for the Semantic Analysis of Texts in the Kazakh Language

Rakhimova, Diana; Turarbek, Asem; Kopbosyn, Leila

doi:10.1007/978-981-16-1685-3_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1371)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

In this paper, the authors propose a hybrid approach to the semantic analysis of text resources and documents in the Kazakh language. An overview of existing methods and of the difficulties of analyzing Kazakh is presented. The developed approach consists of two main parts. The first identifies keywords and key phrases in the text; the second uses those results to build an annotated summary of the text. To implement the first part, the TF-IDF algorithm was applied to extract keywords and phrases from texts. Cosine similarity between sentences in the Kazakh language was then calculated, and the resulting similarity scores are used to determine semantic links in the text. On the basis of the data obtained, the second part is performed: the summarization (abstracting) of texts. The number of annotations depends directly on the size of the document. A linguistic corpus of the Kazakh language was collected for carrying out experiments and calculations. Various existing approaches and the proposed hybrid approach to the semantic analysis of the Kazakh language were studied. The practical part was implemented in Python. The article presents the results of experimental calculations.
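
The pipeline described in the abstract (TF-IDF weighting, cosine similarity between sentences, and a summary whose length grows with the document) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the whitespace-style tokenizer, the smoothed IDF variant, the centrality-based sentence ranking, and the `ratio` parameter are all assumptions made for illustration. A real system for Kazakh would likely also need morphological preprocessing, which is omitted here.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Assumption: simple lowercase word split; \w matches Cyrillic letters in Python 3.
    return re.findall(r"\w+", sentence.lower())


def tfidf_vectors(sentences):
    """Return one {term: weight} dict per sentence, treating each sentence as a document."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(term for d in docs for term in d)
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1  # avoid division by zero for empty sentences
        vectors.append({
            # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in d.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_keywords(sentences, k=5):
    """Rank terms by their summed TF-IDF weight over all sentences."""
    scores = Counter()
    for vec in tfidf_vectors(sentences):
        for term, weight in vec.items():
            scores[term] += weight
    return [term for term, _ in scores.most_common(k)]


def summarize(sentences, ratio=0.3):
    """Keep the sentences most similar to the rest of the text, in original order.

    Assumption: summary length is a fixed fraction (ratio) of the sentence count,
    so it grows with document size.
    """
    vectors = tfidf_vectors(sentences)
    centrality = [
        sum(cosine(v, u) for j, u in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]
    k = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: -centrality[i])
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Toy English sentences stand in for Kazakh text.
    text = [
        "The cat sat on the mat.",
        "The cat ate the fish.",
        "Stock prices fell sharply today.",
        "The cat sat near the fish.",
    ]
    print(top_keywords(text, k=3))
    print(summarize(text, ratio=0.5))
```

Ranking sentences by their summed similarity to all other sentences is only one simple way to turn pairwise cosine scores into a summary; graph-based methods such as LexRank or TextRank refine the same idea.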

Acknowledgments

This research is funded by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. AP08052421, project title: «Research and development of the post-editing system of the Kazakh language in machine translation»).

Author information

Authors and Affiliations

Al-Farabi Kazakh National University, Almaty, Kazakhstan
Diana Rakhimova, Asem Turarbek & Leila Kopbosyn

Corresponding authors

Correspondence to Diana Rakhimova or Asem Turarbek.

Editor information

Editors and Affiliations

National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
Wrocław University of Science and Technology, Wrocław, Poland
Krystian Wojtkiewicz
King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Rathachai Chawuthai
Kielce University of Technology, Kielce, Poland
Pawel Sitek

Cite this paper

Rakhimova, D., Turarbek, A., Kopbosyn, L. (2021). Hybrid Approach for the Semantic Analysis of Texts in the Kazakh Language. In: Hong, TP., Wojtkiewicz, K., Chawuthai, R., Sitek, P. (eds) Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2021. Communications in Computer and Information Science, vol 1371. Springer, Singapore. https://doi.org/10.1007/978-981-16-1685-3_12

DOI: https://doi.org/10.1007/978-981-16-1685-3_12
Published: 06 April 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1684-6
Online ISBN: 978-981-16-1685-3
eBook Packages: Computer Science, Computer Science (R0)
