Development of Technology for Summarization of Kazakh Text
Zhabayev T., Tukeyev U.
2021, Science and Information Organization
International Journal of Advanced Computer Science and Applications
2021, Vol. 12, Issue 9, pp. 111-116
This paper presents a solution to the problem of summarizing Kazakh texts. Kazakh text summarization is treated as a sequence of two tasks: extracting the most important sentences of the text and simplifying the extracted sentences. The extraction task is solved using the TF-IDF method, and the simplification task is solved using the "Seq2Seq" neural network technology. The obstacle to applying neural machine translation (NMT) methods to Kazakh sentence simplification was the absence of a Kazakh training dataset. To overcome this, the work proposes a transfer learning method. Transfer learning made it possible to reuse a ready-made model trained on a parallel corpus of Simple English Wikipedia instead of building a Kazakh simplification corpus from scratch. To this end, a transfer learning technology for simplifying Kazakh sentences was developed, based on training a neural sentence-simplification model for English. The main scientific contribution of this work is this transfer learning technology for simplifying Kazakh sentences using an English parallel simplification corpus.
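The extraction stage can be illustrated with a minimal TF-IDF sketch. The abstract names TF-IDF but does not give the exact weighting, sentence splitter, or summary length, so the following choices are assumptions, not the authors' implementation: each sentence is treated as a separate "document", tokens are lowercased `\w+` runs, term frequency is normalized by sentence length, IDF is `log(n / df)`, a sentence's score is the sum of its term weights, and the `k` best sentences are returned in their original order. The function names are hypothetical, and the Seq2Seq simplification stage is not sketched here.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # \w is Unicode-aware for str patterns, so Kazakh Cyrillic letters
    # such as "ә", "ғ", "қ", "ң", "ө", "ұ", "ү", "һ", "і" are kept.
    return re.findall(r"\w+", sentence.lower())


def tfidf_scores(sentences: list[str]) -> list[float]:
    """Score each sentence by the sum of TF-IDF weights of its distinct terms,
    treating every sentence as one "document" (assumed scoring scheme)."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        # Length-normalized TF times IDF; terms present in every sentence get IDF 0.
        score = sum((count / len(doc)) * math.log(n / df[term]) for term, count in tf.items())
        scores.append(score)
    return scores


def summarize(text: str, k: int = 2) -> list[str]:
    sentences = split_sentences(text)
    scores = tfidf_scores(sentences)
    # Take the k highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In a two-stage pipeline like the one described, the sentences returned by a call such as `summarize(kazakh_text, k=3)` would then be passed to the simplification model. Normalizing term frequency by sentence length keeps the score from simply favoring the longest sentences.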
low-resource language, seq2seq, summarization, text simplification, transfer learning
Department of Information Systems, Al-Farabi Kazakh National University, Almaty, Kazakhstan