Development of the information system for the Kazakh language preprocessing


Akhmed-Zaki D., Mansurova M., Madiyeva G., Kadyrbek N., Kyrgyzbayeva M.
2021, Cogent OA

Cogent Engineering
2021, Volume 8, Issue 1

The aim of this work is to design and develop linguistic resources and preprocessing tools for the Kazakh language. The linguistic resource presented is a media corpus of the Kazakh language, available on the Al-Farabi Kazakh National University platform. The corpus consists of news texts and is implemented as an information system. The article describes the general architecture of this information system for the automatic and reliable collection, storage, and analysis of Kazakh-language texts. It also presents three automatic text preprocessing tools for the Kazakh language: a word form generator, a morphological analyzer, and a morphological disambiguation tool. The proposed tools can also be applied in automatic text analysis systems and in the creation of other linguistic resources such as thesauri and ontologies.

architecture of information system, corpus linguistics, Kazakh language, morphological parsing, preprocessing tool


Department of Computer Science, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Astana IT University, Nur-Sultan, Kazakhstan
Department of Artificial Intelligence and Big Data, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Department of General Linguistics and European Languages, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Department of Artificial Intelligence and Big Data, Al-Farabi Kazakh National University, Almaty, Kazakhstan
School of Mechanical Engineering, University of Birmingham, Birmingham, United Kingdom

