Development of the information system for the Kazakh language preprocessing
Akhmed-Zaki D., Mansurova M., Madiyeva G., Kadyrbek N., Kyrgyzbayeva M.
2021, Cogent OA
Cogent Engineering
2021, Volume 8, Issue 1
The aim of this work is to design and develop linguistic resources and preprocessing tools for the Kazakh language. A media-corpus of the Kazakh language is presented as a linguistic resource; it is available on the Al-Farabi Kazakh National University platform. The media-corpus consists of news texts and is implemented as an information system. The general architecture of an information system for the automatic and reliable collection, storage, and analysis of texts in the Kazakh language is described. Three automatic text preprocessing tools for the Kazakh language are presented: a word form generator, a morphological analyzer, and a morphological disambiguation tool. The proposed tools can also be applied in systems for automatic text analysis and in the creation of other linguistic resources such as thesauri and ontologies.
Keywords: architecture of information system, corpus linguistics, Kazakh language, morphological parsing, preprocessing tool
Department of Computer Science, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Astana IT University, Nur-Sultan, Kazakhstan
Department of Artificial Intelligence and Big Data, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Department of General Linguistics and European Languages, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Department of Artificial Intelligence and Big Data, Al-Farabi Kazakh National University, Almaty, Kazakhstan
School of Mechanical Engineering, University of Birmingham, Birmingham, United Kingdom