Development of a Children’s Educational Dictionary for a Low-Resource Language Using AI Tools
Rakhimova D., Karibayeva A., Karyukin V., Turarbek A., Duisenbekkyzy Z., Aliyev R.
October 2024, Multidisciplinary Digital Publishing Institute (MDPI)
Computers
2024, Volume 13, Issue 10
Today, various interactive tools and partially available artificial intelligence applications are actively used in education to solve a range of problems for resource-rich languages such as English, Spanish, and French. The situation is more difficult for low-resource languages such as Kazakh, Uzbek, Mongolian, and others, owing to the lack of high-quality, accessible resources and to the morphological complexity and semantics of agglutinative languages. This article presents research on early childhood learning resources for the low-resource Kazakh language. A dictionary for children differs from a classical educational dictionary in its purpose and in how it presents information. A thematic dictionary makes learning and remembering new words easier for children because the words are presented in a specific context. This article describes an approach to creating a thematic children’s dictionary for the Kazakh language using artificial intelligence. The approach consists of several stages: forming an initial list of English words with ChatGPT; identifying their semantic weights; generating phrases and sentences from the list of semantically related words; translating the resulting phrases and sentences from English to Kazakh and dividing them into bigrams and trigrams; and processing them with Kazakh part-of-speech (POS) tag pattern templates to adapt them for children. In forming the dictionary, the semantic proximity of words and phrases to the given theme and the age restrictions for children were taken into account. The resulting dictionary phrases were evaluated using the cosine similarity, Euclidean similarity, and Manhattan distance metrics.
Moreover, the dictionary was extended with video and audio data: models such as DALL-E 3, Midjourney, and Stable Diffusion were used to illustrate the dictionary entries, and text-to-speech (TTS) technology for the Kazakh language was used for voice synthesis. The developed thematic dictionary approach was tested, and a System Usability Scale (SUS) assessment of the application was conducted. The experimental results demonstrate the proposed approach’s high efficiency and its potential for wide use in education.
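As an illustration of two steps named in the abstract, the sketch below splits a tokenized phrase into bigrams and trigrams and compares two vectors with the cosine, Euclidean, and Manhattan measures. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example tokens, and the toy vectors are hypothetical, and the Euclidean measure is computed here as a plain distance, since the abstract does not say how it is turned into a similarity.

```python
import math


def ngrams(tokens, n):
    """Return all contiguous n-token windows of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical tokenized phrase; a real pipeline would use translated Kazakh text.
tokens = ["the", "red", "apple", "is", "sweet"]
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Hypothetical toy vectors standing in for theme and phrase embeddings.
theme_vec = [1.0, 0.0, 1.0]
phrase_vec = [1.0, 1.0, 0.0]

if __name__ == "__main__":
    print(bigrams)
    print(trigrams)
    print(cosine_similarity(theme_vec, phrase_vec))
    print(euclidean_distance(theme_vec, phrase_vec))
    print(manhattan_distance(theme_vec, phrase_vec))
```

In practice the vectors would come from an embedding model, and a higher cosine similarity (or a lower distance) would indicate that a phrase lies closer to the chosen theme.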
artificial intelligence, ChatGPT, children’s education, Kazakh language, low-resource language
Department of Information Systems, Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
Institute of Information and Computational Technologies, Almaty, 050010, Kazakhstan