Formalization of Morphological Rules for Kazakh Nouns in the New Latin Alphabet
Zhetkenbay L., Sharipbay A., Razakhova B., Bekmanova G., Barlybayev A., Nazyrova A., Yergesh B.
September 2025, Bright Publisher
Journal of Applied Data Sciences
2025, Volume 6, Issue 3, pp. 1999-2019
This study presents a hybrid computational model for formalizing and predicting morphological inflections of Kazakh nouns written in the new Latin alphabet. The motivation stems from limitations in previous systems based on Cyrillic orthography, which often misrepresented key phonological features such as vowel harmony and consonant assimilation. The main objective is to develop a linguistically informed and computationally efficient system to support Natural Language Processing (NLP) for Kazakh during its transition to the Latin script. The methodology combines rule-based grammar formalization with a machine learning approach, specifically a Bayesian Regularization Backpropagation Neural Network (BR-BPNN). A manually curated dataset of 1,000 Latin-script Kazakh nouns was annotated for various morphological forms. Each word was encoded at the character level using a custom dictionary (kazlat_dict), with its final four letters forming the feature vector. Formal logic and regular expressions were used to model morphological rules such as pluralization and case endings, incorporating vowel harmony, consonant softness, and sonority. These rules provided the training labels for the BR-BPNN model. The trained model achieved 91.5% accuracy, 89.4% precision, and a correlation coefficient (R) above 0.98, confirming the effectiveness of the hybrid system. A user interface prototype was developed to demonstrate practical utility: users enter a root noun and receive suffix predictions with confidence scores and linguistic explanations. The novelty of this work lies in integrating linguistic theory with machine learning for a low-resource Turkic language. It offers a foundation for intelligent Kazakh language tools, including spell checkers, grammar correctors, and educational platforms. Future work will extend the system to other parts of speech and explore contextual modeling to improve the handling of ambiguous or irregular forms.
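The rule-labeling and feature-encoding steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Latin letter values (ı = Ы, i = І, ū = Ұ, ü = Ү, ñ = Ң, ş = Ш, j = Ж, y = Й), the textbook consonant groups that select the plural variants -lar/-ler, -dar/-der, -tar/-ter, and the integer indexing that stands in for kazlat_dict are all assumptions. The helper names plural and encode_last4 are hypothetical.

```python
import re

# ASSUMPTION: 2021-style Latin letter values, not taken from the paper.
# Input is expected to be lowercase Latin-script Kazakh.
BACK_VOWELS = "aoıū"
FRONT_VOWELS = "äeöüi"
ALPHABET = "aäbdefgğhıijklmnñoöpqrsştuūüvyz"  # 31 letters (assumed)

# Stem-final classes for the plural suffix, written as regular expressions.
L_GROUP = re.compile(r"[aäeıioöuūüry]$")  # vowels (incl. u = У), r, y (Й) -> -lar/-ler
D_GROUP = re.compile(r"[lmnñzj]$")        # l, m, n, ñ, z, j -> -dar/-der
# Any other final letter (voiceless consonants, b, v, g, d, ...) -> -tar/-ter


def is_back(stem: str) -> bool:
    """Vowel harmony: the last harmonic vowel of the stem picks the variant.

    'u' is treated as transparent; a stem with no harmonic vowel
    defaults to back (both are simplifying assumptions of this sketch).
    """
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return True
        if ch in FRONT_VOWELS:
            return False
    return True


def plural(stem: str) -> str:
    """Attach the plural suffix chosen by final-letter class and harmony."""
    vowel = "a" if is_back(stem) else "e"
    if L_GROUP.search(stem):
        onset = "l"
    elif D_GROUP.search(stem):
        onset = "d"
    else:
        onset = "t"
    return stem + onset + vowel + "r"


def encode_last4(word: str) -> list[int]:
    """Map the final four letters to 1-based alphabet indices.

    Short words are left-padded with 0; unknown characters also map to 0.
    This is a hypothetical stand-in for kazlat_dict.
    """
    index = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
    codes = [index.get(ch, 0) for ch in word[-4:]]
    return [0] * (4 - len(codes)) + codes
```

For example, plural("kitap") gives "kitaptar", plural("söz") gives "sözder", and plural("üy") gives "üyler", while encode_last4("kitap") encodes the tail "itap". A full system would also need exception lists and the case-ending rules. The encoded tails would serve as network inputs, and the rule outputs would serve as their labels.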
Keywords: Alphabet, Conjunctions, Formal Model, Kazakh Language, Metalanguage, Morphological Rules, Natural Language Processing, Nouns, Sound System, Suffixes
Department of Artificial Intelligence Technologies, Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Pushkina 11, Astana, 010008, Kazakhstan