ALIGNING AND EXTENDING TECHNOLOGIES OF PARALLEL CORPORA FOR THE KAZAKH LANGUAGE
Rakhimova D. Karibayeva A.
2022Technology Center
Eastern-European Journal of Enterprise Technologies
2022#4Issue 2-11832 - 39 pp.
The paper presents the two-stage alignment and extending methods of parallel corpora for the Kazakh language. The Kazakh language is agglutinative with rich morphology and related to the Turkic language group. So, the traditional alignment methods for similar languages do not work for the Kazakh language. The alignment is used primarily to ensure that the fragment corresponding to the original is found in the translation. After that, identical fragments of parallel texts are compared with each other. At the initial stage, the question is what needs to be leveled. It is possible to align word by word, but this often becomes almost impossible for several reasons: sets of lexemes and expressions do not match in different languages. Considering the linguistic peculiarities of languages, the developed technologies and ways of universal alignment of parallel text may not work in languages with agglutination. It means that the form of the word is formed by additional affixes and auxiliary words that carry semantic and morphological information. The approach presented in this paper is to use a two-stage alignment, which uses a bilingual dictionary of synonyms. The evaluation with the use of the English-Kazakh corpus verifies that our method shows an average of 89 % correct alignment. The second method is designed to expand the parallel corpus due to the lack of natural parallel corpora of the Kazakh-English language pair with good quality. The developed method uses a combinatorial method taking into account the semantic and grammatical features of the Kazakh language. Different tenses of the Kazakh language are used for sentence generation, and different endings for parts of speech are also considered
Aligning , English , Extending technology , Kazakh , Parallel corpora , Sentence generation
Text of the article Перейти на текст статьи
Department of Information Systems, Al-Farabi Kazakh National University, Аl-Farabi ave,71, Almaty, 050040, Kazakhstan
Department of Information Systems
10 лет помогаем публиковать статьи Международный издатель
Книга Публикация научной статьи Волощук 2026 Book Publication of a scientific article 2026