The solution of the problem of unknown words under neural machine translation of the Kazakh language
Turganbayeva A., Tukeyev U.
2021, Taylor and Francis Ltd.
Journal of Information and Telecommunication
2021, Volume 5, Issue 2, pp. 214-225
The paper proposes a solution to the unknown-word problem in neural machine translation (NMT), illustrated on the Kazakh-English language pair. The novelty of the approach is an algorithm that checks the words of a source sentence against the vocabulary of a trained NMT model to identify unknown words, and then uses a dictionary of Kazakh synonyms to replace each unknown word with a word that is close in meaning. The synonym dictionary is searched for words similar in meaning to each identified unknown word, and the synonyms found are checked for presence in the vocabulary of the trained model. The edited source-language sentence is then translated again. A database of Kazakh synonyms was collected for this purpose. The software implementing the solution was developed in Python. The proposed technology was tested on two parallel Kazakh-English corpora in two variants: baseline NMT and NMT with the proposed technology.
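The replacement step described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `replace_unknown_words`, whitespace tokenization, exact-match lookup, and the "first synonym found in the vocabulary wins" rule are all assumptions, and the tokens in the usage example are placeholders rather than real Kazakh words.

```python
def replace_unknown_words(sentence, model_vocab, synonyms):
    """Replace each out-of-vocabulary token with a synonym the model knows.

    sentence    -- source-language sentence (assumed whitespace-tokenized)
    model_vocab -- set of tokens in the trained model's vocabulary
    synonyms    -- dict mapping a word to a list of its synonyms
    """
    edited = []
    for token in sentence.split():
        if token in model_vocab:
            # Known word: keep it unchanged.
            edited.append(token)
            continue
        # Unknown word: take the first synonym that is in the model
        # vocabulary; if none qualifies, leave the word as it is.
        replacement = next(
            (s for s in synonyms.get(token, []) if s in model_vocab),
            token,
        )
        edited.append(replacement)
    return " ".join(edited)


# Hypothetical placeholder data, for illustration only.
vocab = {"a", "b", "c"}
synonym_dict = {"x": ["y", "c"], "z": ["q"]}
edited_sentence = replace_unknown_words("a x b z", vocab, synonym_dict)
```

In this sketch the edited sentence would then be passed to the trained model for a second translation. A word with no in-vocabulary synonym is deliberately left untouched, so the edited sentence is never worse covered by the vocabulary than the original.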
Kazakh language, neural machine translation, unknown words
Department of Information Systems, Al-Farabi Kazakh National University, Almaty, Kazakhstan