Development of an augmented Damerau-Levenshtein method for correcting spelling errors in Kazakh texts
Mukazhanov N., Alibiyeva Z., Yerimbetova A., Kassymova A., Alibiyeva N.
2023, Technology Center
Eastern-European Journal of Enterprise Technologies
2023, #5, Issue 2(125), pp. 23-33
This paper develops a method for identifying and correcting spelling errors in Kazakh texts. The object of study is methods for more accurate correction of spelling errors in Kazakh texts, and the aim is to develop an augmented version of the Damerau-Levenshtein method for this purpose. Automatic detection and correction of spelling errors has become a default feature of modern text editors and of text messaging applications such as chatbots and messengers. However, although the task is well solved for widely spoken languages, it has not been fully solved for languages with a smaller audience, such as Kazakh. The methods developed so far cannot correct all spelling errors found in Kazakh texts, so a method with algorithms specific to spelling errors in Kazakh texts is considered. As a result of the research, algorithms for correcting errors found in Kazakh texts were developed and incorporated into the Damerau-Levenshtein method. In experimental testing, the augmented Damerau-Levenshtein method showed 97.2 % accuracy in correcting specific errors found only in Kazakh words and 92.8 % accuracy in correcting common errors in letter symbols. The standard Damerau-Levenshtein method showed 76.4 % accuracy in correcting the specific errors found only in Kazakh words and 92.2 % accuracy in correcting the common errors, approximately the same as the augmented method. The results can be applied in practice by including them in text editors, messengers, e-mail clients, and similar applications that work with text data.
algorithm, edit distance, NLP, probability, similarity, spelling error, text data
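For orientation, the standard edit-distance baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the authors' augmented method: it implements the restricted Damerau-Levenshtein variant (optimal string alignment), and the function names `osa_distance` and `suggest`, the `max_distance` threshold, and the tiny word list are all assumptions made for the example. The Kazakh sample pair also only assumes that replacing a Kazakh letter such as "қ" with the similar-looking "к" is one plausible kind of language-specific error; the abstract does not enumerate the error types.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent characters, each with cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def suggest(word: str, dictionary: list[str], max_distance: int = 2) -> list[str]:
    """Return dictionary words within max_distance, closest first.

    Hypothetical helper: a real spell checker would also use word
    frequencies or probabilities to rank candidates.
    """
    scored = sorted((osa_distance(word, w), w) for w in dictionary)
    return [w for dist, w in scored if dist <= max_distance]


if __name__ == "__main__":
    print(osa_distance("teh", "the"))      # one transposition
    print(osa_distance("казак", "қазақ"))  # two substitutions (к -> қ)
    print(suggest("teh", ["the", "ten", "tea", "toe"], max_distance=1))
```

Because every edit has the same cost here, a word in which two Kazakh letters were typed as their plain Cyrillic look-alikes sits at distance 2 from the correct form, no closer than unrelated words at the same distance. An augmented method could, for instance, weight such substitutions differently, but how the authors do this is not stated in the abstract.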
Department of Software Engineering, Satbayev University, Satpayev str., 22a, Almaty, 050013, Kazakhstan
Institute of Information and Computational Technologies, Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan, Shevchenko str., 28, Almaty, 050010, Kazakhstan
Institute of Automation and Information Technologies, Satbayev University, Satpayev str., 22a, Almaty, 050013, Kazakhstan
Al-Farabi Kazakh National University, Al-Farabi ave., 71, Almaty, 050040, Kazakhstan