Correction of Kazakh synthetic text using finite state automata

Kartbayev A., Mamyrbayev O., Khairova N., Ybytayeva G., Abilkaiyr N., Mussayeva D.
30 November 2021 Little Lion Scientific

Journal of Theoretical and Applied Information Technology
2021 #99 Issue 22 5559 - 5570 pp.

In this paper, we investigate the correction of generated synthetic text for resource-poor languages. Such text usually contains many errors that have to be checked and corrected by additional tools. These errors must be corrected automatically so that they do not degrade the performance of the system. Our approach to automatic error correction uses finite automata to propose correction candidates for each misspelled word. A language model then scores the candidates and selects the best correction for the given context. The proposed approach is language-independent and requires only a dictionary and text data for building the language model. Evaluated on Kazakh, the approach achieved an accuracy of 91%.
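The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the automata are built or which language model is used, so the sketch assumes a dictionary trie (a simple deterministic finite automaton) searched within a bounded edit distance, and an add-one-smoothed bigram model for scoring. All function names, the `max_dist` parameter, and the toy Kazakh dictionary and corpus are hypothetical.

```python
import math
from collections import Counter

END = ""  # sentinel key marking an accepting state (assumption of this sketch)

def build_trie(words):
    """Build a trie, i.e. a deterministic finite automaton accepting the dictionary."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root

def candidates(trie, word, max_dist=1):
    """Return {dictionary word: edit distance} for entries within max_dist edits."""
    found = {}

    def walk(node, ch, prev_row):
        # Extend the Levenshtein table by one row for the transition on `ch`.
        row = [prev_row[0] + 1]
        for i in range(1, len(word) + 1):
            cost = 0 if word[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost))
        if END in node and row[-1] <= max_dist:
            found[node[END]] = row[-1]
        if min(row) <= max_dist:  # prune branches that cannot succeed anymore
            for nxt, child in node.items():
                if nxt != END:
                    walk(child, nxt, row)

    first_row = list(range(len(word) + 1))
    for ch, child in trie.items():
        if ch != END:
            walk(child, ch, first_row)
    return found

class BigramModel:
    """Bigram language model with add-one smoothing (a stand-in for the paper's model)."""
    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent.split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, prev, word):
        num = self.bigrams[(prev, word)] + 1
        den = self.unigrams[prev] + self.vocab_size
        return math.log(num / den)

def correct(word, prev, trie, model, max_dist=1):
    """Pick the candidate the language model prefers after the previous word."""
    cands = candidates(trie, word, max_dist)
    if not cands or cands.get(word) == 0:
        return word  # no candidate found, or the word is already in the dictionary
    return max(cands, key=lambda c: (model.log_prob(prev, c), -cands[c]))

# Toy data, invented for illustration only.
trie = build_trie(["бала", "қала", "кітап", "оқыды", "үлкен"])
model = BigramModel(["үлкен қала", "бала кітап оқыды"])
print(correct("пала", "үлкен", trie, model))
print(correct("пала", "<s>", trie, model))
```

In this toy setting the misspelling "пала" is one edit away from both "бала" and "қала", so the automaton alone cannot decide between them; the choice comes from the context, which is the role the abstract assigns to the language model.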

Finite State Automata, Hidden Markov Model, Language Model, Synthetic Data, Text Generation

Institute of Information and Computer Technologies, Kazakhstan
Department of Computer Sciences, Al-Farabi Kazakh National University, Kazakhstan
National Technical University, “Kharkiv Polytechnic Institute”, Ukraine
Department of Cybersecurity, Information Processing and Storage, Satbayev University, Kazakhstan
High School of Economics and Business, Al-Farabi Kazakh National University, Kazakhstan
