Integrated end-to-end multilingual method for low-resource agglutinative languages using Cyrillic scripts

Bekarystankyzy A., Razaque A., Mamyrbayev O.
January 2025 Elsevier B.V.

Journal of Industrial Information Integration
2025 #43

Millions of people worldwide use automatic speech recognition (ASR) systems every day to dictate messages, operate devices, start searches, and enter data on small devices. The quality of interaction in these settings depends on the accuracy of the voice transcriptions and the system's response. A further barrier to natural interaction for multilingual users is the monolingual nature of many ASR systems, which limits users to a single predefined language. Training a reliable and accurate ASR model requires a substantial amount of transcribed audio data. Such data is lacking for a large number of languages, particularly agglutinative languages. Much research has applied various strategies to improve models for low-resource languages. This study presents an integrated end-to-end multi-language ASR (EMASR) architecture that allows users to choose from a variety of spoken language combinations. The proposed EMASR supports low-resource agglutinative languages by fusing the features of a multi-identifier module, a voice fusion module, and a recurrent neural network module. It identifies Turkic agglutinative languages (Kazakh, Bashkir, Kyrgyz, Saha, and Tatar) and enables multilingual training through Connectionist Temporal Classification (CTC) and an attention mechanism that includes a language model (LM). These languages share cognate words, sentence construction principles, and an alphabet (Cyrillic). We use recent advancements in language identification to obtain recognition accuracy and latency characteristics. Experimental results show that multilingual training outperforms monolingual training in all languages tested. Kazakh showed a particularly strong improvement: word error rate (WER) was halved and character error rate (CER) was reduced to one-third, demonstrating that this strategy may be beneficial for critically low-resource languages.
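
The abstract reports its results as WER and CER. As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed from a Levenshtein edit distance. It is not the authors' evaluation code: the function names `edit_distance`, `wer`, and `cer` are hypothetical, the sample sentences are made up for illustration, and counting spaces as characters in CER is an assumption (toolkits differ on this).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces are counted as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Made-up illustrative pair, not data from the study.
    ref = "мен мектепке барамын"
    hyp = "мен мектепке бардым"
    print(f"WER = {wer(ref, hyp):.3f}")
    print(f"CER = {cer(ref, hyp):.3f}")
```

Because agglutinative languages pack many morphemes into one word, a single wrong suffix counts as a full word error in WER but only a few character errors in CER, which is one reason both metrics are usually reported together.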

Agglutinative, Attention-based, Conformer, CTC, Low-resource languages, Multilingual learning


Narxoz University, Almaty, Kazakhstan
School of Computing, Gachon University, South Korea
Institute of Information and Computational Technologies CS MES RK, 28 Shevchenko Str., Almaty, Kazakhstan

