Investigation of ASR Models for Low-Resource Kazakh Child Speech: Corpus Development, Model Adaptation, and Evaluation


Rakhimova D., Duisenbekkyzy Z., Adali E.
August 2025, Multidisciplinary Digital Publishing Institute (MDPI)

Applied Sciences (Switzerland)
2025, Volume 15, Issue 16

This study focuses on the development and evaluation of automatic speech recognition (ASR) systems for Kazakh child speech, an underexplored domain in both linguistic and computational research. A specialized acoustic corpus was constructed for children aged 2 to 8 years, incorporating age-related vocabulary stratification and gender variation to capture phonetic and prosodic diversity. The data were collected from three sources: a custom-designed Telegram bot, high-quality Dictaphone recordings, and naturalistic speech samples recorded in home and preschool environments. Four ASR models (Whisper, DeepSpeech, ESPnet, and Vosk) were evaluated. Whisper, ESPnet, and DeepSpeech were fine-tuned on the curated corpus, while Vosk was applied in its standard pretrained configuration. Performance was measured using five evaluation metrics: Word Error Rate (WER), BLEU, Translation Edit Rate (TER), Character Similarity Rate (CSRF2), and Accuracy. The results indicate that ESPnet achieved the highest accuracy (32%) and the lowest WER (0.242) on sentences, while Whisper performed well on semantically rich utterances (Accuracy = 33%; WER = 0.416). Vosk demonstrated the best performance on short words (Accuracy = 68%) and also yielded the highest BLEU score in that category (0.600). DeepSpeech showed moderate accuracy gains, particularly on short words (Accuracy = 60%), but struggled with longer utterances, reaching an Accuracy of 25% on sentences. These findings underscore the importance of age-appropriate corpora and domain-specific adaptation when developing ASR systems for low-resource child speech, particularly in educational and therapeutic contexts.
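WER is the headline error metric in the results above, and the reported values (e.g. 0.242, 0.416) are fractions rather than percentages. As a point of reference, the following is a minimal sketch of the standard definition, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-cost alignment and N is the number of reference words. It assumes plain whitespace tokenization with no casing or punctuation normalization; the article's actual scoring pipeline and normalization rules are not described here, so this is an illustration of the metric, not the authors' evaluation script. The function name `word_error_rate` and the example sentences are hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: tokens are obtained by whitespace splitting only;
    no lowercasing or punctuation stripping is applied.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first (i - 1)
    # reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr

    # Total edits divided by the reference length; may exceed 1.0
    # when the hypothesis contains many insertions.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical Kazakh example: one substituted word out of three.
    print(round(word_error_rate("мен мектепке барамын", "мен мектеп барамын"), 3))  # 0.333
```

Because WER is normalized by the reference length only, insertions can push it above 1.0; this is one reason abstracts like this one report it alongside bounded metrics such as BLEU and Accuracy.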

ASR, child speech recognition, DeepSpeech, ESPnet, fine-tuning, Kazakh language, low-resource languages, Vosk, Whisper


Department of Information Systems, Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
Institute of Information and Computational Technologies, Almaty, 050010, Kazakhstan
Department of Computer Engineering, Faculty of Computer and Informatics, Istanbul Technical University, Maslak Campus, Istanbul, 34467, Turkey

