Development of a Robust Neural Network-Based VAD System under Low Signal-to-Noise Ratio Conditions

Kulakayeva A. Medetov B. Zhetpisbayeva A. Nurlankyzy A.
8 December 2025 Dr D. Pylarinos

Engineering, Technology and Applied Science Research
2025, Vol. 15, Issue 6, pp. 30377-30386

This study addresses the development and evaluation of robust Voice Activity Detection (VAD) systems under low Signal-to-Noise Ratio (SNR) conditions, a significant challenge for modern telecommunications and voice interface systems operating in noisy acoustic environments. The work is motivated by the limited investigation of contemporary hybrid neural network architectures for VAD in low-resource languages such as Kazakh, particularly across a wide range of SNR levels, including extreme values below -10 dB. The central research question is which modern hybrid neural network architecture offers the best balance between accuracy and computational efficiency for speech detection in Kazakh under severe noise conditions. Five architectures were developed and tested: CNN+BiGRU, CNN+GRU, CNN+LSTM, CNN+BiLSTM, and CNN+TDNN. All used the KSC2 corpus, augmented with synthetic noise across an SNR range from -18 dB to +30 dB, with separate analyses at fixed levels of 10 dB and -10 dB. Mel-frequency cepstral coefficient (MFCC) features served as input, and the noise samples for training and testing came from the ESC-50 dataset. Experimental results showed that the CNN+BiGRU, CNN+GRU, and CNN+LSTM architectures achieved the highest F1-score (99.6%) and remained robust at SNR levels above -12 dB, whereas CNN+TDNN provided comparable quality with minimal computational complexity and the shortest training time (164 s). The analysis at fixed SNR levels revealed that models trained on a single noise level generalize poorly, which highlights the need to include a wide SNR range in training. In conclusion, the hybrid architectures CNN+BiGRU and CNN+TDNN are recommended for deployment in VAD systems for the Kazakh language in highly noisy environments.

Licensed under a CC-BY 4.0 license | Copyright (c) by the authors
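The abstract does not say how the synthetic noise was mixed into the speech. As a minimal sketch, the usual power-ratio definition of SNR can be used to scale a noise clip before adding it to an utterance. Everything here is an assumption made for illustration: the helper names `power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, the sine wave and Gaussian noise are placeholders for KSC2 speech and ESC-50 noise, and drawing the SNR uniformly from the -18 dB to +30 dB range is one possible sampling choice, not necessarily the authors'.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`. Returns (mixture, scaled_noise)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise clip is silent")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between a clean signal and the noise actually added."""
    return 10 * math.log10(power(speech) / power(scaled_noise))


# Placeholder signals (assumption): 1 s at 16 kHz, not real corpus data.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# Assumed sampling scheme: one SNR per utterance, uniform over the range
# reported in the abstract.
snr_db = rng.uniform(-18.0, 30.0)
mixed, scaled = mix_at_snr(speech, noise, snr_db)
```

Calling `mix_at_snr(speech, noise, -10.0)` or `mix_at_snr(speech, noise, 10.0)` instead would give a fixed-level condition like the two analysed separately in the study.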

BiGRU, BiLSTM, Convolutional Neural Network (CNN), Low Signal-to-Noise Ratio (SNR), Recurrent Neural Network (RNN), Voice Activity Detection (VAD)

Department of Radio Engineering, Electronics and Telecommunications, International Information Technology University, Almaty, Kazakhstan
Department of Radio Engineering, Electronics and Telecommunications, L. N. Gumilyov Eurasian National University, Astana, Kazakhstan
Department of Electronics, Telecommunications and Space Technologies, Satbayev University, Almaty, Kazakhstan
Department of Cybersecurity, International Information Technology University, Almaty, Kazakhstan
