DEVELOPMENT AND INCREASE OF NOISE IMMUNITY OF A MODEL OF BIOMETRIC IDENTIFICATION OF A SPEAKER BASED ON MEL-FREQUENCY CEPSTRAL COEFFICIENTS AND A CONVOLUTIONAL NEURAL NETWORK


Khizirova M., Chezhimbayeva K., Kassimov A., Yermekbaev M., Iskakova A.
2025, Technology Center

Eastern-European Journal of Enterprise Technologies
2025, #6, Issue 9, pp. 37–53

This study focuses on improving the noise robustness of a biometric speaker identification system based on mel-frequency cepstral coefficients (MFCC) and a convolutional neural network (CNN). The object of analysis is the acoustic structure of the Kazakh language under clean and noisy conditions. The experimental database consisted of 16 speakers, each represented by 12 audio recordings of approximately 1 s duration. The speech signals were corrupted by additive pink noise at different signal-to-noise ratio (SNR) levels. Under clean conditions, the CNN-based classifier achieved a recognition accuracy of approximately 96%, as confirmed by a confusion matrix with strong diagonal dominance. With noise added, classification accuracy decreased to about 69%, demonstrating the significant impact of acoustic interference on speaker identification performance. To improve noise immunity, noise augmentation was applied during training. After retraining on the augmented dataset, classification accuracy under noisy conditions increased to approximately 89–90%. Heatmaps of precision, recall, and F1-score show that, after robustness enhancement, most speaker classes achieve stable metric values in the range 0.85–1.00, while the averaged accuracy reaches ≈ 0.89–0.90, confirming consistent recognition across the entire dataset. The results show that MFCC features retain discriminative speaker-specific spectral characteristics even under noise and that CNN-based classification significantly outperforms traditional approaches in terms of robustness. The proposed MFCC–CNN approach provides high identification accuracy in clean environments and maintains competitive performance under noise after data augmentation.
The obtained results confirm the practical applicability of the developed system for reliable speaker verification in acoustically unstable environments, including remote biometric authentication, access control, and intelligent communication systems.
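
The noise corruption and augmentation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the pink noise was generated, which SNR levels were used, or the sampling rate, so everything below is an assumption: the Voss-McCartney scheme is one common approximation of 1/f noise, and the names `pink_noise`, `mix_at_snr`, and `augment`, the 8 kHz rate, and the SNR values are hypothetical.

```python
import math
import random


def pink_noise(n, rng, rows=16):
    """Approximate pink (1/f) noise with the Voss-McCartney scheme (assumed method)."""
    values = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    total = sum(values)
    out = []
    for i in range(1, n + 1):
        # The number of trailing zero bits of the counter picks which row
        # to refresh, so higher rows change more slowly than lower ones.
        k = (i & -i).bit_length() - 1
        if k < rows:
            total -= values[k]
            values[k] = rng.uniform(-1.0, 1.0)
            total += values[k]
        out.append(total)
    mean = sum(out) / n
    return [x - mean for x in out]  # remove the DC offset


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def augment(signal, snr_levels_db, rng):
    """Return one noisy copy of the signal per SNR level (hypothetical augmentation step)."""
    return [mix_at_snr(signal, pink_noise(len(signal), rng), snr)
            for snr in snr_levels_db]


if __name__ == "__main__":
    rng = random.Random(0)
    rate = 8000  # assumed sampling rate; a 1 s test tone stands in for speech
    tone = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
    copies = augment(tone, [0, 10, 20], rng)  # SNR levels are illustrative only
    print(len(copies), "noisy copies of", len(tone), "samples each")
```

In a pipeline of the kind the abstract describes, such noisy copies would be added to the clean recordings before MFCC extraction and CNN training; the feature extraction and network themselves are not sketched here.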

Kazakh speech, mel-frequency cepstral coefficients, noise, speaker identification, voice biometrics


Department of Telecommunications Engineering, Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev, Baitursynuly str., 126/1, Almaty, 050013, Kazakhstan

