THE DEPENDENCE OF THE EFFECTIVENESS OF NEURAL NETWORKS FOR RECOGNIZING HUMAN VOICE ON LANGUAGE

Nurlankyzy A., Akhmediyarova A., Zhetpisbayeva A., Namazbayev T., Yskak A., Yerzhan N., Medetov B.
2024 Technology Center

Eastern-European Journal of Enterprise Technologies
2024, No. 1, Issue 9 (127), pp. 72-81

This study examines the effectiveness of three neural network architectures (multilayer perceptron, MLP; convolutional neural network, CNN; recurrent neural network, RNN) for human voice recognition, with an emphasis on the Kazakh language. It considers problems related to language, differences between speakers, and the influence of network architecture on recognition accuracy. The methodology includes extensive training and testing, examining recognition accuracy across different languages and across different speaker datasets. Using a comparative analysis, the study evaluates the performance of the three architectures when trained exclusively on the Kazakh language. Testing included utterances in Kazakh and in other languages, while the number of speakers was varied to assess its impact on recognition accuracy. The results showed that CNNs are more effective at recognizing the human voice than RNNs and MLPs. It was also found that the CNN has higher accuracy in recognizing the human voice in the Kazakh language, for both a small and a large number of speakers. For example, for 20 speakers, the recognition error in Russian was 21.86 %, whereas in Kazakh it was 10.6 %. A similar trend was observed for 80 speakers: 16.2 % for Russian and 8.3 % for Kazakh. It can also be argued that training on one language does not guarantee high recognition accuracy in other languages. Therefore, the accuracy of human voice recognition by neural networks depends significantly on the language in which training is conducted. In addition, this study highlights the importance of varied speaker datasets for achieving optimal results. This knowledge is crucial for advancing the development of reliable human voice recognition systems that can accurately identify different human voices in different language contexts.
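The error figures quoted in the abstract can be compared directly. The following is a minimal sketch, not the authors' code: it only tabulates the four percentages stated above and computes how much larger the error is on the language not used for training. The names `reported_error` and `error_ratio` are hypothetical and introduced here for illustration.

```python
# Recognition error (%) as quoted in the abstract, keyed by number of speakers.
# These four values are the only data used; nothing else is assumed.
reported_error = {
    20: {"Kazakh": 10.6, "Russian": 21.86},
    80: {"Kazakh": 8.3, "Russian": 16.2},
}


def error_ratio(errors, trained="Kazakh", other="Russian"):
    """Return the error on the other language divided by the error on the training language."""
    return errors[other] / errors[trained]


# Ratio of Russian error to Kazakh error for each speaker count.
ratios = {n: error_ratio(e) for n, e in reported_error.items()}

for n, r in ratios.items():
    print(f"{n} speakers: Russian error is {r:.2f}x the Kazakh error")
```

Under these figures the error on Russian is roughly twice the error on Kazakh at both speaker counts, while the absolute error in each language is lower with 80 speakers than with 20.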

Artificial intelligence, CNN, human voice recognition, language specifics, MLP, neural networks, recognition accuracy, RNN, the effectiveness of training, voice activity detector


Almaty University of Power Engineering and Telecommunications, Baytursynuli str., 126/1, Almaty, 050013, Kazakhstan
Department of Solid State Physics and Nonlinear Physics, Al-Farabi Kazakh National University, Al-Farabi ave., 71, Almaty, 050040, Kazakhstan
Department of Cybersecurity, Information Processing and Storage, Kazakhstan
Department of Software Engineering, Kazakhstan
Institute of Automation and Information Technology, Satbayev University, Satpaev str., 22a, Almaty, 050013, Kazakhstan
Department of Radio Engineering, Electronics and Telecommunications, S. Seifullin Kazakh Agro Technical Research University, Zhenis ave., 62, Astana, 010011, Kazakhstan

