Deep neural networks for speech enhancement and speech recognition: A systematic review

Natarajan S., Rahman Al-Haddad S.A., Ahmad F.A., Kamil R., Hassan M.K., Azrad S., Macleans J.F., Abdulhussain S.H., Mahmmod B.M., Saparkhojayev N., Dauitbayeva A.
July 2025 Ain Shams University

Ain Shams Engineering Journal
2025, Volume 16, Issue 7

Extensive research has substantially transformed the field of speech signal processing. There is growing interest in Speech Enhancement (SE) and Automatic Speech Recognition (ASR), with SE serving as a crucial preliminary step for improving ASR performance. This paper addresses key challenges in these areas, particularly the need to maintain speech quality and improve intelligibility in ASR systems. Deep learning techniques have recently emerged as powerful tools for tackling these challenges. This systematic review examines speech enhancement and recognition techniques, with an emphasis on denoising, acoustic modeling, and beamforming. Various deep learning architectures, including Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and hybrid neural networks, are reviewed to highlight their roles in enhancement and recognition. For each study, the review details how these architectures are used, the features utilized, the databases employed, the reported performance, and the limitations, all presented in a structured tabular format. This approach provides insight into the strengths and weaknesses of each method and helps guide future advances in the field. In particular, the review emphasizes that LSTM-RNN models excel at temporal signal processing, while hybrid models demonstrate superior performance in optimizing task outcomes. The paper presents a comprehensive statistical analysis of 187 research papers that exclusively use deep neural networks to address the challenges of speech enhancement and recognition, covering the latest advances in the field. The review examines publications from 2012 to 2024 and sheds light on research trends and patterns, while the proposed solutions aim to bridge gaps for researchers in this evolving domain.

Keywords: Acoustic modeling, Beamforming, Deep neural network, Denoising, Machine learning, Reverberation, Speech enhancement, Speech recognition, Systematic review

Department of Computer and Communication Systems Engineering, Universiti Putra Malaysia, Selangor, Serdang, 43400, Malaysia
Department of Electrical and Electronic Engineering, Universiti Putra Malaysia, Selangor, Serdang, 43400, Malaysia
Department of Aerospace Engineering, Universiti Putra Malaysia, Selangor, Serdang, 43400, Malaysia
Malaysia
Department of Computer Engineering, University of Baghdad, Iraq
Rudny Industrial University, Kazakhstan
Department of Computer Science, Korkyt Ata Kyzylorda State University, Kazakhstan