Dual-scale adaptive attention-based Vision transformer with iterative refinement for clarity and consistency in multi-focus image fusion

Li G. Tang D. Huang J. Zhu S. Cao J.
1 January 2026 Elsevier Ltd

Engineering Applications of Artificial Intelligence
2026 #163

Multi-focus Image Fusion (MFIF) plays a prominent role in combining the focused regions of several source images into a single all-in-focus fused image. However, existing approaches struggle to maintain both global spatial coherence and sharp details. To overcome these limitations, the Dual-Scale Adaptive Attention-Based Vision Transformer (DAA-ViT) model is proposed, which integrates fine-scale and coarse-scale attention to preserve local high-resolution information along with structural coherence. Additionally, an Iterative Refinement Fusion (IRF) step is introduced to refine focus boundaries over multiple iterations, enhancing overall image definition while mitigating fusion artifacts and focus selection errors. This Artificial Intelligence (AI)-based approach is particularly effective in complex scenes with inconsistent depth levels, which makes it suitable for applications such as remote sensing and medical image processing. Experimental results on several benchmark datasets demonstrate that the proposed method attains better results than existing methods, with a Mutual Information (MI) of 8.9671, a Structural Similarity Index Measure (SSIM) of 0.9211, a Peak Signal-to-Noise Ratio (PSNR) of 36.728 dB, and a lower Root Mean Square Error (RMSE) of 1.5482. Compared to the existing Swin Transformer and Convolutional Neural Network (STCU-Net) model, the proposed model attains a 2.65 % improvement in PSNR, a 1.99 % improvement in MI, a 1.11 % improvement in SSIM, and a 5.13 % reduction in RMSE. These findings demonstrate the effectiveness of AI-based fusion strategies in delivering high-quality all-in-focus images and emphasize their applications in medical imaging and remote sensing.
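The abstract reports RMSE, PSNR, and percentage changes relative to a baseline without stating the formulas used. As a rough illustration only, the sketch below uses the common textbook definitions, assuming 8-bit images (peak value 255) passed as flattened pixel sequences. The function names and the peak value are assumptions made for illustration, not details taken from the paper, and the paper's own evaluation protocol may differ.

```python
import math


def rmse(reference, fused):
    """Root mean square error between two equally sized pixel sequences."""
    if len(reference) != len(fused) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((r - f) ** 2 for r, f in zip(reference, fused))
    return math.sqrt(squared_error / len(reference))


def psnr(reference, fused, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    error = rmse(reference, fused)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 20.0 * math.log10(max_value / error)


def relative_change_percent(new, baseline):
    """Signed percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline
```

Under these definitions, a higher PSNR and a lower RMSE both indicate a fused image closer to the reference, which is why the abstract reports an improvement in one and a reduction in the other.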

Artificial Intelligence, Deep learning, Dual-scale adaptive attention, Iterative refinement fusion, Medical Image processing, Multi-focus Image fusion, Remote sensing, Vision transformer


Art Design College, Henan University of Urban Construction, Henan, Pingdingshan, 467000, China
Faculty of Information Technology, Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
Graduate School of the General Hospital of the People's Liberation Army of China, Beijing, China
Department of Respiratory and Critical Care Medicine, The First Medical Center of the General Hospital of the People's Liberation Army of China, Beijing, China

