Optimizing Sentiment Integration in Image Captioning Using Transformer-Based Fusion Strategies


Narejo K.R., Zan H., Dharmani K.P., Mamyrbayev O., Akhmediyarova A., Alibiyeva Z., Alimkulova J.
2025, Tech Science Press

Computers, Materials and Continua
2025, Vol. 84, Issue 2, pp. 3407-3429

Automatic image captioning systems have made notable progress in the past few years, but generating captions that fully convey sentiment remains a considerable challenge. Existing models achieve strong performance in visual recognition and factual description, yet they often fail to account for the emotional context that is naturally present in human-written captions. To address this gap, we propose the Sentiment-Driven Caption Generator (SDCG), which combines transformer-based visual and textual processing with multi-level fusion. RoBERTa extracts sentiment from the textual input, while the Vision Transformer (ViT) handles visual features. The two feature streams are combined using several fusion strategies: Concatenation, Attention, Visual-Sentiment Co-Attention (VSCA), and Cross-Attention. In our experiments, SDCG reaches 94.52% sentiment accuracy, significantly outperforming baseline models such as the Generalized Image Transformer (GIT), which achieves 82.01%, and Bootstrapping Language-Image Pre-training (BLIP), which achieves 83.07%. SDCG also improves BLEU and ROUGE-L scores. More importantly, its captions read more naturally: they incorporate emotional cues and contextual awareness, so they more closely resemble captions written by a human.
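To make two of the fusion strategies named in the abstract more concrete, here is a minimal sketch of concatenation fusion and cross-attention fusion on toy vectors. It is not the authors' implementation. It assumes the visual and sentiment features have already been projected to a shared dimension, uses no learned projection weights, and adds a residual connection as an illustrative choice. The function names `concat_fusion` and `cross_attention_fusion` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(visual, sentiment):
    """Append the mean-pooled sentiment vector to every visual vector.

    visual:    list of patch feature vectors (stand-in for ViT outputs)
    sentiment: list of token feature vectors (stand-in for RoBERTa outputs)
    """
    dim = len(sentiment[0])
    pooled = [sum(tok[i] for tok in sentiment) / len(sentiment) for i in range(dim)]
    return [list(patch) + pooled for patch in visual]


def cross_attention_fusion(visual, sentiment):
    """Scaled dot-product cross-attention without learned weights.

    Visual vectors act as queries; sentiment vectors act as keys and values.
    Assumes both share the same dimension (an assumption of this sketch).
    """
    dim = len(sentiment[0])
    fused = []
    for query in visual:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sentiment
        ]
        weights = softmax(scores)
        context = [
            sum(w * value[i] for w, value in zip(weights, sentiment))
            for i in range(dim)
        ]
        # Residual connection: keep the visual signal, add sentiment context.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


if __name__ == "__main__":
    # Toy 2-dimensional features, invented purely for illustration.
    patches = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(concat_fusion(patches, tokens))
    print(cross_attention_fusion(patches, tokens))
```

Concatenation gives every visual vector the same pooled sentiment summary, whereas cross-attention lets each visual vector weight the sentiment tokens differently. This is one plausible reason a paper would compare the two strategies.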

deep learning, fusion methods, image captioning, sentiment analysis


School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450001, China
School of Computing, National University of Computer and Emerging Sciences, Islamabad, 04403, Pakistan
Institute of Information and Computational Technologies, Almaty, 050010, Kazakhstan
Institute of Automation and Information Technologies, Satbayev University, Almaty, 050013, Kazakhstan
Turan University, Chaikina St 12a, Almaty, 050020, Kazakhstan

