Enhancing Emoji-Based Sentiment Classification in Urdu Tweets: Fusion Strategies With Multilingual BERT and Emoji Embeddings
Narejo K.R., Zan H., Oralbekova D., Dharmani K.P., Orken M., Mukhsina K.
2024, Institute of Electrical and Electronics Engineers Inc.
IEEE Access
2024, vol. 12, pp. 126587-126600
X (formerly known as Twitter) is a popular social network with hundreds of millions of users. We emphasize the benefits of using emojis to improve the understanding of user sentiment. Our objective was to analyze the sentiments expressed in Urdu-language tweets, a task that can be demanding due to the language's intricate structure and diverse dialects. Our research centers on combining emoji embeddings with multilingual BERT (mBERT) on the SentiUrdu-1M dataset, which consists of 1.14 million Urdu tweets and 1,194 emojis. The aim of our study is twofold: 1) to evaluate the performance of pre-trained emoji2vec and our proposed Urdu-Specific FastText emoji embeddings in terms of their ability to distinguish emojis based on their expressions; and 2) to explore techniques for integrating Urdu tweets and emoji embeddings, including concatenation, neural network fusion, and attention mechanism fusion. As baselines, we fine-tuned multilingual BERT and XLM-RoBERTa on text-only Urdu tweets, achieving accuracies of 64% and 65%, respectively. Our study thus fills a gap in the literature by investigating whether emojis can enhance sentiment analysis of Urdu-language tweets, a field that has received limited attention. The Urdu-Specific FastText emoji embeddings proposed in this paper yield better results than the pre-trained emoji2vec embeddings and improve sentiment analysis accuracy to as much as 95% with the neural network fusion approach.
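The abstract names three ways of integrating tweet and emoji representations but gives no architectural detail. The following is a generic sketch of what concatenation, neural network fusion, and attention fusion commonly look like, not the authors' implementation. All function names, the toy dimensions, the random weights, and the single-query attention scoring are assumptions made for illustration; in practice the text vector would come from mBERT and the emoji vector from emoji2vec or FastText.

```python
import math
import random


def concat_fusion(text_vec, emoji_vec):
    """Concatenation: place the two vectors side by side."""
    return list(text_vec) + list(emoji_vec)


def linear(vec, weights, bias):
    """Dense layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def nn_fusion(text_vec, emoji_vec, weights, bias):
    """Neural network fusion (assumed form): dense layer + ReLU over the
    concatenated vector, so the layer can mix text and emoji features."""
    joint = concat_fusion(text_vec, emoji_vec)
    return [max(0.0, h) for h in linear(joint, weights, bias)]


def attention_fusion(text_vec, emoji_vec, w_text, w_emoji, query):
    """Attention fusion (assumed form): project both inputs to a shared
    size, score each against a query vector, and take the softmax-weighted
    sum of the two projections."""
    t = linear(text_vec, w_text, [0.0] * len(w_text))
    e = linear(emoji_vec, w_emoji, [0.0] * len(w_emoji))
    scores = [sum(q * x for q, x in zip(query, v)) for v in (t, e)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    fused = [weights[0] * a + weights[1] * b for a, b in zip(t, e)]
    return fused, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
TEXT_DIM, EMOJI_DIM, FUSED_DIM = 8, 4, 3  # toy sizes, not the real ones

# Random placeholders for a tweet representation and an emoji embedding.
text_vec = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
emoji_vec = [rng.gauss(0.0, 1.0) for _ in range(EMOJI_DIM)]

concat_out = concat_fusion(text_vec, emoji_vec)
nn_out = nn_fusion(
    text_vec, emoji_vec,
    random_matrix(FUSED_DIM, TEXT_DIM + EMOJI_DIM, rng),
    [0.0] * FUSED_DIM,
)
attn_out, attn_weights = attention_fusion(
    text_vec, emoji_vec,
    random_matrix(FUSED_DIM, TEXT_DIM, rng),
    random_matrix(FUSED_DIM, EMOJI_DIM, rng),
    [rng.gauss(0.0, 1.0) for _ in range(FUSED_DIM)],
)

print(len(concat_out), len(nn_out), len(attn_out))
```

In each case the fused vector would then feed a sentiment classifier. Concatenation leaves the weighting of text against emoji entirely to that classifier, while the other two variants learn the mixing inside the fusion step.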
emoji embeddings, emojis, fine-tuning, multilingual BERT, sentiment analysis, Urdu tweets, XLM-RoBERTa
Zhengzhou University, School of Computer and Artificial Intelligence, Zhengzhou, 450001, China
Institute of Information and Computational Technologies, Almaty, 050060, Kazakhstan
National University of Computer and Emerging Sciences, School of Computing, Islamabad, 04403, Pakistan