Topic-Aware Sentiment Analysis of News Articles
Akhmetov I., Gelbukh A., Mussabayev R.
2022, Instituto Politecnico Nacional
Computacion y Sistemas
2022, Vol. 26, Issue 1, pp. 423-439
We consider the problem of sentiment analysis of news media articles, cast as a three-way classification task: negative, positive, or neutral. We show that subdividing the training corpus by topic (local news, sports, hi-tech, and others) and training a separate sentiment classifier for each sub-corpus can improve classification F1 scores. We use topics because some words carry different sentiments in different domains: for example, the word "force" is typically positive in the sports domain but negative in the political domain. Our experiments with a Convolutional Neural Network (CNN) model on a Kaggle dataset of sentiment-labeled Kazakhstani news articles in Russian partially support our hypothesis: for the most prominent topic, kz (local news), we achieve an F1 score of 0.70, compared with 0.67 for the baseline model trained without topic awareness. Topic-aware training improves F1 scores for some topics, but because of the topic and class imbalance, further research is needed. Over the whole corpus, however, F1 either does not improve or improves only slightly. Moreover, our approach performs better on topics with many text samples than on topics with relatively few articles.
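The topic-aware setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a tiny naive Bayes classifier stands in for the CNN, the names `TinyNB`, `train_topic_aware`, and `predict_topic_aware` are hypothetical, the four toy sentences are invented for the example (in English and with two classes only, for brevity), and the fallback to a global model for unseen topics is an assumption.

```python
import math
from collections import Counter, defaultdict


class TinyNB:
    """Multinomial naive Bayes with add-one smoothing (stand-in for the CNN)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.class_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_topic_aware(samples):
    """samples: list of (topic, text, label) triples.

    Returns one model per topic plus a global model trained on everything.
    """
    by_topic = defaultdict(list)
    for topic, text, label in samples:
        by_topic[topic].append((text, label))
    models = {}
    for topic, rows in by_topic.items():
        texts, labels = zip(*rows)
        models[topic] = TinyNB().fit(texts, labels)
    global_model = TinyNB().fit([s[1] for s in samples], [s[2] for s in samples])
    return models, global_model


def predict_topic_aware(models, global_model, topic, text):
    # Route to the topic's own model; fall back to the global one (assumption).
    return models.get(topic, global_model).predict(text)


# Invented toy data, for illustration only.
samples = [
    ("sports", "team shows force and wins", "positive"),
    ("sports", "team loses after injury", "negative"),
    ("politics", "police use force against protesters", "negative"),
    ("politics", "parliament approves peace agreement", "positive"),
]
models, global_model = train_topic_aware(samples)
print(predict_topic_aware(models, global_model, "sports", "force"))    # positive
print(predict_topic_aware(models, global_model, "politics", "force"))  # negative
```

In this toy setting the same word receives opposite labels depending on the topic model it is routed to, whereas the global model sees "force" equally often in both classes and cannot separate them. Comparing per-topic F1 against such a global baseline mirrors the comparison reported in the abstract.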
Mass media, natural language processing, news articles, sentiment analysis
Institute of Information and Computational Technologies (IICT), Kazakhstan
Kazakh-British Technical University (KBTU), FIT, Kazakhstan
Instituto Politecnico Nacional (IPN), Centro de Investigacion en Computacion, Mexico