CREATING A MODEL OF SEMANTIC ANALYSIS OF EXTREMIST TEXTS IN THE KAZAKH LANGUAGE

Mussiraliyeva S.Zh., Bolatbek M.A., Zhumakhanova A.N., Baispay G., Medetbek Z.
9 April 2024 al-Farabi Kazakh State National University

KazNU Bulletin. Mathematics, Mechanics, Computer Science Series
2024, #121, Issue 1, pp. 110-121

Semantic analysis is increasingly used to examine texts and opinions expressed in the Kazakh language on social networks, with the primary aim of identifying suspicious or extremist content. This article explores the application of machine learning and deep learning techniques to the detection of extremist content in textual data. The investigation takes into account several factors: oversampling and undersampling during feature processing, the distinction between extremist and neutral subjects, and the challenges of imbalanced classification. These considerations inform the development of a deep learning model for text classification. The study also applies several machine learning models to detect extremist content in text and compares them, with oversampling and undersampling used to address data imbalance, in order to determine the most effective approach. The research is divided into two core subtasks: building a machine learning model for detecting extremist content in text, and constructing a deep learning model that accounts for the characteristics of the Kazakh language and the available dataset. The study further examines feature processing and compares the results of a range of machine learning algorithms for classifying religious extremism, each using a distinct combination of features. The methods explored are decision trees, random forests, support vector machines, k-nearest neighbours, logistic regression, and naive Bayes.
This research contributes to text mining, artificial intelligence, and machine learning by offering practical recommendations for processing and categorizing texts linked to religious extremism. It also underscores the current importance of semantic analysis of extremist texts written in the Kazakh language.
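
The oversampling and undersampling mentioned above can be illustrated with a minimal sketch. It assumes simple random resampling, which the abstract does not confirm as the authors' exact procedure. The function names, the class labels, and the placeholder texts are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def group_by_class(texts, labels):
    """Collect the texts belonging to each class label."""
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    return by_class


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples until every class
    is as large as the largest class."""
    rng = random.Random(seed)
    by_class = group_by_class(texts, labels)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def random_undersample(texts, labels, seed=0):
    """Randomly drop samples until every class
    is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = group_by_class(texts, labels)
    target = min(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        out_texts.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Hypothetical imbalanced toy corpus: 8 neutral posts, 2 flagged posts.
texts = [f"neutral post {i}" for i in range(8)] + [f"flagged post {i}" for i in range(2)]
labels = ["neutral"] * 8 + ["extremist"] * 2

over_texts, over_labels = random_oversample(texts, labels)
under_texts, under_labels = random_undersample(texts, labels)

print(Counter(labels))
print(Counter(over_labels))
print(Counter(under_labels))
```

Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.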

deep learning, internet extremism, machine learning, neural networks, social networks

Al-Farabi Kazakh National University, Almaty, Kazakhstan
