Automated Classification of Public Transport Complaints via Text Mining Using LLMs and Embeddings


Rakhimzhanov D., Belginova S., Yedilkhan D.
August 2025, Multidisciplinary Digital Publishing Institute (MDPI)

Information (Switzerland)
2025, Volume 16, Issue 8

The proliferation of digital public service platforms and the expansion of e-government initiatives have significantly increased the volume and diversity of citizen-generated feedback. This trend creates a need for classification systems that are not only tailored to specific administrative domains but also robust to the linguistic, contextual, and structural variability inherent in user-submitted content. This study compares the effectiveness of large language models (LLMs) and instruction-tuned embedding models in categorizing public transportation complaints. LLMs were tested using few-shot inference, in which classification is guided by a small set of in-context examples. Embedding models were assessed under three paradigms: label-only zero-shot classification, instruction-based classification, and supervised fine-tuning. Results indicate that fine-tuned embeddings can match or exceed the accuracy of LLMs, reaching up to 90 percent, while offering significant reductions in inference latency and computational overhead. E5 embeddings generalized consistently across unseen categories and input shifts, whereas BGE-M3 showed measurable gains when adapted to task-specific distributions. Instruction-based classification produced lower accuracy for both embedding models, highlighting the limitations of prompt conditioning in isolation. These findings position multilingual embedding models as a viable alternative to LLMs for classification at scale in data-intensive public sector environments.
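
To make the label-only zero-shot paradigm concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each category is represented by a short label description and that a complaint is assigned to the category whose embedding is closest by cosine similarity. The `embed` function is a toy bag-of-words stand-in so that the example runs without downloading a model; in practice it would be replaced by a multilingual embedding model such as E5 or BGE-M3. The category names and descriptions in `LABELS` are hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts.

    Assumption: a real system would call a sentence-embedding model here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical categories and descriptions, for illustration only.
LABELS = {
    "delay": "bus delay late schedule waiting",
    "driver_behavior": "driver rude behavior unsafe driving",
    "vehicle_condition": "vehicle dirty broken seats heating",
}


def classify(complaint, labels=LABELS):
    """Return the best-matching label and the similarity score per label."""
    query = embed(complaint)
    scores = {name: cosine(query, embed(desc)) for name, desc in labels.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    label, scores = classify("The bus was late again and I kept waiting for 40 minutes")
    print(label, scores)
```

No labeled training data is used in this setup: only the label descriptions and the complaint text are embedded. Supervised fine-tuning, by contrast, would adapt the embedding model or a classifier on top of it using annotated complaints.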

embedding models, few-shot inference, instruction-based classification, large language models, multilingual complaint classification, public sector NLP, public transportation, resource-efficient NLP, supervised fine-tuning, zero-shot learning


Big Data and Blockchain Technologies Research Innovation Center, Astana IT University, Astana, 020000, Kazakhstan
Department of Information Technology, University Turan, Almaty, 050013, Kazakhstan

