LLM-Enhanced Semantic Text Segmentation

Krassovitskiy A., Mussabayev R., Yakunin K.
October 2025 Multidisciplinary Digital Publishing Institute (MDPI)

Applied Sciences (Switzerland)
2025, Volume 15, Issue 19

This study investigates semantic text segmentation enhanced by large language model (LLM) embeddings. We assess how effectively embeddings capture semantic coherence and topic closure by integrating them into both classical clustering algorithms and modified graph-based methods. In addition, we propose a simple magnetic clustering algorithm as a lightweight baseline. Experiments are conducted across multiple datasets and embedding models, with segmentation quality evaluated using the boundary segmentation metric. Results demonstrate that LLM embeddings improve segmentation accuracy, highlight dataset-specific difficulties, and reveal how contextual window size and embedding choice affect performance. These findings clarify the strengths and limitations of embedding-based approaches to segmentation and provide insights relevant to retrieval-augmented generation (RAG).
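
As a rough illustration of what embedding-based segmentation means in general, the sketch below places a segment boundary wherever the cosine similarity between adjacent sentence embeddings drops below a threshold. This is a generic baseline written for illustration only. It is not the magnetic clustering algorithm or the graph-based methods from the article, which the abstract does not specify. The function names, the threshold value, and the toy two-dimensional vectors are all assumptions; real LLM embeddings would have hundreds or thousands of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment_by_similarity(embeddings, threshold=0.5):
    """Return indices i where a boundary is placed before sentence i.

    Hypothetical helper: a boundary is declared when sentence i is
    dissimilar (cosine below `threshold`) to sentence i - 1.
    """
    boundaries = []
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            boundaries.append(i)
    return boundaries


# Toy stand-ins for sentence embeddings: two sentences on one topic,
# followed by two sentences on another.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
boundaries = segment_by_similarity(toy)
print(boundaries)
```

With these toy vectors, only the pair between the second and third sentence falls below the threshold, so a single boundary is reported before the third sentence. A fixed threshold is the simplest possible choice; it ignores the contextual window size that the abstract identifies as a factor in performance.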

clustering algorithms, embedding, LLMs, machine learning, optimization, RAG, semantic analysis, text segmentation


Laboratory for Analysis and Modeling of Information Processes, Institute of Information and Computational Technologies, Almaty, 050010, Kazakhstan
AI Research Lab, Satbayev University, Almaty, 050013, Kazakhstan
School of Digital Technologies, Almaty Management University, Almaty, 050060, Kazakhstan

