LLM-Enhanced Semantic Text Segmentation
Krassovitskiy A., Mussabayev R., Yakunin K.
October 2025, Multidisciplinary Digital Publishing Institute (MDPI)
Applied Sciences (Switzerland)
2025, Vol. 15, Issue 19
This study investigates semantic text segmentation enhanced by large language model (LLM) embeddings. We assess how effectively embeddings capture semantic coherence and topic closure by integrating them into both classical clustering algorithms and a modified graph-based method. In addition, we propose a simple magnetic clustering algorithm as a lightweight baseline. Experiments are conducted across multiple datasets and embedding models, with segmentation quality evaluated using the boundary segmentation metric. The results demonstrate that LLM embeddings improve segmentation accuracy, highlight dataset-specific difficulties, and reveal how contextual window size and embedding choice affect performance. These findings clarify the strengths and limitations of embedding-based approaches to segmentation and provide insights relevant to retrieval-augmented generation (RAG).
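The abstract does not spell out the segmentation procedures themselves. As a rough illustration of how embeddings can signal a topic shift, here is a minimal sketch of a generic similarity-threshold segmenter in the spirit of TextTiling. It is not the authors' clustering, graph-based, or magnetic method. It assumes sentence embeddings have already been produced by some embedding model; the names `segment_by_similarity`, `threshold`, and `window` are hypothetical, and averaging `window` sentences on each side of a candidate boundary is only one possible reading of a "contextual window".

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def segment_by_similarity(
    embeddings: List[Sequence[float]], threshold: float = 0.5, window: int = 1
) -> List[int]:
    """Hypothetical baseline: return indices i where a new segment starts.

    For each gap between sentence i-1 and sentence i, average up to `window`
    embeddings on each side and place a boundary if their cosine similarity
    falls below `threshold`. With window=1 this compares adjacent sentences.
    """
    boundaries = []
    for i in range(1, len(embeddings)):
        left = embeddings[max(0, i - window):i]
        right = embeddings[i:i + window]
        if cosine(mean_vector(left), mean_vector(right)) < threshold:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    # Toy 3-d "embeddings": two sentences on topic A, then two on topic B.
    sents = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
    print(segment_by_similarity(sents, threshold=0.5))  # [2]
```

Widening `window` smooths the similarity signal over more context, which is one intuitive way window size can change where boundaries fall; real embeddings would come from an actual model rather than the toy vectors used here.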
clustering algorithms, embedding, LLMs, machine learning, optimization, RAG, semantic analysis, text segmentation
Laboratory for Analysis and Modeling of Information Processes, Institute of Information and Computational Technologies, Almaty, 050010, Kazakhstan
AI Research Lab, Satbayev University, Almaty, 050013, Kazakhstan
School of Digital Technologies, Almaty Management University, Almaty, 050060, Kazakhstan