REVEALING INTRINSIC DIMENSIONALITY PATTERNS IN SEMANTIC SPACES OF NATURAL LANGUAGES USING GRAPH ALGORITHMS
Yerbolova A., Kurmashev I.
2026, Technology Center
Eastern-European Journal of Enterprise Technologies
2026, #1, Issue 2, pp. 68-76
This study considers semantic spaces of n-grams (unigrams, bigrams, and trigrams) formed from natural language texts. The problem addressed stems from a limitation of conventional approaches, which use semantic spaces of a fixed high dimensionality without taking their internal geometric structure into account. An experimental study was conducted of the intrinsic dimensionality of vector representations of linguistic objects used in natural language processing tasks. To this end, graph algorithms for estimating intrinsic dimension were applied. These algorithms are based on the analysis of minimum spanning tree statistics and yield estimates of both the Hausdorff and the topological dimension. The experiments were conducted on corpora of national literatures in six languages belonging to different typological groups: Russian, English, Kazakh, Kyrgyz, Tatar, and Uzbek. Vector representations of n-grams were formed using singular value decomposition of the context matrix, which allowed the dimensionality of the embedding space to be varied without retraining the models. The results revealed consistent differences in the intrinsic dimensionalities of the semantic spaces of the studied languages and confirmed their multifractal nature. Interpretation of the findings suggests that the identified differences are due to the typological and structural features of the languages. The obtained estimates are robust to noise and to changes in the dimensionality of the embedding space, which supports the reproducibility of the results. The practical significance of this work lies in the possibility of using intrinsic dimensionality as an engineering parameter in the design and optimization of natural language processing systems to reduce computational and resource costs.
fractal structure, graph algorithms, intrinsic dimensionality, semantic spaces, vector representations
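The two technical steps named in the abstract (SVD-based embeddings of adjustable dimensionality, and a dimension estimate from minimum spanning tree statistics) can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the classical scaling law that the total Euclidean MST length over n points sampled from a d-dimensional set grows roughly as n^((d-1)/d), and it does not separate Hausdorff from topological dimension. The function names `truncated_svd_embeddings`, `mst_length`, and `mst_dimension`, as well as the choice to weight the singular vectors by the singular values, are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def truncated_svd_embeddings(context_matrix, k):
    """Return k-dimensional row vectors from one SVD of the context matrix.

    Assumption: rows are weighted as U_k * S_k; other weightings are possible.
    Changing k only changes how many columns are kept, so no refit is needed.
    """
    u, s, _ = np.linalg.svd(context_matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def mst_length(points):
    """Total Euclidean edge length of the minimum spanning tree over the points."""
    dist = squareform(pdist(points))  # dense pairwise distance matrix
    return minimum_spanning_tree(dist).sum()


def mst_dimension(points, sizes, trials=5, seed=0):
    """Estimate intrinsic dimension from how MST length grows with sample size.

    Fits log L(n) = a * log n + b over random subsamples and returns
    d = 1 / (1 - a), which follows from L(n) ~ n^((d-1)/d).
    The estimate is undefined if the fitted slope a reaches 1.
    """
    rng = np.random.default_rng(seed)
    mean_lengths = []
    for n in sizes:
        lengths = []
        for _ in range(trials):
            idx = rng.choice(len(points), size=n, replace=False)
            lengths.append(mst_length(points[idx]))
        mean_lengths.append(np.mean(lengths))
    slope, _ = np.polyfit(np.log(sizes), np.log(mean_lengths), 1)
    return 1.0 / (1.0 - slope)


if __name__ == "__main__":
    # Synthetic check: a 2-D plane isometrically embedded in 10-D space.
    rng = np.random.default_rng(1)
    plane = rng.random((1200, 2))
    basis, _ = np.linalg.qr(rng.standard_normal((10, 2)))  # orthonormal columns
    points = plane @ basis.T
    print(mst_dimension(points, sizes=[150, 300, 600, 1200], trials=3))
```

On the synthetic plane, the estimate should stay near 2 regardless of the 10-dimensional ambient space, which is the kind of insensitivity to embedding dimensionality the abstract reports. Finite-sample and boundary effects bias estimates of this sort, so the sample sizes here are illustrative rather than recommended.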
Department of Information and Communication Technologies, Manash Kozybayev North Kazakhstan University, Pushkin str., 86, Petropavlovsk, 150000, Kazakhstan