Word Embeddings: A Comprehensive Survey


Pak A., Ziyaden A., Saparov T., Akhmetov I., Gelbukh A.
2024, Instituto Politécnico Nacional

Computación y Sistemas
2024, Vol. 28, Issue 4, pp. 2005-2029

This article is a systematic review of studies on word embeddings, with an emphasis on classical matrix factorization techniques and contemporary neural word embedding algorithms such as Word2Vec, GloVe, and BERT. The efficiency and effectiveness of these methods for mapping semantic and lexical relationships are evaluated in detail, together with an analysis of the topology of these techniques. In addition, this approach demonstrates a model accuracy of 77%, which is 3% below the best human performance. At the same time, the study shows the weaknesses of some models such as BERT, which can reach unrealistically high accuracy because of spurious correlations in the datasets. We identify three bottlenecks for the further development of NLP algorithms: assimilation of inductive bias, common sense embedding, and the generalization problem. The outcomes of this research help to strengthen word embeddings and broaden their applicability in natural language processing tasks.

deep learning, distributive semantics, language models, natural language processing, word embeddings


Institute of Informational and Computational Technologies, Big Data Mining Lab, Almaty, Kazakhstan
Kazakh-British Technical University, Almaty, Kazakhstan
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico

