Exploring Autoencoder-based Representations for Tabular Data Classification

Tokhtakhunov I., Nurtas M., Neftissov A., Pirnaev S., Kazambayev I., Kirichenko L.
October 2025 Engineered Science Publisher

Engineered Science
2025 #37

Autoencoders are evaluated as a means of constructing compact, informative vector representations for classification tasks on high-dimensional tabular data. The methodology addresses the limitations of traditional models that rely on manual feature engineering and task-specific training. Emphasis is placed on building a generalized look-alike model for targeted advertising, using embeddings derived from subscriber-related entities. The approach is assessed on a real-world telecommunications dataset comprising subscriber demographics, devices, tariffs, and network characteristics. Experimental results show that autoencoder embeddings outperform classical dimensionality reduction methods such as Principal Component Analysis (PCA) in both predictive quality and computational efficiency. The compressed representations capture nonlinear patterns and semantic similarities, improving classification accuracy across multiple metrics. The study further introduces an integrated vector architecture that concatenates embeddings from heterogeneous entities. Cosine similarity is used to identify similar users, enabling a scalable, automated recommendation service for Business-to-Business (B2B) applications. Performance is benchmarked using standard quality metrics (precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC)) as well as business-specific indicators such as conversion rate and lift. The findings support the applicability of autoencoders to modeling complex tabular structures with minimal information loss. Future work includes developing domain-specific autoencoder ensembles and exploring alternative vector similarity metrics for broader industrial adoption. The proposed solution could also be applied to water resource monitoring systems to improve classification and subsequent prediction.
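
The look-alike step described in the abstract (concatenating per-entity embeddings and comparing users by cosine similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the article: the toy embedding values stand in for autoencoder outputs, the helper names (`concat_embeddings`, `rank_look_alikes`) are hypothetical, and ranking candidates against the centroid of a seed audience is only one plausible way to use the similarity score.

```python
import math


def concat_embeddings(*parts):
    """Join per-entity embeddings (e.g. subscriber, device) into one vector."""
    return [x for part in parts for x in part]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_look_alikes(seed_vectors, candidates, top_k=2):
    """Rank candidate users by similarity to the seed centroid (assumed strategy)."""
    dim = len(seed_vectors[0])
    centroid = [sum(v[i] for v in seed_vectors) / len(seed_vectors) for i in range(dim)]
    scored = [(uid, cosine_similarity(centroid, vec)) for uid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy values standing in for autoencoder embeddings (hypothetical data).
seed = [
    concat_embeddings([1.0, 0.0], [0.5, 0.5]),
    concat_embeddings([0.9, 0.1], [0.4, 0.6]),
]
candidates = {
    "user_a": concat_embeddings([1.0, 0.1], [0.5, 0.5]),
    "user_b": concat_embeddings([0.0, 1.0], [0.0, -1.0]),
    "user_c": concat_embeddings([0.8, 0.0], [0.3, 0.6]),
}
top = rank_look_alikes(seed, candidates, top_k=2)
for user_id, score in top:
    print(user_id, round(score, 3))
```

In this toy setup the candidates whose concatenated vectors point in roughly the same direction as the seed audience rank first, while `user_b`, whose vector points elsewhere, is excluded from the top two. A production service would likely replace the linear scan with an approximate nearest-neighbour index.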

Autoencoder, Cosine similarity distance, Embedding, Look-a-Like model


Department of Mathematical and Computer Modelling, International Information Technology University, 34/1 Manas street, Almaty, 05000, Kazakhstan
School of Digital Technologies, Narxoz University, 55 Zhandosov street, Almaty, 050035, Kazakhstan
Faculty of Information Technology, Al-Farabi Kazakh National University, 71 Al-Farabi Avenue, Almaty, 050040, Kazakhstan
Science Innovation Center Industry 4.0, Astana IT University, Mangilik El C1, Astana, 010000, Kazakhstan
Academy of Physical Education and Mass Sports, Mangilik El B2.2, Astana, 010000, Kazakhstan
Department of Engineering Technological Machines, Tashkent State Transport University, 1 Temiryolchilar street, Mirabad district, Tashkent, 100167, Uzbekistan
