Explainable AI for Coronary Artery Disease Stratification Using Routine Clinical Data
Tasmurzayev N., Imanbek B., Boltaboyeva A., Dikhanbayeva G., Zhussupbekov S., Saparbayeva Q., Amirkhanova G.
November 2025, Multidisciplinary Digital Publishing Institute (MDPI)
Algorithms
2025, Volume 18, Issue 11
Background: Coronary artery disease (CAD) remains a leading cause of morbidity and mortality. Early diagnosis reduces adverse outcomes and alleviates the burden on healthcare, yet conventional diagnostic approaches are often invasive, costly, and not always available. In this context, machine learning offers promising solutions.
Objective: This study evaluates the feasibility of reliably predicting both the presence and the severity of CAD. The analysis is based on a harmonized, multi-center UCI dataset that includes cohorts from Cleveland, Hungary, Switzerland, and Long Beach. The work assesses the accuracy and practical utility of models built exclusively on routine tabular clinical and demographic data, without relying on imaging. These models are designed to improve risk stratification and guide patient routing.
Methods and Results: The study uses a uniform, standardized data processing pipeline: handling of missing values, feature encoding, scaling, an 80/20 train–test split, and application of SMOTE exclusively to the training set to prevent information leakage. Within this pipeline, a wide range of models (including gradient boosting, tree-based ensembles, and support vector methods) was compared under identical conditions, with hyperparameter tuning via GridSearchCV. The CatBoost model gave the best results: accuracy 0.8278, recall 0.8407, and F1-score 0.8436.
Conclusions: A key distinction of this work is the comprehensive evaluation of the models' practical suitability. Beyond standard metrics, the analysis of calibration curves confirmed the reliability of the probabilistic predictions. Patient-level interpretability using SHAP showed that the model relies on clinically significant predictors, including ST-segment depression. Calibrated and explainable models based on readily available data are positioned as a practical tool for scalable risk stratification and decision support, especially in resource-constrained settings.
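The leakage-avoidance step described in the abstract (split first, then oversample only the training portion) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `split_80_20` and `smote_like`, the neighbour count `k=3`, the seeds, and the synthetic two-feature toy data are all assumptions made for illustration, and `smote_like` is a simplified stand-in for SMOTE that interpolates between same-class neighbours.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split_80_20(rows, labels, seed=0):
    """Shuffle indices, keep the first 80% for training and the rest for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 4 // 5
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in te], [labels[i] for i in te])

def smote_like(rows, labels, k=3, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper).

    For every class smaller than the largest one, synthetic rows are created
    by interpolating between a random member and one of its k nearest
    same-class neighbours until the class sizes match.
    """
    rng = random.Random(seed)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        if len(members) < 2:
            continue  # nothing to interpolate with
        for _ in range(target - n):
            base = rng.choice(members)
            neighbours = sorted((m for m in members if m is not base),
                                key=lambda m: dist2(base, m))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()
            out_rows.append([b + gap * (o - b) for b, o in zip(base, other)])
            out_labels.append(cls)
    return out_rows, out_labels

# Synthetic toy data (NOT the UCI cohorts): 20 positives, 80 negatives.
_rng = random.Random(42)
X = [[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(100)]
y = [1 if i < 20 else 0 for i in range(100)]

# 1) Split first, so the test rows never influence the synthetic samples.
X_train, y_train, X_test, y_test = split_80_20(X, y, seed=1)
# 2) Oversample the training portion only; the test set stays untouched.
X_bal, y_bal = smote_like(X_train, y_train, k=3, seed=1)
```

Because the synthetic rows are built only from training rows, the held-out 20% keeps the original class imbalance and contains no interpolated points, which is the property the abstract refers to as preventing information leakage.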
CatBoost, coronary artery disease, CVD, machine learning, risk prediction, ROC–AUC
Faculty of Information Technologies and Artificial Intelligence, Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
LLP “Kazakhstan R&D Solutions”, Almaty, 050056, Kazakhstan
Faculty of Postgraduate Higher Medical Education, Akhmet Yasawi University, Shymkent, 161200, Kazakhstan
Department of Automation and Control, Energo University, Almaty, 050013, Kazakhstan
Faculty of Philology, South Kazakhstan University Named After O.Zhanibekov, Shymkent, 160012, Kazakhstan