Kazakhstani HER2 breast cancer digital image dataset: The ADEL dataset


Dunenova G., Sarsembayev A., Ivankov A., Kaidarova D., Kalmatayeva Z., Satbayeva E., Glushkova N.
October 2025, Elsevier Inc.

Data in Brief
2025#62

Breast cancer remains a leading cause of cancer-related mortality among women worldwide, and HER2-positive subtypes require precise diagnostic approaches to guide targeted therapy. Digital pathology and AI-based tools offer promising solutions, but their development relies heavily on high-quality labelled or annotated digital datasets. In this study, we present a dataset of digital images of breast cancer tissue samples with immunohistochemical expression of human epidermal growth factor receptor 2 (HER2) classes 0, 1+, 2+, and 3+. Breast cancer tissue samples were formalin-fixed and paraffin-embedded (FFPE), and 5-µm sections were prepared from the paraffin blocks. Immunohistochemical staining was performed using a Ventana Benchmark Ultra automated immunostainer with PATHWAY anti-HER2/neu (4B5) rabbit monoclonal antibodies and the ULTRA VIEW detection system. Digital images were acquired with a fully automated digital system (KFB PRO 120 scanner) at INVIVO LLP at 40x magnification with one focusing layer. The image files range in size from 50 MB to 2 GB, depending on the size of the tissue sample fixed on the original slide. The dataset consists of 418 subfolders, each corresponding to one source image and containing a number of tiles that depends on the size of that source image. The original images were preprocessed using a conversion script that transformed SVS files into sub-images in JPEG format. A non-overlapping sliding window approach, optimized for machine learning applications, was applied to generate these sub-images: a square window of 1000 × 1000 pixels was used to crop sub-images with a 1:1 aspect ratio. The stride of the sliding window was set to a value that was a multiple of the image resolution (as determined during preprocessing). As a result, a variable number of sub-images was generated from each original SVS image, depending on its size.
Clinical labeling of the data was provided by reference laboratory pathologists with expertise in advanced oncological morphology evaluations. This dataset allows training and validation of machine learning models for the diagnosis, recognition, and classification of breast cancer using the available labeling, as well as for educational purposes for residents and pathologists.
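
The tiling step described in the abstract can be illustrated with a minimal sketch. It only computes crop boxes for a non-overlapping 1000 × 1000 pixel window; it is not the authors' conversion script. The function names `tile_boxes` and `tile_count` are hypothetical, and two behaviours are assumptions not stated in the source: the stride is taken to be equal to the window side, and partial tiles at the right and bottom edges are dropped.

```python
from typing import Iterator, Tuple

TILE = 1000  # window side in pixels, as stated in the abstract

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, tile: int = TILE) -> Iterator[Box]:
    """Yield crop boxes for non-overlapping square tiles, row by row.

    Assumptions (not from the source): the stride equals the tile side,
    and partial tiles at the right/bottom edges are dropped.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    # Stepping by `tile` guarantees that consecutive windows never overlap.
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, top, left + tile, top + tile)


def tile_count(width: int, height: int, tile: int = TILE) -> int:
    """Number of full tiles that fit into a width x height image."""
    return (width // tile) * (height // tile)


if __name__ == "__main__":
    # Hypothetical slide dimensions, chosen only for illustration.
    boxes = list(tile_boxes(2500, 3100))
    print(len(boxes), boxes[0], boxes[-1])
```

Each box follows the (left, top, right, bottom) convention, so it could be passed to an image library's crop routine once the SVS region has been read into memory. Because the number of full tiles is the product of the integer quotients of width and height by the window side, larger source images yield more sub-images, which matches the variable tile counts per subfolder described above.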

Breast cancer, Dataset, Digital images, HER2


Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
School of Digital Technologies, Almaty Management University, Almaty, 050060, Kazakhstan
Almaty, 050000, Kazakhstan
Rector Office, Asfendiyarov Kazakh National Medical University, Almaty, 050000, Kazakhstan
Almaty Oncology Center, Almaty, 050040, Kazakhstan
Health Research Institute, Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan

