Handwritten Kazakh and Russian (HKR) database for text recognition
Nurseitov D., Bostanbekov K., Kurmankhojayev D., Alimova A., Abdallah A., Tolegenov R.
September 2021, Springer
Multimedia Tools and Applications
2021, Volume 80, Issue 21-23, pp. 33075-33097
In this paper, we introduce a large-scale dataset, called HKR, to address challenging detection and recognition problems for handwritten Russian and Kazakh text in scanned documents. We present a new Russian and Kazakh database (about 95% Russian and 5% Kazakh words/sentences) for offline handwriting recognition. Several pre-processing and segmentation procedures have been developed together with the database. Both languages are written in Cyrillic and share the same 33 characters; the Kazakh alphabet contains 9 additional specific characters. The dataset is a collection of forms. The source of each form was generated with LaTeX, and the forms were subsequently filled out by hand. The database consists of more than 1500 filled forms. It contains approximately 63000 sentences and more than 715699 symbols produced by approximately 200 different writers. It can serve researchers working on handwriting recognition with deep learning and machine learning. For the experiments, we used several popular text recognition methods for word and line recognition, such as CTC-based and attention-based methods. The results indicate the diversity of HKR. The dataset is available at https://github.com/abdoelsayed2016/HKR_Dataset.
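The alphabet composition described in the abstract (33 shared Cyrillic letters plus 9 Kazakh-specific ones) can be sketched as a character set for a recognizer. This is a minimal illustration, not code from the HKR repository: the names `RUSSIAN_LOWER`, `KAZAKH_EXTRA_LOWER`, and `build_charset` are hypothetical, and the sketch assumes the 9 additional letters are the standard Kazakh Cyrillic additions. Letters are built from Unicode code points to avoid mixing in look-alike Latin characters.

```python
# Illustrative sketch only; names are hypothetical, not from the HKR repository.

# The 33 letters shared by Russian and Kazakh: U+0430..U+044F plus U+0451 (ё).
RUSSIAN_LOWER = "".join(chr(c) for c in range(0x0430, 0x0450)) + "\u0451"

# Assumed set of the 9 Kazakh-specific letters: ә ғ қ ң ө ұ ү һ і
KAZAKH_EXTRA_LOWER = "\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456"


def build_charset(include_upper=True):
    """Return the sorted list of letters a recognizer would need to cover."""
    letters = RUSSIAN_LOWER + KAZAKH_EXTRA_LOWER
    if include_upper:
        letters += letters.upper()
    return sorted(set(letters))


if __name__ == "__main__":
    print(len(RUSSIAN_LOWER), len(KAZAKH_EXTRA_LOWER), len(build_charset()))
```

A real label vocabulary would likely also need digits, punctuation, and a space character; those are left out here because the abstract does not specify them.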
Benchmark dataset, Document analysis and recognition, Handwritten Russian and Kazakh text recognition
KazMunayGas Engineering LLP, Nur-Sultan, Kazakhstan
Hong Kong Polytechnic University, Hung Hom, Hong Kong
Satbayev University Almaty, Almaty, Kazakhstan