Complete Kazakh handwritten page recognition using start, follow and read method


Jantayev R., Kadyrov S., Amirgaliyev Y.
15 July 2021, Little Lion Scientific

Journal of Theoretical and Applied Information Technology
2021, Vol. 99, Issue 13, pp. 3133-3143

In this article we consider end-to-end full-page Handwritten Text Recognition (HTR) for offline Kazakh text images written in the Cyrillic alphabet, using a fully connected CNN and a bidirectional LSTM. The model is trained jointly for text segmentation and recognition on a new dataset of Kazakh text images, the Kazakh Handwritten Dataset (KHD). The novel method we introduce consists of three steps: Start, Follow and Read (SFR). The proposed model uses a Region Proposal Network to find the starting coordinates of the lines on the page. For lines that are not straight, we introduce a method that follows each text line to its end and prepares it for the final recognition step. The SFR model also works for Russian, since the Russian alphabet is a subset of the Kazakh alphabet. Experimental analysis shows that the model achieves an average Character Error Rate of 0.11.
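For readers unfamiliar with the metric, Character Error Rate (CER) is commonly computed as the Levenshtein edit distance between the recognized text and the reference transcription, divided by the length of the reference. The following is a minimal sketch under that common definition. The abstract does not state how the authors normalize or average the score, so the function names and the sample strings here are illustrative assumptions, not taken from the paper or from KHD.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: the recognizer confuses Kazakh 'қ' with 'к'.
reference = "қазақстан"
hypothesis = "казақстан"
print(round(character_error_rate(reference, hypothesis), 3))  # 0.111
```

Under this definition, a CER of 0.11 corresponds to roughly one character error in every nine reference characters, as in the made-up example above.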

Bidirectional LSTM, CNN, Computer Vision, Document Processing, HTR, Kazakh Handwritten, Text line cutting, Text Line Follower


Computer Sciences Department, Suleyman Demirel University, Kazakhstan
Mathematics and Natural Sciences Department, Suleyman Demirel University, Kazakhstan
Al-Farabi Kazakh National University, Kazakhstan

