Complete Kazakh handwritten page recognition using start, follow and read method
Jantayev R., Kadyrov S., Amirgaliyev Y.
15 July 2021, Little Lion Scientific
Journal of Theoretical and Applied Information Technology
2021, Vol. 99, Issue 13, pp. 3133-3143
In this article we consider end-to-end, full-page Handwritten Text Recognition (HTR) for offline images of Kazakh text written in the Cyrillic alphabet, using a fully connected CNN and a bidirectional LSTM. The model is trained for text segmentation and recognition jointly on a new dataset of Kazakh text images, the Kazakh Handwritten Dataset (KHD). The novel method we introduce has three steps: Start, Follow and Read (SFR). The proposed model uses a Region Proposal Network to find the starting coordinates of the lines on the page. For lines that are not straight, we introduce a method that follows each text line to its end and prepares it for the final recognition step. The SFR model also works for Russian, since the Russian alphabet is a subset of the Kazakh alphabet. Experimental analysis shows that the model achieves an average Character Error Rate of 0.11.
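For readers unfamiliar with the metric, Character Error Rate is commonly defined as the character-level edit distance between the recognized text and the reference transcription, divided by the reference length. The following is a minimal Python sketch under that common definition; the function names `levenshtein` and `character_error_rate` are illustrative assumptions, and the abstract does not state the exact evaluation procedure the authors used.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the processed prefix of
    # `reference` and the first j characters of `hypothesis`.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: two Kazakh-specific letters misread as Russian ones.
    print(character_error_rate("қазақстан", "казакстан"))
```

Under this definition, an average of 0.11 corresponds to roughly 11 character edits per 100 reference characters.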
Bidirectional LSTM, CNN, Computer Vision, Document Processing, HTR, Kazakh Handwritten, Text line cutting, Text Line Follower
Computer Sciences Department, Suleyman Demirel University, Kazakhstan
Mathematics and Natural Sciences Department, Suleyman Demirel University, Kazakhstan
Al-Farabi Kazakh National University, Kazakhstan