Fine-Grained Image Recognition by Means of Integrating Transformer Encoder Blocks in a Robust Single-Stage Object Detector


Ali U., Oh S., Um T.-W., Hann M., Kim J.
July 2023, Multidisciplinary Digital Publishing Institute (MDPI)

Applied Sciences (Switzerland)
2023, Volume 13, Issue 13

Fine-grained image classification, which aims to distinguish objects belonging to sub-categories of the same class, remains an open challenge in computer vision. The task is difficult because inter-class variance is minimal while intra-class variance is substantial. Current methods address the problem by first locating selective regions with region proposal networks (RPNs), object localization, or part localization, and then applying a CNN or SVM classifier to those regions. Our approach instead simplifies the process with a single-stage, end-to-end model that encodes features and localizes objects together, and it improves the feature representations of individual tokens/regions by integrating transformer encoder blocks into the Yolov5 backbone. These transformer encoder blocks, with their self-attention mechanism, capture global dependencies and enable the model to learn relationships between distant regions, which improves its ability to understand context and capture long-range spatial relationships in an image. We also replaced the Yolov5 detection heads with three transformer heads at the output, which perform object recognition using the discriminative and informative feature maps from the transformer encoder blocks. We established the potential of the single-stage detector for fine-grained image recognition, achieving a state-of-the-art accuracy of 93.4% and outperforming existing one-stage recognition models. The effectiveness of our approach is assessed on the Stanford car dataset, which includes 16,185 images of 196 classes of vehicles with highly similar visual appearances.
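To make the self-attention idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It flattens a toy feature map into tokens and applies single-head scaled dot-product self-attention. The function names (`flatten_feature_map`, `self_attention`), the toy 2x2x3 feature map, and the use of identity query/key/value projections are all assumptions made for illustration. A full transformer encoder block would also include learned projections, multiple heads, residual connections, layer normalization, and a feed-forward layer.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_feature_map(fmap):
    """Turn an H x W x C feature map (nested lists) into H*W tokens of length C."""
    return [list(cell) for row in fmap for cell in row]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption for brevity: queries, keys and values are the tokens
    themselves (identity projections), so no weights are learned here.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    weights = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights.append(softmax(scores))
    # Each output token is a weighted mix of all tokens, near or far.
    attended = [
        [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(dim)]
        for row in weights
    ]
    return attended, weights


# Hypothetical 2x2 feature map with 3 channels; the top-left and
# bottom-right cells carry the same feature vector.
fmap = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
]
tokens = flatten_feature_map(fmap)
attended, weights = self_attention(tokens)
```

In this toy case the top-left token attends to the bottom-right token as strongly as to itself, even though the two are not adjacent, which is the kind of long-range relationship the abstract attributes to the transformer encoder blocks.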

attention mechanism, fine-grained image recognition, transformer encoder block, Yolov5


ICT Convergence System Engineering Department, Chonnam National University, Gwangju, 61186, South Korea
Graduate School of Data Science, Chonnam National University, Gwangju, 61186, South Korea
Astana IT University, Astana, 010000, Kazakhstan

