Improving Vision-Language Models with Attention Mechanisms for Aerial Video Classification


Tu N.A., Aikyn N.
2025, Institute of Electrical and Electronics Engineers Inc.

IEEE Geoscience and Remote Sensing Letters
2025, vol. 22

Vision-language models (VLMs), particularly contrastive language-image pretraining (CLIP), have recently demonstrated great success across various vision tasks. However, their potential in aerial video understanding, an increasingly active area of remote sensing (RS), remains underexplored. This gap stems from challenges specific to aerial data, such as UAV movement, extreme camera angles, and complex spatiotemporal dependencies. To tackle these challenges, we propose an effective method called CLIP-AVC, which adapts CLIP to classify aerial videos into predefined classes. Specifically, we leverage CLIP's multimodal transferability by using its encoders to extract robust visual and textual features. We then employ a temporal transformer to capture the interactions among the visual features. To address the lack of inductive bias in CLIP's visual encoder, we integrate the temporal transformer's outputs with 3-D features using a cross-transformer, thereby allowing the model to exploit the spatiotemporal locality of aerial videos. In addition, existing methods often fail to explore the semantic alignment between classes and video features. To overcome this limitation, we propose a context-enriched transformer that employs self-attention mechanisms to adaptively refine visual and textual representations. Experimental results on two benchmark datasets validate the robustness of CLIP-AVC, demonstrating its potential to significantly advance VLMs for aerial scene understanding.
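As a rough illustration of the pipeline outlined in the abstract, the following minimal sketch applies temporal self-attention to per-frame features and then scores the pooled video feature against class text features. It is not the authors' implementation. Several things are assumptions made only for illustration: the function names (`temporal_self_attention`, `classify_video`), the toy feature vectors standing in for CLIP encoder outputs, the identity attention projections (no learned weights), the mean pooling, and the use of CLIP-style cosine-similarity scoring with a temperature of 0.07. The cross-transformer with 3-D features and the context-enriched transformer are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def temporal_self_attention(frames):
    """Single-head self-attention across frames.

    Assumption: queries, keys, and values are the frame features
    themselves (identity projections), so nothing is learned here.
    """
    d = len(frames[0])
    attended = []
    for q in frames:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in frames])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)]
        )
    return attended


def classify_video(frame_feats, text_feats, temperature=0.07):
    """Return class probabilities for one video.

    Assumption: CLIP-style scoring, i.e. cosine similarity between the
    mean-pooled video feature and each class text feature.
    """
    attended = temporal_self_attention(frame_feats)
    n, d = len(attended), len(attended[0])
    video = l2_normalize([sum(f[i] for f in attended) / n for i in range(d)])
    logits = [dot(video, l2_normalize(t)) / temperature for t in text_feats]
    return softmax(logits)


# Hypothetical toy features standing in for CLIP encoder outputs.
frame_feats = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0]]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # one vector per class prompt

probs = classify_video(frame_feats, text_feats)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In a real system the frame and text features would come from CLIP's visual and textual encoders, and the attention layers would carry learned projections. The sketch only shows how temporal interactions among frames can feed a similarity-based classifier over predefined classes.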

Aerial video classification (AVC), attention mechanism, deep learning, foundation models, vision-language


Nazarbayev University, School of Engineering and Digital Sciences, Department of Computer Science, Astana, 010000, Kazakhstan

Nazarbayev University
