Multimodal Person Verification with Generative Thermal Data Augmentation

Abdrakhmanova M., Unaspekov T., Varol H.A.
1 January 2024, Institute of Electrical and Electronics Engineers Inc.

IEEE Transactions on Biometrics, Behavior, and Identity Science
2024, Vol. 6, Issue 1, pp. 43-53

The fusion of audio, visual, and thermal modalities has proven effective in developing reliable person verification systems. In this study, we enhanced multimodal person verification performance by augmenting training data using domain transfer methods. Specifically, we enriched the audio-visual-thermal SpeakingFaces dataset with a combination of real audio-visual data and synthetic thermal data from the VoxCeleb dataset. We adapted visual images in VoxCeleb to the thermal domain using CycleGAN, trained on SpeakingFaces. Our results demonstrate the positive impact of augmented training data on all unimodal and multimodal models. The score fusion of unimodal audio, unimodal visual, bimodal, and trimodal systems trained on the combined data achieved the best results on both datasets and exhibited robustness in low-illumination and noisy conditions. Our findings emphasize the importance of utilizing synthetic data, produced by generative methods, to improve deep learning model performance. To facilitate reproducibility and further research in multimodal person verification, we have made our code, pretrained models, and preprocessed dataset freely available in our GitHub repository.

data augmentation, deep learning, face synthesis, generative adversarial networks, multimodal fusion, multimodal learning, person verification

Nazarbayev University, Institute of Smart Systems and Artificial Intelligence, Astana, 010000, Kazakhstan
