End-to-End Multi-Modal Speaker Change Detection with Pre-Trained Models


Toleu A., Tolegen G., Pak A., Assel J., Zhumazhanov B.
April 2025, Multidisciplinary Digital Publishing Institute (MDPI)

Applied Sciences (Switzerland)
2025, Volume 15, Issue 8

In this work, we propose a multi-modal speaker change detection (SCD) approach with focal loss that integrates audio and text features to improve detection performance. The approach uses large-scale pre-trained models for feature extraction and a self-attention mechanism to emphasize the features most relevant to speaker change. The extracted features are fused and passed through a fully connected classification network, with layer normalization and dropout for stability and generalization. To address class imbalance, we apply focal loss, which focuses training on hard-to-classify samples and leads to more balanced performance. Extensive experiments on a multi-talker meeting dataset show that the proposed multi-modal approach consistently outperforms single-modal models, demonstrating that audio and text are complementary for SCD. Fine-tuning the pre-trained models (Wav2Vec2 for audio and BERT for text) significantly boosts accuracy, achieving a 21% improvement over frozen models. The self-attention mechanism improves performance by a further 2%, highlighting its ability to capture speaker transition cues. Additionally, focal loss enhances the model’s performance, making it more robust to imbalanced data.
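The focal loss mentioned above can be illustrated with a minimal sketch. It assumes the standard binary formulation FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the true class. The function names and the values gamma=2.0 and alpha=0.25 are illustrative assumptions and are not taken from the article, which does not state its hyperparameters in the abstract.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for a single binary prediction.

    p     : predicted probability of the positive class (speaker change)
    y     : ground-truth label, 0 or 1
    gamma : focusing parameter; gamma=0 gives (weighted) cross-entropy
    alpha : weight for the positive class, or None for no class weighting
    Both default values are illustrative assumptions, not the paper's settings.
    """
    # Probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Class weight: alpha for positives, 1 - alpha for negatives.
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of (probability, label) pairs."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# An easy positive (p=0.9) and a hard positive (p=0.1).
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With gamma=0 and alpha=None the function reduces to ordinary cross-entropy. Raising gamma widens the gap between the loss on hard and easy samples, which is how the loss shifts attention toward the rare, difficult speaker-change frames.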

multi-modal, pre-trained model, speaker change detection


Institute of Information and Computational Technologies, Almaty, 050010, Kazakhstan
AI Research Laboratory, Satbayev University, Almaty, 050040, Kazakhstan
School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, 050000, Kazakhstan

