Enhancing Neural Machine Translation with Fine-Tuned mBART50 Pre-Trained Model: An Examination with Low-Resource Translation Pairs
Kozhirbayev Z.
June 2024, International Information and Engineering Technology Association
Ingénierie des Systèmes d'Information
2024, Volume 29, Issue 3, pp. 831-838
In natural language processing (NLP), pre-trained models are increasingly used in practical applications. These models are first trained on extensive monolingual and multilingual datasets and can then be fine-tuned for a target task using a smaller, task-specific dataset. Recent research in multilingual neural machine translation (NMT) has shown potential in creating architectures that can incorporate multiple languages. One such model is mBART50, which was trained on 50 languages. This paper presents work on fine-tuning mBART50 for NMT in the absence of high-quality bitext. Adapting a pre-trained multilingual model can be an effective way to overcome this challenge, but it may not work well when the translation pairs contain languages not seen by the pre-trained model. The paper investigates the resilience of the self-supervised multilingual sequence-to-sequence pre-trained model (mBART50) when fine-tuned with small amounts of high-quality bitext or large amounts of noisy parallel data (Kazakh-Russian). It also shows how mBART improves a neural machine translation system on a low-resource translation pair in which at least one language is unseen by the pre-trained model (Russian-Tatar). The study uses the mBART architecture, which follows the standard sequence-to-sequence Transformer design. The baseline is a Transformer encoder-decoder model trained with Byte Pair Encoding (BPE). The experiments show that fine-tuned mBART models outperform the baseline Transformer-based NMT models in all tested translation pairs, including cases where one language is unseen during mBART pre-training. The BLEU score increases by 11.95 points when translating from Kazakh to Russian and by 1.17 points when translating from Russian to Tatar. Using pre-trained models such as mBART can substantially reduce the data and computational requirements for NMT, leading to improved translation performance for low-resource languages and domains.
denoising auto-encoder, fine-tuning, Kazakh-Russian, low-resource languages, neural machine translation, pre-trained models, Russian-Tatar
National Laboratory Astana, Nazarbayev University, Astana, 010000, Kazakhstan