Multilingual Large Language Models in the Legal Domain: An Effectiveness Analysis in Kazakh, Turkish and English
Sarsenbayeva A., Rakhimova D., Turarbek A., Adali E.
2025, Institute of Electrical and Electronics Engineers Inc.
International Conference on Computer Science and Engineering, UBMK
2025, Issue 2025, pp. 410-415
This study presents a comparative evaluation of multilingual large language models (LLMs) in the legal domain, focusing on their performance in Kazakh, Turkish, and English. A novel legal benchmark was developed, consisting of 1,200 frequently asked questions related to the legislation of the Republic of Kazakhstan. These questions were translated into Turkish and English and manually verified to ensure legal consistency and semantic equivalence. Four prominent LLMs - GPT-4, Gemini 1.5 Pro, LLaMA 2, and AYA - were evaluated in a zero-shot setting on classification and generative tasks. The evaluation employed Accuracy, F1 score, Jaccard index, and ROUGE metrics to assess lexical and semantic quality. Results showed that GPT-4 led across most metrics, especially in English, while AYA demonstrated competitive results in Kazakh generative tasks. LLaMA 2 yielded the weakest performance. This study contributes to the field by introducing a trilingual legal benchmark and a fine-grained evaluation of large language model behavior on underrepresented legal queries.
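As a rough illustration of two of the overlap metrics named in the abstract, the following Python sketch computes a token-level Jaccard index and a ROUGE-1 F1 score between a reference answer and a model answer. It is not the authors' implementation: the function names and the lowercase whitespace tokenization are assumptions made here for illustration, and the abstract does not state which tokenizer or ROUGE variant was used.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization.
    # The study's actual tokenizer is not specified in the abstract.
    return text.lower().split()


def jaccard_index(reference: str, candidate: str) -> float:
    # Jaccard index over the sets of unique tokens: |A ∩ B| / |A ∪ B|.
    ref, cand = set(tokenize(reference)), set(tokenize(candidate))
    if not ref and not cand:
        return 1.0  # two empty answers are treated as identical
    return len(ref & cand) / len(ref | cand)


def rouge1_f1(reference: str, candidate: str) -> float:
    # ROUGE-1 F1: harmonic mean of unigram precision and recall,
    # with overlap counts clipped by the reference counts.
    ref, cand = Counter(tokenize(reference)), Counter(tokenize(candidate))
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

With this tokenization, `jaccard_index("a b c", "b c d")` evaluates to 0.5, since two of the four distinct tokens are shared. Whitespace splitting is a simplification for an agglutinative language such as Kazakh or Turkish, where a morphology-aware tokenizer would likely change the scores.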
AYA, Gemini, GPT, Kazakh language, legal domain, Llama, low-resource languages, multilingual large language models, question answer system
Al Farabi Kazakh National University, Almaty, Kazakhstan
İstanbul Teknik Üniversitesi, Istanbul, Turkey