Multilingual Large Language Models in the Legal Domain: An Effectiveness Analysis in Kazakh, Turkish and English


Sarsenbayeva A., Rakhimova D., Turarbek A., Adali E.
2025, Institute of Electrical and Electronics Engineers Inc.

International Conference on Computer Science and Engineering, UBMK
2025, Issue 2025, pp. 410-415

This study presents a comparative evaluation of multilingual large language models (LLMs) in the legal domain, focusing on their performance in Kazakh, Turkish, and English. A novel legal benchmark was developed, consisting of 1,200 frequently asked questions about the legislation of the Republic of Kazakhstan. The questions were translated into Turkish and English and manually verified to ensure legal consistency and semantic equivalence. Four prominent LLMs (GPT-4, Gemini 1.5 Pro, LLaMA 2, and AYA) were evaluated in a zero-shot setting on classification and generative tasks. The evaluation used Accuracy, F1 score, Jaccard index, and ROUGE metrics to assess lexical and semantic quality. GPT-4 led across most metrics, especially in English, while AYA was competitive on Kazakh generative tasks. LLaMA 2 performed worst. The study contributes a trilingual legal benchmark and a fine-grained evaluation of large language model behavior on underrepresented legal queries.
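As a rough illustration of the lexical-overlap metrics named above, the following minimal sketch computes a Jaccard index and a ROUGE-1 F1 score between a reference answer and a model answer. It is not the authors' evaluation code: the lowercase whitespace tokenizer, the function names, and the example sentences are all assumptions made for illustration, and the abstract does not state which ROUGE variant or tokenization was used.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization (the study's tokenizer is not stated).
    return text.lower().split()


def jaccard_index(reference: str, candidate: str) -> float:
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    ref, cand = set(tokenize(reference)), set(tokenize(candidate))
    if not ref and not cand:
        return 1.0  # two empty answers are treated as identical
    return len(ref & cand) / len(ref | cand)


def rouge1_f1(reference: str, candidate: str) -> float:
    # Unigram overlap with clipped counts, combined into an F1 score.
    ref, cand = Counter(tokenize(reference)), Counter(tokenize(candidate))
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and model answers, not taken from the benchmark.
reference_answer = "the contract is valid"
model_answer = "the contract is void"

jaccard = jaccard_index(reference_answer, model_answer)  # 3 shared / 5 total = 0.6
rouge1 = rouge1_f1(reference_answer, model_answer)       # P = R = 3/4, F1 = 0.75
```

Both scores reward surface word overlap only, so an answer that reverses the legal meaning ("valid" versus "void") can still score highly. This is one reason a study of this kind would pair them with classification metrics such as Accuracy and F1.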

AYA, Gemini, GPT, Kazakh language, legal domain, Llama, low-resource languages, multilingual large language models, question answer system

Al Farabi Kazakh National University, Almaty, Kazakhstan
İstanbul Teknik Üniversitesi, Istanbul, Turkey