Introducing Cultural Knowledge in Language Models: KazCulture Dataset for Kazakh Culture


Maxutov A., Arystanbekov B., Makhataeva Z., Yergen A., Taizhanov N., Nauryzbaikyzy G., Varol H.A.
2026 Institute of Electrical and Electronics Engineers Inc.

IEEE Access
2026

Although Large Language Models (LLMs) have achieved significant advances in linguistic fluency, they often lack the cultural knowledge associated with low-resource languages. This deficiency can hinder their integration into high-stakes applications across diverse regions. In this paper, we present a systematic method for embedding nation-specific cultural knowledge into LLMs, using Kazakh culture and language as a case study. We introduce KazCulture, a Kazakh culture-specific dataset of 16,137 human-crafted Passage-Question-Answer (PQA) triplets. KazCulture is curated from 11 books on culture and from the Koshpendiler.kz digital archive, and it captures deep cultural semantics in areas such as customs, traditions, beliefs, cuisine, and household practices. Using KazCulture, we evaluated 36 LLMs, including proprietary frontier models and open-source alternatives. Our benchmarking reveals a critical disparity: while proprietary models such as GPT-5 and Gemini-2.5-Pro achieved up to 80% accuracy, open-source models struggled significantly. To bridge this gap, we propose a two-stage adaptation pipeline: 1) fine-tuning on a general multilingual dataset (ISSAI-SFT) for linguistic robustness, followed by 2) targeted fine-tuning on KazCulture. This method raised the accuracy of the Qwen3-32B model from a baseline of 39.51% to 64.54%. KazCulture is a timely contribution to Artificial Intelligence (AI) research, serving both as a rigorous benchmark for knowledge of Kazakh culture and as a training resource for developing culture-aware LLMs. The dataset is available at https://huggingface.co/datasets/issai/KazCulture.
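To make the reported numbers concrete, the following is a minimal Python sketch of how accuracy over PQA triplets and the gain from fine-tuning could be computed. It is not the authors' evaluation code: the PQATriplet class, its field names, the exact-match scoring rule, and the toy triplets are all assumptions introduced for illustration, and the paper's actual scoring protocol may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class PQATriplet:
    """Hypothetical container; the real dataset's field names may differ."""
    passage: str
    question: str
    answer: str


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.casefold().split())


def accuracy(triplets: Sequence[PQATriplet],
             predict: Callable[[str, str], str]) -> float:
    """Percentage of triplets whose predicted answer exactly matches the
    reference after normalization (an assumed scoring rule)."""
    if not triplets:
        raise ValueError("need at least one triplet")
    correct = sum(
        normalize(predict(t.passage, t.question)) == normalize(t.answer)
        for t in triplets
    )
    return 100.0 * correct / len(triplets)


def gain(baseline: float, adapted: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = adapted - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Toy placeholder triplets, not taken from KazCulture.
toy_triplets = [
    PQATriplet("Passage one.", "Question one?", "Answer A"),
    PQATriplet("Passage two.", "Question two?", "Answer B"),
    PQATriplet("Passage three.", "Question three?", "Answer C"),
]


def toy_predict(passage: str, question: str) -> str:
    """Stand-in for a model call; answers two of the three toy questions."""
    canned = {"Question one?": "answer a", "Question two?": "Answer  B"}
    return canned.get(question, "unknown")


toy_accuracy = accuracy(toy_triplets, toy_predict)

# Accuracy figures for Qwen3-32B as reported in the abstract.
absolute_gain, relative_gain = gain(39.51, 64.54)
```

Applied to the Qwen3-32B figures reported in the abstract, the gain helper gives an absolute improvement of 25.03 percentage points, which is roughly a 63% relative improvement over the baseline.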

Culture and tradition-specific dataset, fine-tuning, large language models (LLMs), LLM benchmark, passage-question-answer (PQA) triplets


Private Institution “Institute of Smart Systems and Artificial Intelligence” (ISSAI), Astana, Kazakhstan
Al-Farabi Kazakh National University, Almaty, Kazakhstan
Nazarbayev University (NU), Astana, Kazakhstan

