Historical and Poetic Subcorpus of the National Kazakh Language Corpus


Seitbekova A.A., Fazylzhan A.M., Seidamat A.K., Abayeva M.K., Mursal A.
2025 Ch. K. Lamazhaa

New Research of Tuva
2025, Issue 2, pp. 312-338

The article analyzes key aspects of digitizing samples of Kazakh oral folk literature from the 15th to 19th centuries, originally written in Arabic script, and integrating them into the National Corpus of the Kazakh Language (NCKL). This work is the first stage in creating the Historical and Poetic Subcorpus of the NCKL. The study included a comparative analysis of existing poetic subcorpora in other languages (Russian, Czech, Bashkir, and Persian), which helped identify the most effective methods and approaches for developing the Kazakh subcorpus. A significant outcome of the project is a metatextual annotation model comprising 28 parameters that reflect the specifics of Kazakh poetry. Key elements of Kazakh verse were identified, including stanza structure, syllable count, rhyme schemes, and metrical feet. The resulting annotation system represents the poetic features of the texts accurately while accounting for the influence of Eastern literature and folk genres on the evolution of the Kazakh poetic tradition. An important innovation is the semantic annotation of archaic vocabulary. The article also presents the design of the subcorpus interface, which lets users explore poetic works in their original Arabic script alongside their Cyrillic transcriptions. This makes the subcorpus a valuable tool for linguistic and literary research.

Arabic script, Kazakh language, metatextual annotation, National Corpus of the Kazakh Language, poetic subcorpus, text database, writing system


The Institute of Linguistics Named after Akhmet Baitursunuly, Kazakhstan
Department of History of Language and Turkology, The Institute of Linguistics Named after Akhmet Baitursunuly, 29 Kurmangazy St., Almaty, Kazakhstan
The Institute of Linguistics Named after Akhmet Baitursunuly, 29 Kurmangazy St., Almaty, Kazakhstan
Department of History of Language and Turkology, The Institute of Linguistics Named after Akhmet Baitursunuly, 29 Kurmangazy St., Almaty, Kazakhstan
Department of Psycholinguistics, The Institute of Linguistics Named after Akhmet Baitursunuly, 29 Kurmangazy St., Almaty, Kazakhstan

