A Corpus Approach in Language Discovery: A Word Frequency Analysis Based on the Corpus Outcomes in Kazakh


Omarova S., Ospanova D., Aitova N., Tokenkyzy G., Ormanova A., Alshynbekova M.
February 2025, Bilingual Publishing Group

Forum for Linguistic Studies
2025, Vol. 7, Issue 2, pp. 869-881

This study examines the most frequently used parts of speech and grammatical forms in the texts of the sub-corpora of the National Corpus of the Kazakh Language (qazcorpora.kz). Word-form frequencies were collected from the 13 million word usages in the 2023 corpus database and analyzed both manually and with the functional settings of the corpus software. The study provides key insights into the frequency distribution, grammatical variability, and comparative patterns of Kazakh journalistic texts. The results indicate that: (1) within their respective word classes, the conjunction ‘žäne’ [and], the demonstrative pronoun ‘bul’ [this], the auxiliary verb ‘dep’ [no translation], the noun ‘Kazakh’ [Kazakh], the modal verb ‘žoq’ [not], the adjective ‘aq’ [white], the adverb ‘köp’ [many/much], and the numeral ‘eki’ [two] showed the highest frequency indicators, which underlines their functional and stylistic roles in text construction; (2) function words were the most frequently used part of speech; (3) among function words, the conjunction ‘žäne’ [and], the postposition ‘üšın’ [for], and the particle ‘ɣana’ [only] had the highest frequency indicators. This corpus-based research highlights the alignment of Kazakh frequency patterns with global linguistic trends, such as Zipf’s law, while also showcasing unique features attributed to the language’s agglutinative nature.

Corpus Linguistics, Frequency Indicator, Grammatical Form, National Corpus, Part of Speech
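
The frequency ranking and the comparison with Zipf’s law mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors’ procedure or the qazcorpora.kz software: the function names `rank_frequency` and `zipf_slope` are hypothetical, the token list is a toy sample rather than corpus data, and the least-squares fit on log rank against log frequency is only one common way to check Zipf-like behavior (a slope near -1 is the idealized case).

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending count; break ties alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def zipf_slope(table):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy sample for illustration only (not taken from the corpus).
sample = "žäne bul žäne dep žäne bul eki".split()
table = rank_frequency(sample)
for rank, word, count in table:
    print(rank, word, count)
print("log-log slope:", zipf_slope(table))
```

On a real corpus the token list would come from the tokenized sub-corpus texts, and the fit is usually restricted to a range of ranks, since the lowest-frequency tail tends to deviate from the straight line.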


National Scientific and Practical Center «Til-Qazyna», Astana, 010000, Kazakhstan
Department of Kazakh Linguistics, Eurasian National University, Astana, 010000, Kazakhstan
Branch campus of Beijing Language and Culture University (BLCU), Astana International University, Astana, 010000, Kazakhstan

