NMF-based approach to automatic term extraction
Nugumanova A., Akhmed-Zaki D., Mansurova M., Baiburin Y., Maulit A.
1 August 2022, Elsevier Ltd
Expert Systems with Applications
2022, Volume 199
This work describes an automatic term extraction approach that combines probabilistic topic modeling (PTM) and non-negative matrix factorization (NMF). Topic modeling algorithms, including NMF-based ones, do not require expensive and time-consuming manual annotation of domain terms; they need only a corpus of domain documents. Topics emerge from the corpus without any supervision as sets of most probable words. This work investigates how fully and precisely these most probable words reflect domain terminology. We run a series of experiments on ACTER, a new, high-quality annotated dataset that was first used in the TermEval 2020 Shared Task. We compare five NMF algorithms and four NMF initializations while varying the number of topics extracted from the documents and the number of most probable words extracted from each topic, in order to determine the combinations that give the best term extraction performance. Finally, we compare the best NMF combinations with the methods that competed in TermEval 2020 and show that our approach is second only to two much more sophisticated, domain-dependent supervised methods.
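The pipeline summarized in the abstract (factorize a document-term matrix, take the highest-weighted words of each topic as term candidates, score them against gold annotations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus, the hypothetical gold set, and the choice of Lee-Seung multiplicative updates with random initialization, which may or may not be among the five algorithms and four initializations the authors compare. The parameters `k` (number of topics) and `top_n` (words taken per topic) correspond to the two quantities the abstract says are varied.

```python
import random

def build_matrix(docs):
    """Document-term count matrix V (documents x vocabulary) from whitespace tokens."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.split():
            V[i][index[w]] += 1.0
    return V, vocab

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V ~ W H with non-negative W (docs x k) and H (k x vocab).
    Assumption: multiplicative updates for the Frobenius objective."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

def extract_terms(H, vocab, top_n):
    """Union of the top_n highest-weighted words of every topic (row of H)."""
    terms = set()
    for row in H:
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        terms.update(vocab[j] for j in ranked[:top_n])
    return terms

def prf(candidates, gold):
    """Precision, recall and F1 of a candidate term set against a gold set."""
    tp = len(candidates & gold)
    p = tp / len(candidates) if candidates else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy corpus and gold terms, both invented for this example.
docs = [
    "turbine blade rotor wind turbine",
    "wind farm turbine rotor blade",
    "heart failure patient cardiac treatment",
    "cardiac patient heart treatment failure",
]
gold = {"turbine", "rotor", "blade", "cardiac", "heart"}

V, vocab = build_matrix(docs)
W, H = nmf(V, k=2)
candidates = extract_terms(H, vocab, top_n=3)
precision, recall, f1 = prf(candidates, gold)
```

Sweeping `k` and `top_n` over a grid and recording `f1` for each pair would mimic, on a very small scale, the kind of search for an optimal combination that the abstract describes. A real experiment would also need tokenization, stop-word handling and multi-word term candidates, none of which this sketch attempts.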
ACTER dataset, Automatic term extraction, NMF, Probabilistic topic modeling, TermEval shared task, Unsupervised term extraction
Sarsen Amanzholov East Kazakhstan University, Ust-Kamenogorsk, Kazakhstan
Astana IT University, Nur-Sultan, Kazakhstan
Al-Farabi Kazakh National University, Almaty, Kazakhstan