Generating Ontology from a Set of Texts Belonging to a Certain Field of Knowledge

Akhmetov I., Aubakirov S., Saparov T., Mussabayev R., Krassovitsky A., Gelbukh A.
2025 Instituto Politécnico Nacional

Computación y Sistemas
2025, Vol. 29, Issue 4, pp. 2521-2548

The automatic generation of ontologies from textual data is a crucial tool for organizing domain-specific knowledge, particularly in fields like natural language processing (NLP). This research explores methods for extracting, classifying, and structuring terms from scientific texts to create coherent ontologies. We evaluate Term Frequency-Inverse Document Frequency (TF-IDF) and TextRank for term extraction, and Named Entity Recognition (NER) and Part-of-Speech (POS) tagging for term classification. Hierarchical relationships between terms are established with clustering methods such as Agglomerative Clustering and visualized as dendrograms. The generated ontology is validated using cosine similarity, co-occurrence matrices, and topic modeling to check domain relevance and coherence. By comparing these methods, this study highlights their strengths and limitations and offers insight into how automated techniques can support ontology creation in specialized domains, facilitating better knowledge organization, retrieval, and machine understanding of unstructured data.
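
As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch scores terms with TF-IDF, compares documents by cosine similarity, and groups them with average-linkage agglomerative clustering. It is not the authors' implementation: the toy corpus, the whitespace tokenizer, the smoothed IDF formula, and all function names (`tfidf_vectors`, `top_terms`, `cosine`, `agglomerate`) are assumptions made for this example, and it uses only the Python standard library.

```python
import math
from collections import Counter

# Toy corpus (hypothetical; not taken from the paper's data).
DOCS = [
    "named entity recognition labels entities in text",
    "part of speech tagging labels words in text",
    "agglomerative clustering merges similar clusters",
    "hierarchical clustering builds a dendrogram of clusters",
]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumption: relative term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, with whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def top_terms(vector, k):
    """Pick the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering on cosine similarity.

    Starts with one cluster per document and repeatedly merges the
    most similar pair until n_clusters remain.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        _, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


vectors = tfidf_vectors(DOCS)
clusters = agglomerate(vectors, n_clusters=2)
print(top_terms(vectors[2], 3))
print(clusters)
```

On this toy corpus the two tagging-related sentences and the two clustering-related sentences end up in separate groups. Recording the order of merges, rather than stopping at a fixed number of clusters, would give the hierarchy that a dendrogram displays.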

natural language processing, NER, ontology, POS tagging, TextRank, TF-IDF

Satbayev University, Kazakhstan
Kazakh-British Technical University, Kazakhstan
Instituto Politécnico Nacional, CIC, Mexico
