Generating Ontology from a Set of Texts Belonging to a Certain Field of Knowledge
Akhmetov I., Aubakirov S., Saparov T., Mussabayev R., Krassovitsky A., Gelbukh A.
2025, Instituto Politécnico Nacional
Computacion y Sistemas
2025, Vol. 29, Issue 4, pp. 2521-2548
The automatic generation of ontologies from textual data is a crucial tool for organizing domain-specific knowledge, particularly in fields like natural language processing (NLP). This research explores methods for extracting, classifying, and structuring terms from scientific texts to create coherent ontologies. We evaluate Term Frequency-Inverse Document Frequency (TF-IDF) and TextRank for term extraction, and Named Entity Recognition (NER) and Part-of-Speech (POS) tagging for term classification. Hierarchical relationships between terms are established with clustering methods such as Agglomerative Clustering and visualized as dendrograms. The generated ontology is validated using cosine similarity, co-occurrence matrices, and topic modeling to check its domain relevance and coherence. By comparing these methods, the study highlights their strengths and limitations and offers insights into how automated techniques can enhance ontology creation in specialized domains, facilitating better knowledge organization, retrieval, and machine understanding of unstructured data.
natural language processing, NER, Ontology, POS tagging, TextRank, TFIDF
Satbayev University, Kazakhstan
Kazakh-British Technical University, Kazakhstan
Instituto Politécnico Nacional, CIC, Mexico
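
As a rough illustration of two techniques named in the abstract, the sketch below scores terms with TF-IDF and compares documents by cosine similarity. It is not the authors' implementation: the function names (`tfidf`, `cosine`, `top_terms`), the whitespace tokenizer, the unsmoothed `tf * log(N / df)` weighting, and the three-sentence toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Assumption: naive lowercase whitespace tokenization, no stop-word removal.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_terms(doc_weights, k=3):
    """Return the k highest-weighted terms of one document."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]


# Hypothetical toy corpus, invented for this example.
docs = [
    "tokenization splits text into tokens",
    "parsing builds a syntax tree from tokens",
    "clustering groups similar term vectors",
]
weights = tfidf(docs)
candidate_terms = top_terms(weights[0])
similarity_01 = cosine(weights[0], weights[1])  # the two documents share "tokens"
similarity_02 = cosine(weights[0], weights[2])  # no shared terms
```

In this toy setup a term shared by several documents receives a lower weight than a term unique to one document, and documents with no terms in common have similarity zero. A fuller pipeline along the lines of the abstract would feed such vectors into a hierarchical clustering step, which this sketch leaves out.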