The extraction of a brief summary from scientific documents using machine learning methods


Murzabekova G. Mukhamedrakhimova G. Taszhurekova Z. Yerbayev Y. Doumcharieva Z. Makhatova V. Tolganbaeva M. Serikbayeva S.
December 2025, Institute of Advanced Engineering and Science

Bulletin of Electrical Engineering and Informatics
2025, Vol. 14, Issue 6, pp. 4812–4822

This study proposes a machine learning-based approach for the automatic summarization of scientific documents using a fine-tuned DistilBART model, a lightweight and efficient version of the bidirectional and auto-regressive transformers (BART) architecture. The model was trained on a large corpus of 12,540 scientific articles (2015–2023) collected from the arXiv repository, which enables it to capture domain-specific terminology and structural patterns. The proposed pipeline integrates text preprocessing techniques, including tokenization, stopword removal, and stemming, to improve the quality of the semantic representation. Experimental evaluation shows that the fine-tuned DistilBART achieves ROUGE-2 = 0.472 and ROUGE-L = 0.602, outperforming baseline transformer-based models. Unlike conventional approaches, the method is also applicable beyond academic research, including automated indexing of technical documentation, metadata extraction in digital libraries, and real-time text processing in embedded natural language processing (NLP) systems. The results highlight the potential of transformer-based summarization to accelerate scientific knowledge discovery and to improve the efficiency of information retrieval across domains.
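The abstract reports ROUGE-2 and ROUGE-L without defining them. As a minimal sketch, assuming lowercased whitespace tokenization with no stemming and assuming the reported values are F1 scores (neither is stated in the abstract), ROUGE-2 counts overlapping bigrams between a reference and a generated summary, while ROUGE-L uses the longest common subsequence (LCS) of tokens. The helper names (`ngrams`, `rouge_n`, `lcs_length`, `rouge_l`, `f1`) are hypothetical. Published scores are usually computed with a dedicated package such as Google's `rouge-score`, whose tokenizer and optional stemmer can shift the numbers, so this sketch illustrates the definitions and will not reproduce the reported 0.472 and 0.602.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, ref_total: int, cand_total: int) -> float:
    """Harmonic mean of recall (overlap / ref_total) and precision (overlap / cand_total)."""
    if overlap == 0:
        return 0.0
    recall = overlap / ref_total
    precision = overlap / cand_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference: str, candidate: str, n: int = 2) -> float:
    """ROUGE-N F1 with lowercased whitespace tokenization (an assumption, no stemming)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped matches: min count per shared n-gram
    return f1(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F1: LCS length relative to reference and candidate lengths."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    return f1(lcs_length(ref, cand), len(ref), len(cand))


if __name__ == "__main__":
    reference = "the model summarizes scientific articles"
    candidate = "the model summarizes long scientific articles"
    print(round(rouge_n(reference, candidate, 2), 3))  # 0.667
    print(round(rouge_l(reference, candidate), 3))     # 0.909
```

In this toy pair, three of the four reference bigrams reappear in the candidate (recall 0.75, precision 0.6), and the whole reference is a subsequence of the candidate, which is why ROUGE-L rewards the in-order match more than ROUGE-2 does.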

Auto-regressive decoder, Bidirectional and auto-regressive transformers, DistilBART, Encoder, Natural language processing, Text extraction method


Department of Computer Sciences, Institute of Business and Digital Technologies, S. Seifullin Kazakh Agrotechnical University, Astana, Kazakhstan
Department of Radio Engineering, Electronics, and Telecommunications, Faculty of Physics and Technology, L. N. Gumilyov Eurasian National University, Astana, Kazakhstan
Faculty of Technologies, Taraz University named after M.Kh. Dulaty, Taraz, Kazakhstan
Polytechnic Institute, West Kazakhstan Agrarian and Technical University named after Zhangir Khan, Uralsk, Kazakhstan
Department of Applied Informatics and Programming, Faculty of Technology, Taraz Regional University named after M.Kh. Dulati, Taraz, Kazakhstan
Department of Software Engineering, Faculty of Physics, Mathematics and Information Technology, Atyrau University named after Kh. Dosmukhamedov, Atyrau, Kazakhstan
Department of Automation and Control, M. Auezov South Kazakhstan State University, Shymkent, Kazakhstan
Department of Information Systems, Faculty of Information Technology, L.N. Gumilyov Eurasian National University, Astana, Kazakhstan