COMPREHENSIVE ANALYSIS OF AVIATION MAINTENANCE TEXT REPORTS USING NATURAL LANGUAGE PROCESSING METHODS
Savostin A., Kaipbek G., Koshekov K., Savostina G., Wardle K.
2025, Natsionalnyi Hirnychyi Universytet
Naukovyi Visnyk Natsionalnoho Hirnychoho Universytetu
2025, Issue 6, pp. 157-167
Purpose. This study aims to develop and validate a comprehensive approach for analyzing unstructured textual descriptions of defects extracted from actual aviation maintenance data. The goal is to improve both the efficiency and depth of fault analysis by addressing two key tasks: automatic classification of defects into standard categories and identification of latent thematic subgroups within these categories.
Methodology. The research is based on a dataset of maintenance records from nine commercial aircraft over a seven-year period. A multi-stage preprocessing pipeline was developed, including an algorithm for identifying domain-specific abbreviations, which were then decoded by experts. To solve the multiclass classification task across 30 Chapter–Section (CS) categories, four approaches were compared: CountVectorizer with LinearSVC, TF-IDF with logistic regression, Word2Vec with logistic regression, and a fine-tuned transformer-based DistilBERT model. For an in-depth analysis of the largest defect category, topic modeling based on Latent Dirichlet Allocation (LDA) was applied, with a quantitative procedure for selecting the optimal number of topics.
Findings. The best classification performance was achieved by TF-IDF with logistic regression, reaching macro-averaged F1 = 0.762 and Cohen’s Kappa = 0.809, statistically comparable to CountVectorizer with LinearSVC. Classical methods significantly outperformed neural network models, underscoring their robustness for analyzing short technical texts. Topic modeling successfully decomposed the largest defect category into five interpretable and semantically coherent subgroups.
Originality. The novelty of this work lies in developing and testing a formalized method for analyzing unstructured aviation maintenance data, implemented as a single integrated process. The study also provides a detailed comparative evaluation of classical and modern NLP models on domain-specific aviation maintenance data.
Practical value. The work is applied in nature, and its results are ready for implementation. A prototype automated classifier has been created that can process the main flow of daily defect reports, reducing the time required for manual processing. An in-depth failure analysis tool has also been developed, which enables a transition from general fault codes to the analysis of specific sub-problems. This contributes to optimizing maintenance programs, enhancing diagnostic procedures, and ultimately improving flight safety.
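For readers unfamiliar with the two metrics reported in the findings, the following is a minimal sketch of how macro-averaged F1 and Cohen's Kappa are computed from true and predicted labels. It uses only the Python standard library. The label values (`"21-50"`, `"25-20"`, `"33-40"`) and the toy predictions are hypothetical illustrations, not data from the study, and the function names are assumptions introduced here rather than the authors' implementation.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of marginal frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when only one class ever occurs
    return (observed - expected) / (1 - expected)


# Hypothetical Chapter-Section style labels for ten defect reports.
y_true = ["25-20", "25-20", "25-20", "25-20", "33-40",
          "33-40", "33-40", "21-50", "21-50", "21-50"]
y_pred = ["25-20", "25-20", "25-20", "33-40", "33-40",
          "33-40", "21-50", "21-50", "21-50", "25-20"]

print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")        # macro-F1 = 0.694
print(f"Cohen's Kappa = {cohen_kappa(y_true, y_pred):.3f}")  # Cohen's Kappa = 0.545
```

Macro averaging weights every category equally, so rare defect categories influence the score as much as frequent ones. Cohen's Kappa discounts the agreement that would be expected by chance given the class frequencies. With many imbalanced categories, the two metrics therefore give complementary views of classifier quality.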
aircraft, classification, maintenance, natural language processing, topic modeling
Manash Kozybayev North Kazakhstan University, Petropavlovsk, Kazakhstan
Civil Aviation Academy, Almaty, Kazakhstan
JSC Air Astana, Almaty, Kazakhstan