Abstract | The aircraft maintenance process is characterized by large volumes of structured and unstructured data that are recorded daily in the form of pilot reports, maintenance activity reports, records of incidents and interruptions, post-flight reports, etc. Knowledge derived from historical data can be used to improve the aircraft maintenance process, but this knowledge cannot be extracted manually. Data mining techniques are used for this purpose, as they enable automated or semi-automated extraction of knowledge from data sets. The proposed research develops models, based on data mining techniques, that will support decisions on aircraft availability. Three data sources are used in modelling: data collected from the aircraft health monitoring system, records of system irregularities/past faults, and records of interruptions in aircraft operation (operational interruptions). The data were collected over a period of 76 consecutive months for four Airbus A319/20 aircraft belonging to the fleet of a single air carrier. The dissertation analyses critical aircraft systems: the auto flight system (ATA 22), the flight control system (ATA 27), the hydraulic system (ATA 29), the landing gear system (ATA 32) and the navigation system (ATA 34). The first group of classification models was built to predict the occurrence of a pilot entry in the technical logbook on the basis of warning messages previously generated in particular flight phases. A procedure was developed for structuring the data collected from the aircraft health monitoring system, together with a procedure for integrating these data with the records of system irregularities/past faults. 
By merging the data sets used to build the first group of classification models, a ranking of relevant features was obtained by applying different filter methods for feature selection (the correlation method, the Gini index method, information gain and information gain ratio). Additional research examined how reducing the share of input features affects model performance (F-measure, sensitivity and specificity) for different data sampling methods (balanced and random sampling). The second group of classification models was built to predict the consequence of the resulting textual pilot entries for aircraft availability. Building these models was preceded by merging the data sets on system irregularities/faults and on operational interruptions. Using accuracy as the evaluation measure, the models were compared with an existing model from similar research, and their applicability was demonstrated. |
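The filter methods named in the abstract rank each feature by how strongly it is associated with the class label, independently of any classifier. The following is a minimal Python sketch of two of them (information gain and a Gini-index-based score), not the procedure used in the thesis. The feature names (`msg_a`, `msg_b`, `msg_c`), the toy data and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def impurity_reduction(feature, labels, impurity):
    """Impurity of the labels minus the weighted impurity after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / total * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def rank_features(table, labels, impurity=entropy):
    """Return (feature name, score) pairs sorted from most to least relevant."""
    scores = {name: impurity_reduction(column, labels, impurity)
              for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: 1 = warning message seen on the flight, label 1 = logbook entry made.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
table = {
    "msg_a": [1, 1, 1, 0, 0, 0, 1, 0],  # always agrees with the label
    "msg_b": [1, 0, 1, 0, 1, 0, 1, 0],  # weakly related to the label
    "msg_c": [1, 1, 0, 0, 1, 1, 0, 0],  # unrelated to the label
}

if __name__ == "__main__":
    print("information gain:", rank_features(table, labels, entropy))
    print("gini reduction:  ", rank_features(table, labels, gini))
```

With `impurity=entropy` the score is the information gain; with `impurity=gini` it is the reduction in Gini impurity. The gain ratio and correlation-based methods mentioned in the abstract are not covered by this sketch.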
Abstract (english) | The aircraft maintenance process is characterized by large amounts of structured and unstructured data that are recorded daily in the form of pilot reports, maintenance logs, records of operational interruptions and technical incidents, post-flight reports, etc. The knowledge hidden within this data can potentially be used to improve the maintenance process, but it can hardly be extracted manually. Therefore, in the last couple of years there has been a trend towards developing predictive models with data mining techniques. However, these techniques are still not sufficiently applied in the aircraft maintenance process because they require interdisciplinary knowledge, which includes understanding of the database, statistical knowledge, and understanding of machine learning and artificial intelligence techniques and models. This provides motivation for further research in this field. The aim of the work presented in this thesis is to develop new decision support models, based on data mining techniques, for predicting aircraft availability. In the modelling process, three independent data sources are used: data collected from the aircraft health monitoring system (AHMS), information on past faults/defects and information on operational interruptions. These data were collected over a period of 76 consecutive months for four Airbus A319/20 aircraft. Only data from critical aircraft systems were analysed in this dissertation: the auto flight system (ATA 22), the flight control system (ATA 27), the hydraulic system (ATA 29), the landing gear system (ATA 32) and the navigation system (ATA 34). Based on the data collected from these systems, two groups of classification models were built. The first group of classification models aims to determine whether a specific warning message or group of messages, collected from the AHMS during different flight phases, will result in a pilot logbook entry. 
Prior to the model development step, an algorithm for structuring AHMS data and an algorithm for data fusion were developed. After the two data sources (warning messages from the AHMS and information on past faults/defects) were integrated, four filter methods for feature selection (a correlation-based method, the Gini index, information gain and information gain ratio) were applied to the combined data set. In addition, research was carried out to investigate how reducing the dimensionality of the data (the feature vector), in combination with different sampling techniques (stratified and shuffled), affects model performance measures (F-measure, sensitivity and specificity). The second group of classification models aims to determine whether a created pilot logbook record will affect aircraft availability, i.e. whether it will result in aircraft on ground (AOG) status, a delay or a flight cancellation. This group of models is also built on a combined data set, obtained by integrating information on past faults/defects with information on operational interruptions. Using accuracy as the evaluation measure, the developed models were compared with an existing model from similar past research, and their applicability was demonstrated. The research was carried out in several phases, which are summarized in the following chapters. Chapter 1 “Introduction” outlines gaps in the literature in the field of aircraft maintenance. Based on information from published scientific papers and doctoral dissertations, its second subchapter presents various data mining techniques used for prediction in aircraft maintenance. The remaining sections present the main aim of this research, the hypothesis, the expected scientific contributions, domain-specific terminology and an overview of the thesis. Chapter 2 “Prognostics in the aircraft maintenance process” highlights the benefits and challenges of the prognostics approaches currently applied in the literature. 
It explains in detail the knowledge discovery in databases process, as well as supervised and unsupervised data mining techniques. Cluster analysis, association rules and various classification algorithms (Neural Networks, Decision Trees, Support Vector Machines and Naïve Bayes) are also described. Four filter methods for feature selection are introduced. Different data sampling techniques and measures for evaluating classification models are also described in this chapter. Chapter 3 “Processing and analysis of collected data” provides a detailed explanation of the three independent data sources (AHMS data, information on past faults/defects and information on operational interruptions) used in this research. This chapter also presents a procedure for structuring AHMS data, based on the algorithm presented in Appendix A. In addition to this procedure, two procedures for data fusion are shown: a procedure for fusing AHMS data with past faults/defects (Appendix B), and a procedure for fusing past faults/defects with data on operational interruptions. The final result of these procedures is two combined data sets, which are transformed into a form suitable for modelling. In addition, an exploratory data analysis was conducted on these combined data sets, and guidelines were provided for the subsequent research phases. Chapter 4 “The process of discovering relevant features and the development of classification models” sets out the research steps used to build the various data mining models. The first section outlines the process, developed within the RapidMiner platform, for discovering relevant features by applying filter methods. This section also presents a process for applying association rule mining to the combined data set in order to identify features that frequently appear together. 
The second section presents the process for building the first group of classification models, while the third presents the process for building the second group. A list of the operators used to build the models, together with their parameters, is presented. A graphical representation of these models is given in Appendix C. Chapter 5 “Research results” outlines the results of the models developed in the previous chapters. The resulting models were tested on a verification data set, i.e. a data set that was not used for model building. Evaluation metrics were used to determine the performance of the classification models. Because the data are imbalanced, the first group of classification models was evaluated by F-measure, sensitivity and specificity. All models were first built on the original data set, which contains all the features. During the modelling phase, different data sampling techniques (stratified and shuffled) were used. Applying the different filter methods produced a ranking of the features, representing a feature vector (Appendix D). To evaluate the results of the filter methods, the performance of the first group of classification models was observed while gradually decreasing the number of features taken from the ranking produced by each filter method. The main metric for evaluating the second group of classification models was accuracy. Once model accuracy had been obtained, the results were compared with those of a similar model found in the literature. Chapter 6 “Conclusion” is the final chapter; it presents the original scientific contributions and summarizes the results. Theoretical and practical implications are also outlined, and areas for potential future research are suggested. |
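The abstract motivates the use of F-measure, sensitivity and specificity by the imbalance of the data. The minimal Python sketch below illustrates why: it computes these measures from confusion-matrix counts and contrasts them with accuracy. The function names and the toy predictions are hypothetical and are not taken from the thesis, which built its models in RapidMiner.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, precision and F-measure (F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical imbalanced example: only 2 of 10 flights led to a logbook entry,
# and the classifier finds just one of them.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    for name, value in evaluate(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

In this toy case accuracy is 0.9 even though half of the positive cases are missed (sensitivity 0.5), which is the kind of distortion that class-sensitive measures are meant to expose.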