doctoral thesis
Hybrid techniques of combinatorial optimization based on genetic algorithms with application to feature selection in retail credit risk assessment

Stjepan Oreški (2014)
Sveučilište u Zagrebu
Fakultet organizacije i informatike Varaždin
Metadata
Title: Hibridne tehnike kombinatorne optimizacije temeljene na genetskim algoritmima s primjenom na odabir atributa u ocjenjivanju kreditnog rizika građana
Author: Stjepan Oreški
Mentor(s): Božidar Kliček
Abstract
Hybrid combinatorial optimization techniques are a growing research area aimed at solving complex combinatorial optimization problems. The first part of this dissertation focuses on the methodological background of hybrid combinatorial optimization techniques, paying particular attention to important concepts in combinatorial optimization and computational complexity theory, as well as to the hybridization strategies that matter when developing such techniques. In line with the presented relationships among combinatorial optimization techniques, the strategies for combining them, and the concepts for solving combinatorial optimization problems, this dissertation creates new hybrid techniques for feature selection and classification in credit risk assessment. The dissertation emphasizes the importance of hybridization as a concept of cooperation among metaheuristics and other optimization techniques. The importance of such cooperation is confirmed by the results presented in the experimental part of the work, which were obtained on the Croatian and German credit data sets using the hybrid combinatorial optimization techniques created in this dissertation. Scientific contribution of the dissertation: Hybrid feature selection techniques (GA-NN and HGA-NN) were created, specially adapted to the problem domain and based on genetic algorithms and artificial neural networks. A new hybrid genetic algorithm was created by incorporating the results of filter techniques and a priori knowledge into the initial population of the genetic algorithm. A new selection operator for the genetic algorithm, unique selection, was created. Sophisticated credit models were created that make it possible to allocate capital more efficiently.
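The idea of placing filter results and a priori knowledge into the initial population can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the filter scores, and the expert-chosen subset are hypothetical, and each individual is assumed to be a 0/1 mask over the features.

```python
import random

def mask_from_ranking(scores, top_k):
    """Turn per-feature filter scores into a 0/1 mask keeping the top_k features."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:top_k])
    return [1 if i in keep else 0 for i in range(len(scores))]

def seeded_population(n_features, seed_masks, pop_size, rng):
    """Build an initial population: seed masks first, random masks to fill up."""
    population = [list(mask) for mask in seed_masks[:pop_size]]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(n_features)])
    return population

# Hypothetical inputs: scores from some filter technique and an expert's subset.
filter_scores = [0.40, 0.05, 0.31, 0.02, 0.27, 0.01]
expert_mask = [1, 0, 0, 0, 1, 1]

rng = random.Random(42)
seeds = [mask_from_ranking(filter_scores, top_k=3), expert_mask]
population = seeded_population(6, seeds, pop_size=8, rng=rng)
for individual in population:
    print(individual)
```

The first two individuals carry the prior knowledge, while the remaining random masks keep the population diverse.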
Keywords: hybrid techniques, classification, feature selection, credit risk, genetic algorithm, neural networks
Parallel title (English): Hybrid techniques of combinatorial optimization based on genetic algorithms with application to feature selection in retail credit risk assessment
Committee Members: Diana Šimić (committee chairperson)
Božidar Kliček (committee member)
Dragan Gamberger (committee member)
Granter: Sveučilište u Zagrebu
Fakultet organizacije i informatike Varaždin
Place: Varaždin
State: Croatia
Scientific field, discipline, subdiscipline: SOCIAL SCIENCES
Information and Communication Sciences
Information Systems and Information Science
UDK: 004
GENERALLY
Computer science and technology. Computing. Data processing
Study programme type: university
Study level: postgraduate
Study programme: Postgraduate doctoral study in Information Science
Academic title abbreviation: dr.sc.
Genre: doctoral thesis
Language: Croatian
Defense date: 2014-10-14
Parallel abstract (English)
The purpose of this dissertation is to thoroughly investigate the overall data set available to the bank and to determine the extent to which these data can serve as a good basis for predicting the creditworthiness of a loan applicant. Such a prediction should be made without seeking additional information from the client, assuming that the loan applicant is a long-time customer of the bank and that the bank has collected sufficient data on the client in its database. Banks worldwide have accumulated large amounts of data and information about their clients, their financial solvency and payment history. The problem usually lies in the multitude of irrelevant attributes contained in the accumulated data. Irrelevant attributes in the training data set will not lead to more accurate classification results, but will (1) increase the cost of data collection, (2) increase the time required for learning and constructing models, and (3) decrease the user-friendliness of the model itself. Hence, the data need to be preprocessed before classification in order to improve the quality of the constructed model, reduce its complexity, and reduce the cost of its use. One of the most important activities in data preprocessing is feature selection. The ultimate objectives of the study were twofold: (1) to develop a highly efficient hybrid technique, in line with the latest scientific and technical knowledge, for selecting the optimal subset of features when assessing the creditworthiness of a loan applicant, and (2) to gather additional knowledge and experience about the specific advantages and disadvantages of individual techniques, and about how to combine these techniques in meaningful ways for other similar problems. Theoretically speaking, the selection of the optimal feature subset belongs to the class of combinatorial optimization problems.
Such problems are usually solved by combining exact and heuristic algorithms, or several (meta)heuristic algorithms. The resulting algorithms, in this case hybrids, try in various ways to combine the advantages of two or more different types of algorithms. The dissertation discusses different forms of hybridization, from hybridization at a low level, where the result is a single optimization technique that forms a functionally indivisible whole, to hybridization at a high level, where the different algorithms remain independent entities that cooperate with one another. Various optimization techniques, from exact ones to heuristics, were combined under the hypothesis that the benefit comes from the synergy of different techniques. It is of paramount importance to establish a dynamic balance between diversification and intensification, so that areas of the search space with high-quality solutions are identified quickly, without losing too much time in areas that have already been explored or that do not provide quality solutions. In addition to the potential benefits, hybrid techniques bring some unavoidable disadvantages, such as the increased complexity of the technique, the need for more knowledge and effort in the design and implementation of the solution, and a narrow orientation toward solving specific problems only. Here the well-known "No free lunch" theorem (Wolpert and Macready, 1996) gains prominence: it says that no technique is better than all others under all conditions. Several hybrid algorithms are developed and presented in the dissertation. The first of these is a combination of genetic algorithms and artificial neural networks (GA-NN). The distinctive feature of this algorithm is that it simultaneously selects the optimal subset of attributes and adjusts the parameters of the artificial neural network to the attributes of the given subset.
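The wrapper idea behind a genetic-algorithm feature selector can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GA-NN algorithm itself: a nearest-centroid classifier stands in for the neural network, no network parameters are tuned, the data are synthetic, and the operators (truncation selection, one-point crossover, bit-flip mutation) and all names and parameter values are hypothetical choices.

```python
import random

def make_data(n_rows=200, n_features=10, informative=(0, 3, 7), seed=0):
    """Synthetic stand-in data: the label depends only on the informative columns."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        X.append(row)
        y.append(1 if sum(row[i] for i in informative) > 0 else 0)
    return X, y

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for c in (0, 1):
        rows = [r for r, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[i] for r in rows) / max(len(rows), 1) for i in cols]
    correct = 0
    for r, label in zip(X, y):
        dist = {c: sum((r[i] - m) ** 2 for i, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == label
    return correct / len(y)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small penalty per selected feature (assumed trade-off)."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=1):
    """Evolve 0/1 feature masks and return the fittest one found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        elite = ranked[: pop_size // 2]          # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(X, y, m))

X, y = make_data()
best = ga_select(X, y)
print("selected features:", [i for i, bit in enumerate(best) if bit])
```

Because the elite half survives each generation unchanged, the best fitness in the population never decreases, which is one simple way to favor intensification; the mutation rate controls how much diversification remains.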
The second algorithm is a combination of a hybrid genetic algorithm and an artificial neural network (HGA-NN). It is a logical continuation and extension of the GA-NN algorithm: the attributes are restricted in advance to only those that have been singled out by fast filtering algorithms or by domain experts. Some improvements have also been made to the genetic algorithm: (1) in the creation of the initial population and (2) through the introduction of an incremental stage. In the third experiment, special emphasis is given to the problems related to the classification of imbalanced data sets. An overview is given of the main characteristics of the paradigms traditionally applied to the classification of imbalanced data. Techniques for mitigating problems related to the cost-sensitive classification of class-imbalanced data are explored in combination with the techniques based on genetic algorithms, GA-NN and HGA-NN. Performance is measured by a variety of measures, focusing on the relative cost of misclassification. The study was conducted on Croatian and German data sets. The results showed that, from the cost point of view, this extension yields the HGA-NN ROS technique, which performs better than the results presented in the literature. The results of the presented algorithms clearly indicate their potential for solving the attribute selection problem and for evaluating retail credit risk, thereby justifying the larger effort in design and implementation. This potential may be used to improve the way in which banks manage retail credit risk, which promotes stable and healthy banking. The need for better management of credit risks and for sophisticated credit models motivated the research presented in this dissertation.
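Two of the ingredients mentioned above, rebalancing an imbalanced data set and scoring a classifier by the cost of its errors, can be sketched in Python. The sketch assumes that ROS denotes random oversampling, which the abstract does not spell out; the 5:1 cost ratio, the label coding (1 = bad loan, 0 = good loan), and the function names are illustrative assumptions, not values taken from the dissertation.

```python
import random

def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of smaller classes until classes are balanced."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

def misclassification_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Average cost per applicant.
    cost_fn: a bad loan (1) predicted as good (0); cost_fp: a good loan rejected."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (cost_fn * fn + cost_fp * fp) / len(y_true)

# Toy data: 7 good loans and 3 bad loans, one feature per row.
X = [[i] for i in range(10)]
y = [0] * 7 + [1] * 3
X_bal, y_bal = random_oversample(X, y, random.Random(0))
print(len(X_bal), y_bal.count(0), y_bal.count(1))

cost = misclassification_cost([1, 1, 0, 0], [0, 1, 1, 0])
print(cost)
```

Oversampling should be applied to the training portion only, so that duplicated rows do not leak into the data used to measure the cost.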
Parallel keywords (Croatian): hibridne tehnike, klasifikacija, odabir atributa, kreditni rizici, genetski algoritam, neuronske mreže
Version: accepted version
Resource type: text
Access condition: Open access
Terms of use: http://rightsstatements.org/vocab/InC/1.0/
Note: accepted version
URN:NBN: https://urn.nsk.hr/urn:nbn:hr:211:466344
Committer: Ljiljana Hajdin