Abstract | Image retrieval systems aim to be intuitive and as simple to use as possible. Images can be retrieved by their visual content or by the textual labels with which they are annotated. Automatic image annotation was developed as an alternative approach to image retrieval that uses both visual and textual information. For the results of automatic annotation to match the keywords that users intuitively use when searching for images, the annotation must include concepts more abstract than the labels of the classes whose instances appear in the image. Image annotation that includes concepts related to the image at different levels of abstraction is called multilayered image interpretation. This work proposes a knowledge-based system for automatic annotation and multilayered interpretation of images. Factual and uncertain knowledge about images from the domain of outdoor scenes is represented by a knowledge representation scheme based on fuzzy Petri nets (the KRFPN scheme). The same model is used for image retrieval. Using fuzzy inference algorithms on the KRFPN scheme, the quantized values of the feature vector components obtained from an image region are classified into the corresponding elementary class, which is then used to annotate that region. The elementary classes obtained during automatic annotation of image regions can be evaluated against the probable context, taking into account the pseudo-spatial relations between elementary classes defined in the knowledge base. The union of the elementary classes obtained by annotating image regions through inference on the KRFPN scheme is the input to a KRFPN scheme at a higher hierarchical level. On that higher-level scheme, more abstract concepts implicitly associated with the image are inferred, such as scene classes, their generalizations or derived classes. 
An evaluation of different automatic image annotation methods on the same set of images showed that the best results were achieved with the KRFPN scheme when the feature vectors were quantized with the generalized Lloyd algorithm. |
Abstract (english) | Image retrieval systems aim to be intuitive and easy to use. Images can be retrieved by their visual content or by the keywords with which they are annotated. Automatic image annotation has been developed as an alternative approach to image retrieval that uses both visual and textual information. For the results of automatic annotation to match the keywords that users intuitively use when retrieving images, annotation must include concepts more abstract than the classes whose instances appear in the image. Image annotation that includes concepts associated with the image at different levels of abstraction is called multilayered image interpretation. This dissertation proposes a knowledge-based system for automatic image annotation and multilayered image interpretation. Factual and uncertain knowledge of outdoor image scenes is represented using hierarchically arranged knowledge representation schemes based on the fuzzy Petri net formalism (the KRFPN1 and KRFPN2 schemes). The KRFPN1 scheme represents the knowledge used when annotating image segments, and the KRFPN2 scheme, at the higher hierarchical level, is used for multilayered image interpretation. Given that knowledge of concepts is often incomplete and imprecise, an important property of these schemes is the ability to express the probability or reliability of concepts and relations using values associated with tokens and transitions. Automatically segmented images of outdoor scenes were available and were used for automatic annotation and multilayered interpretation. Each image segment was associated with a low-level feature vector and with keywords from a controlled vocabulary. This context-specific knowledge is therefore included in the knowledge base represented by the KRFPN1 scheme. 
The elements of the KRFPN1 scheme are elementary classes that correspond to the keywords used to annotate image segments, attribute values that correspond to code words of feature vector components, relations between elementary classes and their attribute values, and pseudo-spatial relations between elementary classes. In addition to domain-specific, more abstract concepts such as scene classes and the aggregation relations between scenes and elementary classes, the knowledge base represented by the KRFPN2 scheme includes general knowledge relevant to the concepts of interest, such as generalization classes, derived classes and their relations. The basic idea was to define a mapping that classifies the code words of the feature vector components obtained from an image segment into the corresponding elementary class, which is then used to annotate that segment. Since some information is lost during quantization, three algorithms for designing the codebook were used: the k-means algorithm, the generalized Lloyd algorithm and the EM algorithm. Code words were classified by inference with a fuzzy recognition algorithm on the inverse KRFPN1 scheme. The inference was based on the relations between the elementary classes and the characteristic values of their attributes. For comparison, code words were also classified using the Naïve Bayes and k-nearest neighbour algorithms. Compared with the automatic annotation results of the Naïve Bayes and k-nearest neighbour algorithms on the same set of images, the proposed system based on the KRFPN1 scheme achieved the best results for each quantization method. 
Further, the results showed that quantization affects the results of automatic annotation (classification), and that the best results were achieved by the proposed system based on the KRFPN1 scheme when the feature vectors were quantized using the generalized Lloyd algorithm. Moreover, these results were better than those of the dInd [Duygulu et al. 2002] and dMRF [Carbonetto et al. 2004] models, whose results on the same set of images were published in [Carbonetto et al. 2004]. The elementary classes obtained during automatic annotation of image segments can be evaluated against the probable context, taking into account the pseudo-spatial relations between elementary classes defined in the knowledge base. Depending on the chosen strategy, an obtained elementary class may be rejected as inconsistent, or replaced by an elementary class that is more appropriate to the context and shares the most properties with the “inconsistent” class according to the fuzzy intersection algorithm on the KRFPN1 scheme. The union of the elementary classes obtained by annotating image segments through inference on the KRFPN1 scheme is the input to the KRFPN2 scheme. The KRFPN2 scheme is used to infer more abstract concepts implicitly associated with the image, such as scene classes, their generalizations or derived classes. Because the schemes are independent, the results of image annotation produced by another method, such as a Naïve Bayes classifier, can also be used as input to the KRFPN2 scheme. The scene class that best fits the given elementary classes is inferred using the fuzzy recognition algorithm on the inverse KRFPN2 scheme. In addition, using the fuzzy inheritance algorithm on the KRFPN2 scheme, scene classes can be generalized to more abstract classes that are closer to the user's interpretation of the images. The same model was used for image retrieval. 
Because the requested concept can be at any level of abstraction, from an elementary class to a scene or its generalization, the ability of the inheritance reasoning algorithm to represent knowledge at different levels of abstraction becomes significant. The proposed system, which is based on the KRFPN formalism and used here for annotation and multilayered interpretation of images in the domain of outdoor scenes, can serve as a template for describing the concepts of another domain. The methodology for acquiring knowledge about concepts at multiple semantic levels is extensible and can be adapted to acquiring knowledge about the appearance of objects of interest in a particular context. Specifically, the KRFPN scheme provides a formal framework for explicit, machine-processable image interpretation, within which new knowledge can be inferred from a set of rules and existing knowledge. Consequently, this approach could enable the interpretation of images from different domains, provided the knowledge base contains the relevant facts about the objects in the images. Further research will focus on adapting and applying this model to images from another domain. Furthermore, since the Petri net formalism has been used successfully to model sequential, parallel and synchronized events, it is expected that the KRFPN scheme used for multilayered image interpretation could be modified and used for the interpretation of videos. |
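
The codebook design step named in the abstract (quantizing feature vectors into code words) can be illustrated with a minimal sketch of the Lloyd iteration: assign each vector to its nearest code word, then move each code word to the centroid of its cell. This is not the dissertation's implementation. The function names, the toy data, the fixed initial codebook and the stopping rule are all assumptions made for illustration, and the abstract does not say how its k-means and generalized Lloyd variants differ in initialization or termination.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code_word(vector: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the code word closest to the given vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def lloyd_codebook(vectors: Sequence[Vector],
                   initial_codebook: Sequence[Vector],
                   iterations: int = 20) -> List[List[float]]:
    """Refine a codebook by alternating partition and centroid steps.

    Hypothetical helper: the initial codebook and the iteration cap are
    illustrative assumptions, not values taken from the source.
    """
    codebook = [list(c) for c in initial_codebook]
    for _ in range(iterations):
        # Partition step: group vectors by their nearest code word.
        cells: List[List[Vector]] = [[] for _ in codebook]
        for v in vectors:
            cells[nearest_code_word(v, codebook)].append(v)
        # Centroid step: move each code word to the mean of its cell.
        updated = []
        for old, cell in zip(codebook, cells):
            if cell:
                updated.append([sum(v[d] for v in cell) / len(cell)
                                for d in range(len(old))])
            else:
                updated.append(old)  # keep an empty cell's code word as is
        if updated == codebook:  # no change, so the iteration has converged
            break
        codebook = updated
    return codebook


def mean_distortion(vectors: Sequence[Vector],
                    codebook: Sequence[Vector]) -> float:
    """Average squared error introduced by quantization."""
    return sum(squared_distance(v, codebook[nearest_code_word(v, codebook)])
               for v in vectors) / len(vectors)


# Toy two-dimensional feature vectors (illustrative data only).
vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
codebook = lloyd_codebook(vectors, initial_codebook=[[0.0, 0.0], [10.0, 10.0]])
codes = [nearest_code_word(v, codebook) for v in vectors]
```

Here `codes` holds the code word index assigned to each vector, and `mean_distortion(vectors, codebook)` measures the information lost by quantization, which is the effect the abstract gives as the reason for comparing several codebook design algorithms.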