Elsevier

Expert Systems with Applications

Volume 42, Issue 23, 15 December 2015, Pages 9279-9283

Recognizing unknown objects with attributes relationship model

https://doi.org/10.1016/j.eswa.2015.07.049

Highlights

  • This paper tackles the zero-shot learning problem in the object recognition domain.

  • Unknown objects that have no training images are related to known objects.

  • A model that combines the benefits of attributes and image hierarchy is proposed.

  • The proposed method achieves state-of-the-art accuracy on the AwA dataset.

Abstract

Generally, training images are essential for a computer vision model to classify a specific object class accurately. Unfortunately, there are countless object classes in the real world, and it is almost impossible for a computer vision model to obtain a complete set of training images for every one of them. To overcome this problem, zero-shot learning algorithms have emerged that learn unknown object classes from information about a set of known object classes. Among these methods, attributes and image hierarchy are widely used. In this paper, we combine the strengths of attributes and image hierarchy by proposing the Attributes Relationship Model (ARM) to perform zero-shot learning. We evaluated the proposed algorithm on the Animals with Attributes (AwA) dataset and achieved state-of-the-art accuracy (50.61%) compared to other recent methods.

Introduction

Object recognition is an active research area in the computer vision community because of its usefulness in real-life applications, ranging from content-based image retrieval systems such as internet search engines (Wu, Jin, & Jain, 2013) to video surveillance systems that identify uncommon or suspicious objects in a selected area (Lee & Nevatia, 2014; Lim, Tang, & Chan, 2014). A well-generalized object recognition system greatly reduces the human effort needed to identify objects that differ only slightly in appearance but belong to the same object category, for example by modelling a given object class by a set of modes deduced by a multi-finite mixture model (Bdiri, Bouguila, & Ziou, 2014; Bourouis, Mashrgy, & Bouguila, 2014).

However, as the number of distinct real-world objects grows, it becomes very hard to build a computer vision model that can classify all of them. Moreover, state-of-the-art object recognition algorithms always require a minimum number of samples from each object class to learn the differences between them. To add a new object class after the learning process, the whole model needs to be retrained, which is a tedious job. The zero-shot learning approach therefore emerged (Frome et al., 2013; Hoo & Chan, 2013; Lampert, Nickisch, & Harmeling, 2009, 2014; Palatucci, Pomerleau, Hinton, & Mitchell, 2009; Parikh & Grauman, 2011; Rohrbach, Stark, & Schiele, 2011). It categorizes unknown object classes using existing samples of other classes by exploiting the semantic relationship between the unknown object class and the existing object classes. For example, if a classification model needs to learn three object classes and one of them is unknown (because it has no training samples), the two known object classes in the recognition system are used to find the characteristics of the unknown object class. The classification model can then classify all three object classes, and retraining the full model is no longer needed.

Since then, much research has focused on zero-shot learning tasks. Palatucci et al. (2009) were the first to introduce the zero-shot learning paradigm, learning a semantic output codes classifier that uses semantic properties of known classes to predict unknown classes. Lampert et al. (2009, 2014) used the Direct Attributes Prediction (DAP) model and the Indirect Attributes Prediction (IAP) model, both of which utilize attribute information. Parikh and Grauman (2011) proposed relative attributes, which further enhance zero-shot learning by introducing relative relationships between object classes using attributes, in contrast to the binary attributes approach used in Lampert et al. (2009, 2014). Other lines of research (Frome et al., 2013; Hoo & Chan, 2013; Rohrbach, Stark, & Schiele, 2011) favor an image hierarchy approach to find the relationships between unknown object classes and existing known object classes. These approaches rely on semantic information from WordNet, mine information from un-annotated data, or build a specific Coarse Class-Fine Class lookup table. In more recent work, Fu, Hospedales, Xiang, and Gong (2014) proposed M2LATM, which defines a semi-latent attributes space by using user-defined and latent attributes in one framework. Fu, Hospedales, Xiang, and Gong (2015a) used transductive multi-view embedding and a heterogeneous multi-view label propagation method to overcome two known problems in zero-shot learning, namely projection domain shift and prototype sparsity. In addition, Liu, Zhang, and Chen (2014) proposed learning the attributes relation and the attributes classifier jointly in a common objective function, while Fu, Xiang, Kodirov, and Gong (2015b) suggested using semantic manifold distance to project the semantic embedding space and recognize unknown object classes. All of the aforementioned methods use either attributes or image hierarchy, but not both.

In this paper, we propose to combine the benefits of both attributes and image hierarchy. Specifically, we build an Attributes Relationship Model (ARM) to perform zero-shot learning, based on the hierarchical class concept in Hoo and Chan (2013, 2015). Our contribution is that, instead of using the Coarse Class-Fine Class relation as in their papers, we use attributes to build the relationship model. Our intuition is that, given the attributes of each unknown object class, the known object classes that have similar attributes to the unknown object class will have a stronger relationship with it. Since each attribute only represents characteristics in part of the image, we group the known object classes that are highly correlated with the unknown object classes based on their attributes. In short, we have a centralized relationship model that infers which known object classes are most correlated with a specific unknown object class. This differs from the relationship used in DAP and IAP (Lampert, Nickisch, & Harmeling, 2009, 2014), which is not class-specific. These advantages enable the proposed method to enhance zero-shot learning performance, and we achieve state-of-the-art results (50.41%) on the Animals with Attributes (AwA) dataset.
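The intuition above (known classes whose attributes resemble those of an unknown class are more strongly related to it) can be illustrated with a minimal Python sketch. This is not the ARM formulation itself, which is not given in this section. The cosine similarity measure, the function names, the four toy attributes, and all attribute values are assumptions made purely for illustration and are not taken from the AwA dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_related_known_classes(unknown_attrs, known_attrs, top_k=2):
    """Rank known classes by attribute similarity to one unknown class.

    Hypothetical helper: similarity stands in for the 'relationship
    strength' described in the text; the paper may use another measure.
    """
    scored = [
        (name, cosine_similarity(unknown_attrs, attrs))
        for name, attrs in known_attrs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy attribute order (assumed): [striped, hooved, aquatic, carnivorous]
known = {
    "horse": [0.0, 1.0, 0.0, 0.0],
    "tiger": [1.0, 0.0, 0.0, 1.0],
    "dolphin": [0.0, 0.0, 1.0, 1.0],
}
zebra = [1.0, 1.0, 0.0, 0.0]  # unknown class: attributes only, no images

related = most_related_known_classes(zebra, known, top_k=2)
for name, score in related:
    print(f"{name}: {score:.3f}")
```

In this toy setting the unknown class is described only by its attribute vector, and the two highest-ranked known classes play the role of the related group. The same function accepts either binary or real-valued attribute vectors.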

This paper is organized as follows: we first formulate the proposed ARM in Section 2. We then compare our relationship model with the current state of the art in Section 3, discuss our findings in Section 4, and conclude the paper in Section 5.

Section snippets

Attributes relationship model (ARM)

The proposed ARM aims to solve the zero-shot learning problem. Conventional learning models need at least one image sample of each object class to learn their model. Zero-shot learning, however, allows training images to be missing for selected object class(es), denoted as the unknown object classes. Attributes help to relate an unknown object class to the known object classes, because attributes are shared among all object classes (as in Lampert, Nickisch, Harmeling, 2009, Lampert,

Experiments

We test both binary ARM and real-valued ARM in this section. The AwA dataset, which consists of 30,475 images with 40 c_s and 10 c_u, is used. The implementation is similar to Hoo and Chan (2013, 2015). Specifically, we use the PHOG features, with 3 pyramid levels and 180°. We set the RF to have 10 trees, each with 100 leaf nodes, and use 20 topics during the pLSA learning. Fig. 2 shows a summary of the c_s pair that is most correlated to the particular c_u, based on Tables 1 and 2. We
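For reference, the experimental settings listed in this snippet can be collected in one place. The following Python sketch only records the values stated above. The class name and field names are hypothetical, and reading "180" as an angle in degrees for the PHOG descriptor is an assumption, since the snippet does not say what the value parameterizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AwaExperimentConfig:
    """Settings quoted in the Experiments snippet (field names are hypothetical)."""

    num_images: int = 30475          # AwA dataset size
    num_known_classes: int = 40      # classes with training images
    num_unknown_classes: int = 10    # classes without training images
    phog_pyramid_levels: int = 3     # PHOG pyramid levels
    phog_angle_deg: int = 180        # assumed to be degrees; meaning not stated
    rf_num_trees: int = 10           # random forest (RF) size
    rf_leaf_nodes_per_tree: int = 100
    plsa_num_topics: int = 20        # topics used during pLSA learning


config = AwaExperimentConfig()
total_classes = config.num_known_classes + config.num_unknown_classes
print(config)
print("total classes:", total_classes)
```

Freezing the dataclass keeps the recorded settings from being changed accidentally when the object is passed between experiment steps.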

Discussion

The experiments show that real-valued attributes are more useful than binary attributes in zero-shot learning. In addition, using all the available attributes is not necessarily the best option. In fact, choosing the best attribute combination often helps to achieve better classification results, so how to choose an optimal set of attributes remains an open question. Nevertheless, our proposed method is still able to achieve good accuracy with a small number of attributes as compared to
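One possible way to see why real-valued attributes can carry more information than binary ones is that thresholding may map distinct classes to the same binary signature. The sketch below illustrates this under stated assumptions: the class names, attribute values, and the 0.5 threshold are invented for illustration, and this is not presented as the authors' explanation of their result.

```python
import math


def binarize(values, threshold):
    """Map real-valued attribute strengths to 0/1 using a fixed threshold."""
    return [1 if v > threshold else 0 for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical real-valued attribute strengths for two classes.
real = {
    "class_a": [0.9, 0.6, 0.1],
    "class_b": [0.6, 0.9, 0.1],
}
threshold = 0.5  # assumed cut-off, chosen only for this example

binary = {name: binarize(values, threshold) for name, values in real.items()}

real_gap = euclidean(real["class_a"], real["class_b"])
binary_gap = euclidean(binary["class_a"], binary["class_b"])

print("binary signatures:", binary)
print("real-valued distance:", round(real_gap, 3))
print("binary distance:", binary_gap)
```

Here the two classes stay distinguishable in the real-valued space but collapse to an identical signature after thresholding, so any relationship model built on the binary vectors cannot tell them apart.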

Conclusion

In this paper, we investigated zero-shot learning, which classifies unknown object classes by using information from existing known object classes. Specifically, we built the Attributes Relationship Model (ARM), based on the concepts of attributes and image hierarchy, to learn unknown object classes from known classes’ information. The advantage is that image hierarchy allows us to build a relationship model between known and unknown object classes using a coarse class strategy. In ARM, attributes identify mid level

Acknowledgment

This research is supported by the High Impact MoE Grant UM.C/625/1/HIR/MoE/FCSIT/08, H-22001-00-B00008 from the Ministry of Education Malaysia and UM Bright Sparks Programme.

References (22)

  • Fu, Z. et al. (2015). Zero-shot object recognition by semantic manifold distance. Proceedings of the Computer Vision and Pattern Recognition (CVPR).