Recognizing unknown objects with attributes relationship model

doi:10.1016/j.eswa.2015.07.049

Expert Systems with Applications

Volume 42, Issue 23, 15 December 2015, Pages 9279-9283

https://doi.org/10.1016/j.eswa.2015.07.049

Highlights

• This paper tackles the zero-shot learning problem in the object recognition domain.
• Unknown objects that have no training images are related to known objects.
• A model that combines the benefits of attributes and image hierarchy is proposed.
• The proposed method achieves state-of-the-art accuracy on the AwA dataset.

Abstract

Generally, training images are essential for a computer vision model to classify a specific object class accurately. Unfortunately, there are countless different object classes in the real world, and it is almost impossible for a computer vision model to obtain a complete set of training images for every object class. To overcome this problem, zero-shot learning algorithms emerged to learn unknown object classes from the information of a set of known object classes. Among these methods, attributes and image hierarchy are the most widely used. In this paper, we combine the strengths of both attributes and image hierarchy by proposing the Attributes Relationship Model (ARM) to perform zero-shot learning. We evaluated the proposed algorithm on the Animals with Attributes (AwA) dataset and achieved state-of-the-art accuracy (50.61%) compared to other recent methods.

Introduction

Object recognition is one of the active research areas in the computer vision community due to its usefulness in real-life applications, ranging from content-based image retrieval systems such as internet search engines (Wu, Jin, & Jain, 2013) to video surveillance systems that identify uncommon or suspicious objects in a selected area (Lee & Nevatia, 2014; Lim, Tang, & Chan, 2014). A well-generalized object recognition system greatly reduces the human effort needed to identify objects that differ only slightly in appearance but belong to the same object category, for example by modelling a given object class with a set of modes deduced by a multi-finite mixture model (Bdiri, Bouguila, & Ziou, 2014; Bourouis, Mashrgy, & Bouguila, 2014).

However, as the number of distinct real-world objects grows increasingly large, it is very hard to have a computer vision model that is able to classify all of them. Besides, state-of-the-art object recognition algorithms always require a minimum number of samples from each object class to learn the differences between them. To add a new object class to the model after the learning process, the whole model needs to be retrained, which is a tedious job. Therefore, the zero-shot learning approach emerged (Frome, Corrado, Shlens, Bengio, Dean, Mikolov, et al., 2013; Hoo & Chan, 2013; Lampert, Nickisch, & Harmeling, 2009, 2014; Palatucci, Pomerleau, Hinton, & Mitchell, 2009; Parikh & Grauman, 2011; Rohrbach, Stark, & Schiele, 2011), which categorizes unknown object classes using existing samples of other classes by exploiting the semantic relationship between the unknown object class and the existing object classes. For example, if three object classes need to be learned by the classification model and one of them is unknown (because it has no training samples), the other two known object classes in the recognition system are used to find the characteristics of the unknown object class. The classification model is then able to classify all three object classes, and retraining the full model is no longer needed.
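As a rough illustration of the three-class example above, the following minimal Python sketch assigns an image to the class whose attribute signature is closest to the image's predicted attribute scores. The class names, the three attributes, their values, and the `classify` helper are all hypothetical, and the nearest-signature rule is a generic attribute-based stand-in, not the method proposed in this paper.

```python
from math import dist  # Euclidean distance, available since Python 3.8

# Hypothetical attribute signatures in the order [striped, hooved, carnivore].
# "horse" and "tiger" stand for known classes (training images available);
# "zebra" stands for the unknown class (no training images, signature only).
CLASS_ATTRIBUTES = {
    "horse": [0.0, 1.0, 0.0],
    "tiger": [1.0, 0.0, 1.0],
    "zebra": [1.0, 1.0, 0.0],
}


def classify(attribute_scores, class_attributes=CLASS_ATTRIBUTES):
    """Return the class whose attribute signature is nearest to the scores.

    In a real system the scores would come from attribute classifiers trained
    on images of the known classes; here they are passed in directly.
    """
    return min(
        class_attributes,
        key=lambda name: dist(attribute_scores, class_attributes[name]),
    )


if __name__ == "__main__":
    # Assumed attribute scores for an image that looks striped and hooved.
    print(classify([0.9, 0.8, 0.1]))
```

In this toy setting the unknown class can be predicted without any of its own training images, because its signature is expressed in attributes that are learned from the two known classes.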

Since then, much research has focused on zero-shot learning tasks. Palatucci et al. (2009) were the first to initiate the zero-shot learning paradigm, learning a semantic output codes classifier that uses the semantic properties of known classes to predict unknown classes. Lampert et al. (2009, 2014) used the Direct Attributes Prediction (DAP) and Indirect Attributes Prediction (IAP) models, which utilize attribute information. Parikh and Grauman (2011) proposed relative attributes, which further enhance zero-shot learning by introducing relative relationships between object classes using attributes, in contrast to the binary attributes approach used in Lampert et al. (2009, 2014). Other lines of research (Frome, Corrado, Shlens, Bengio, Dean, Mikolov, et al., 2013; Hoo & Chan, 2013; Rohrbach, Stark, & Schiele, 2011) favor the image hierarchy approach, finding the relationships between unknown object classes and existing known object classes. These approaches rely on semantic information from WordNet, on information mined from un-annotated data, or on a specific Coarse Class-Fine Class lookup table. In more recent work, Fu, Hospedales, Xiang, and Gong (2014) proposed M2LATM, which defines a semi-latent attribute space by using user-defined and latent attributes in one framework. Fu, Hospedales, Xiang, and Gong (2015a) used transductive multi-view embedding and heterogeneous multi-view label propagation to overcome two known problems in zero-shot learning, namely projection domain shift and prototype sparsity. In addition, Liu, Zhang, and Chen (2014) proposed to learn attribute relations and attribute classifiers jointly in a common objective function, while Fu, Xiang, Kodirov, and Gong (2015b) suggested using semantic manifold distance to project the semantic embedding space and recognize unknown object classes. All of the aforementioned methods use either attributes or image hierarchy, but not both.

In this paper, we propose to combine the benefits of both attributes and image hierarchy. Specifically, we build an Attributes Relationship Model (ARM) to perform zero-shot learning, based on the hierarchical class concept in Hoo and Chan (2013, 2015). Our contribution is that, instead of using the Coarse Class-Fine Class relation as in their papers, we use attributes to build the relationship model. Our intuition is that, given the attributes of each unknown object class, the known object classes that have similar attributes to the unknown object class will have a stronger relationship with it. Since each attribute only represents characteristics of part of the image, we group the known object classes that are highly correlated with the unknown object classes based on their attributes. In short, we have a centralized relationship model that infers which known object classes are most correlated with a specific unknown object class. This is different from the relationship used in DAP and IAP (Lampert, Nickisch, & Harmeling, 2009, 2014), which is not class-specific. These advantages enable the proposed method to enhance zero-shot learning performance, and we achieve state-of-the-art results (50.41%) on the Animals with Attributes (AwA) dataset.
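The intuition above, that known classes sharing more attributes with an unknown class are more strongly related to it, can be sketched as a simple ranking. This is only an illustration under stated assumptions: cosine similarity is an assumed stand-in for whatever relationship measure ARM actually uses (the excerpt does not specify it), and the class names, attribute values, and function names are hypothetical. The same code accepts binary or real-valued attribute vectors.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a class with no active attributes is treated as unrelated
    return dot / (norm_a * norm_b)


def most_related_known_classes(unknown_attrs, known_attrs, top_k=2):
    """Rank known classes by attribute similarity to one unknown class.

    Returns the top_k (class_name, similarity) pairs, most similar first.
    """
    scored = [
        (name, cosine_similarity(unknown_attrs, attrs))
        for name, attrs in known_attrs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical binary attributes: [striped, hooved, carnivore, four_legs].
    known = {
        "horse": [0, 1, 0, 1],
        "tiger": [1, 0, 1, 1],
        "dolphin": [0, 0, 1, 0],
    }
    zebra = [1, 1, 0, 1]  # unknown class: attribute description only
    for name, score in most_related_known_classes(zebra, known):
        print(f"{name}: {score:.3f}")
```

Because the ranking is computed separately for each unknown class, the resulting relationship is class-specific, which is the property the paragraph above contrasts with DAP and IAP.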

This paper is organized as follows: we first formulate the proposed ARM in Section 2. After that, we compare our relationship model with the current state of the art in Section 3. We then discuss our findings in Section 4 and conclude the paper in Section 5.

Section snippets

Attributes relationship model (ARM)

The proposed ARM aims to solve the zero-shot learning problem. Conventional learning models need at least one image sample of each object class to learn their model. Zero-shot learning, however, allows training images to be missing for selected object class(es), denoted as the unknown object class. Attributes here help to relate the unknown object class to the known object classes, because attributes are shared among all object classes (as in Lampert, Nickisch, Harmeling, 2009, Lampert,

Experiments

We test both binary ARM and real-valued ARM in this section. The AwA dataset, which consists of 30,475 images, is used with 40 c_s and 10 c_u. The implementation is similar to Hoo and Chan (2013, 2015). Specifically, we use PHOG features with 3 pyramid levels and 180°. We then set the RF to have 10 trees, each with 100 leaf nodes, and use 20 topics during pLSA learning. Fig. 2 shows a summary of the c_s pair that is most correlated to each particular c_u, based on Tables 1 and 2. We

Discussion

The experiments show that real-valued attributes are more useful than binary attributes in zero-shot learning. Besides, using all the available attributes is not necessarily the best option. In fact, choosing the best combination of attributes often helps to achieve better classification results. How to choose an optimal set of attributes therefore remains an open question. Nevertheless, our proposed method is still able to achieve good accuracy with a small number of attributes as compared to

Conclusion

In this paper, we investigated zero-shot learning, which classifies unknown object classes by using information from existing known object classes. Specifically, we build the Attributes Relationship Model (ARM) based on attributes and the image hierarchy concept to learn unknown object classes from known classes' information. The advantage is that image hierarchy allows us to build a relationship model between known and unknown object classes using the coarse class strategy. In ARM, attributes identify mid level

Acknowledgment

This research is supported by the High Impact MoE Grant UM.C/625/1/HIR/MoE/FCSIT/08, H-22001-00-B00008 from the Ministry of Education Malaysia and UM Bright Sparks Programme.

References (22)

Bdiri, T., et al. (2014). Object clustering and recognition using multi-finite mixtures for semantic classes and hierarchy modeling. Expert Systems with Applications.
Bourouis, S., et al. (2014). Bayesian learning of finite generalized inverted Dirichlet mixtures: application to object classification and forgery detection. Expert Systems with Applications.
Lim, M. K., et al. (2014). iSurveillance: intelligent framework for multiple events detection in surveillance videos. Expert Systems with Applications.
Liu, M., et al. (2014). Attribute relation learning for zero-shot classification. Neurocomputing.
Akata, Z., et al. (2013). Label-embedding for attribute-based classification. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).
Akata, Z., Perronnin, F., Harchaoui, Z., & Schmid, C. (2015). Label-embedding for image classification. arXiv preprint,...
Csurka, G., et al. Visual categorization with bags of keypoints.
Frome, A., et al. (2013). DeViSE: a deep visual-semantic embedding model. Proceedings of the advances in neural information processing systems.
Fu, Y., et al. (2014). Learning multimodal latent attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Fu, Y., et al. (2015). Transductive multi-view zero-shot learning. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Fu, Z., et al. (2015). Zero-shot object recognition by semantic manifold distance. Proceedings of the computer vision and pattern recognition (CVPR).
