Elsevier

Neurocomputing

Volume 145, 5 December 2014, Pages 416-426

Sparse representations based attribute learning for flower classification

https://doi.org/10.1016/j.neucom.2014.05.011

Abstract

Flower classification is a very difficult task. Traditional methods need to build a classifier for each flower category and to collect a large number of flower samples to train these classifiers. In practice, the large variety of flower species makes this job tedious and impractical. In this work, we present an attribute-based approach to flower recognition. In particular, instead of training directly for a specific category of flowers on manually designed features such as SIFT and HoG, we extract a series of visual attributes from a given set of flower images and generalize them to new images of possibly unknown flowers. A recently proposed sparse representation classification scheme is employed to predict the attributes of a given flower image from any category. In addition, we use a genetic algorithm to find the most discriminative attributes for better performance during the flower classification stage. The effectiveness of the proposed method is validated on a publicly available flower classification database with promising results.

Introduction

The task of flower classification is very difficult. Typical flower images exhibit large variations in scale and viewpoint, partial occlusion, changing illumination, multiple instances, and so on. Perhaps the biggest challenge comes from intra-class and inter-class variability: some images from different classes vary less than images within a single class, and minute differences determine their classification [1], [2], [3], [4], [5], [6], [7]. In addition, as continuously growing plants and non-rigid objects, flowers can deform in many ways, so intra-class variation is also large. Recent machine learning techniques have demonstrated their power in classifying flowers; for example, one can consider kernel SVMs [8] or boosting [9]. The performance of the state-of-the-art method is obtained with multiple features [10] and a multiple kernel classifier: each kernel is designed for a different feature (e.g. colour, texture), and an additional kernel is designed for the weighted combination of these feature kernels.

Recently, with the emergence of camera-equipped mobile phones, new opportunities and challenges have arisen in the computer vision field. One of these challenges is that in resource-constrained environments, available memory, bandwidth and processing power become limiting factors for image classification [11]. For example, with a camera-equipped smartphone, a user captures a flower image and wants to learn more about the flower. A traditional image identification system would either extract some type of features from the flower image and transmit them to a server, or transmit the whole flower image to the server directly. Then, to recognize the category of the flower and feed back information about it, the server needs to perform a series of classification tests.

Regarding the feasibility of flower classification, there are two important issues. The first is that there should be enough labeled examples for training the classifier; the classifier can then identify the possible category of new test examples drawn from the same distribution as the training examples. However, due to the large number of possible image classes, collecting a large number of examples for training may not be feasible even when the number of classes is modest [12]. In addition, training classifiers for a real-time application on the server may also be impractical, because as the number of visitors increases, system response performance drops quickly.

The second issue, which can lead to the failure of the system design, is the limited power, bandwidth and processing capability of mobile systems. For example, the available battery power drains quickly when transmitting raw images. Moreover, because the transmitted information also increases congestion on the server and the network, user satisfaction is directly affected.

To cope with the challenges mentioned above, the recently proposed attribute learning attempts to learn attributes instead of traditional image classes. The camera-equipped smartphone system shown in Fig. 1 recognizes attributes, e.g. “red” or “green”, for the classification of specific classes. The processing pipeline includes two stages: the first is the creation of the attribute vector, accomplished on the mobile device, and the second is the final classification, which takes place on the server. The classification stage is learned off-line from textual descriptions, which provide the mapping of attributes to classes.
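The second stage's attribute-to-class mapping can be sketched as a nearest-signature lookup over textual class descriptions. The attribute names and signatures below are illustrative placeholders, not the paper's actual attribute vocabulary:

```python
import numpy as np

# Hypothetical attribute-class signature matrix: rows = classes,
# columns = binary attributes (e.g. "red", "green", "spotted").
# Values are illustrative, not taken from the paper.
class_signatures = np.array([
    [1, 0, 1],   # class 0: red, not green, spotted
    [0, 1, 0],   # class 1: green
    [1, 1, 0],   # class 2: red and green
])

def classify_from_attributes(predicted_attrs):
    """Map a predicted attribute vector (sent from the mobile device)
    to the class whose signature is nearest in Hamming distance."""
    dists = np.abs(class_signatures - predicted_attrs).sum(axis=1)
    return int(np.argmin(dists))

print(classify_from_attributes(np.array([1, 0, 1])))  # -> 0
```

Only the short attribute vector crosses the network; the signature matrix lives on the server and can be built from text alone.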

There are many advantages in employing attributes as a bridge between image examples and classes. The greatest benefit is that classifiers can be trained and tested using only textual information, without direct image features. This means that, instead of relying on labeled samples as metadata, large knowledge databases can be employed to extract the information needed to automatically identify test examples. Besides, textual information is easier to transmit, process and store than image information. Compared to raw images, the lower-dimensional attribute text can more easily be used to reduce query time in a large retrieval system. Considering that the number of simultaneous visits to the server can be very large, this advantage becomes vital for a system like the one shown in Fig. 1.

In recent years, as a kind of high-level image feature, semantic attributes have gained more and more attention in the field of computer vision. Attribute learning has been widely applied to classifying objects [13] or images [14]. Farhadi et al. [15] first put forward a set of visual semantic attributes to describe objects. Later, Kumar et al. [16] proposed a novel method to predict visual attribute vectors through related attribute classifiers, using these attribute vectors to represent faces. Vogel and Schiele [17] employed visual attributes to express the semantics of outdoor scenes. Vaquero et al. proposed an attribute-based people search method [18]. Attribute learning also has many potential applications in transfer learning [19], multi-label learning [20], multi-instance learning [21], video annotation [22] and image retrieval [23]. Although semantic attributes have been used in many kinds of image classification, to our knowledge there is no system that identifies flowers using their attributes.

As far as the power availability and the communication bandwidth of mobile systems are concerned, even when transferring raw images or image descriptors is infeasible, the attribute representation can still be transmitted. In addition, instead of integrating a whole image recognition system, only two processing stages are needed on the mobile device: the first is feature extraction, and the second is attribute prediction. In recent years, feature extraction and image classification have been deployed on smartphones and other mobile devices [24], [25], [26] with very encouraging results.

Attribute-based identification, as a new way of classifying images, can be used for flower classification on camera-equipped mobile phones. However, because of the large number of available attributes, collecting samples to train the attribute classifiers is a tedious task. In addition, the traditional attribute-based classification framework assumes independence among attributes, which cannot be guaranteed for attributes annotated by humans. In this paper we propose an integrative framework that enables us to predict attributes automatically and to estimate the prior attribute–class probability relationship matrix. We use image samples to compose the dictionary of the attribute classifier and employ the recently proposed sparse representation classification scheme to predict the attributes of other samples. Besides, in order to find the most discriminative attributes, a genetic algorithm is employed to reduce the attribute set for flower classification. The benefits of the proposed extensions are validated through attribute-to-class mapping experiments.
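A genetic algorithm for attribute subset selection can be sketched as follows. The fitness function, population size, and operators here are illustrative assumptions, not the paper's actual configuration:

```python
import random

def ga_select_attributes(fitness, n_attrs, pop_size=20, generations=50,
                         mutation_rate=0.05, seed=0):
    """Toy GA over binary attribute masks. `fitness` scores a mask;
    in the paper's setting it would be classification performance,
    here any user-supplied function works."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]            # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_attrs)        # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [g ^ (rng.random() < mutation_rate) for g in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: attributes 0 and 2 are "discriminative"; every extra
# attribute carries a small cost, so small informative masks win.
best = ga_select_attributes(lambda m: m[0] + m[2] - 0.1 * sum(m), n_attrs=6)
print(best)
```

Because the top half of each generation survives unchanged, the best mask found never degrades across generations.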

The proposed method

In the traditional approach, given a signal such as a vectorized image \(x \in \mathbb{R}^n\), the signal \(x\) is called \(k\)-sparse with respect to a dictionary \(D \in \mathbb{R}^{n \times n}\) if \(x = Ds\) where \(k = \|s\|_0\). If the class of the image \(x\) is \(y \in Y\), there will be a mapping matrix \(W\) such that \(y = Wx = WDs\). The parameter \(W\) is learned from the training samples in the training step, after which the label \(y\) of a testing sample is predicted by the learned classifier [27], [28], [29].
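As a concrete illustration of this sparse coding view, the toy sketch below codes a test vector over a dictionary of class-labeled atoms with a simple orthogonal matching pursuit, then assigns the class whose atoms best reconstruct it (an SRC-style rule). The dictionary, solver, and class names are illustrative, not the paper's exact scheme:

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: find a k-sparse code s with x ~= D s."""
    residual, support = x.copy(), []
    s = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    s[support] = coef
    return s

def src_predict(D, labels, x, k=2):
    """SRC-style rule: code x over the dictionary of training samples,
    then pick the class whose atoms give the smallest residual."""
    s = omp(D, x, k)
    residuals = {}
    for cls in set(labels):
        mask = np.array([l == cls for l in labels])
        residuals[cls] = np.linalg.norm(x - D[:, mask] @ s[mask])
    return min(residuals, key=residuals.get)

# Toy dictionary: 4 unit-norm atoms, two per (hypothetical) class.
D = np.eye(4)
labels = ["rose", "rose", "daisy", "daisy"]
x = 0.7 * D[:, 2] + 0.3 * D[:, 3]   # x lies in the span of the "daisy" atoms
print(src_predict(D, labels, x))    # -> daisy
```

The same machinery can predict an attribute instead of a class label by labeling the dictionary atoms with attribute values rather than species.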

In the classification scheme based on the

Experimental setting and arrangement

Data setting: In order to compare our method's performance with others, a public flower dataset called Oxford17 is chosen as the experimental dataset (http://www.robots.ox.ac.uk/vgg/data/flowers/index.html). The dataset consists of 17 species of flowers with 80 images of each (Fig. 3). The samples in Oxford17 are all natural images. In Oxford17, some categories of flowers have a very distinctive visual appearance, e.g. tigerlilies and fritillaries, but some others have very similar
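A per-class split of a dataset of this shape (17 classes × 80 images) can be sketched as below; the 40/40 train/test counts are placeholders, not the paper's actual protocol:

```python
import random

def split_oxford17(images_per_class=80, n_classes=17, n_train=40, seed=0):
    """Shuffle each class independently and take the first n_train
    images for training, the rest for testing. Items are
    (class_id, image_index) pairs standing in for real image paths."""
    rng = random.Random(seed)
    train, test = [], []
    for c in range(n_classes):
        idx = list(range(images_per_class))
        rng.shuffle(idx)
        train += [(c, i) for i in idx[:n_train]]
        test += [(c, i) for i in idx[n_train:]]
    return train, test

train, test = split_oxford17()
print(len(train), len(test))  # -> 680 680
```

Splitting per class keeps the class balance identical in both halves, which matters when some flower species are visually close to each other.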

Conclusion

In this paper, a novel approach to flower recognition based on attribute learning is proposed. Instead of training for the recognition of a specific category of flowers directly on manually designed feature sets, a series of visual attributes is extracted from a given set of flower images and then generalized to new images from possibly unknown categories. To automate attribute extraction from a given image, a generative dictionary is learned from the training set,

Acknowledgment

This research is supported by the National Natural Science Foundation of China (NSFC) Nos. 61170126, 61203246, 61003183, 61373060 and the Science Foundation of Jiangsu Province No. BK2011521.

References (41)

  • Z. Zha et al.

    Graph based semi-supervised learning with multiple labels

    J. Vis. Commun. Image Represent.

    (2009)
  • M. Nilsback, A. Zisserman, Automated flower classification over a large number of classes, in: Proceedings of the Sixth...
  • D. Guru et al.

    Textural features in flower classification

    Math. Comput. Model.

    (2011)
  • M. Nilsback, A. Zisserman, A visual vocabulary for flower classification, in: The Proceedings of the IEEE Computer...
  • S. Takeshi, K. Toyohisa, Automatic recognition of wild flowers, in: The Proceedings of International Conference on...
  • M. Das et al.

    Indexing flower patent images using domain knowledge

    IEEE Intell. Syst.

    (1999)
  • N. Shingo, S. Mie, A. Yoshimitsu, H. Shuji, Flower image database construction and its retrieval, in: The Proceedings...
  • T. Saitoh, K. Aoki, T. Kaneko, Automatic recognition of blooming flowers, in: The Proceedings of International...
  • E. Chang, B.T. Li, G. Wu, K. Goh, Statistical learning for effective visual information retrieval, in: The Proceedings...
  • Y. Freund et al.

    An efficient boosting algorithm for combining preferences

    J. Mach. Learn. Res.

    (2003)
  • J. Zhang et al.

    Local features and kernels for classification of texture and object categories: a comprehensive study

    Int. J. Comput. Vis.

    (2007)
  • T. Abe, T. Takada, H. Kawamura, T. Yasuno, N. Sonehara, Image-identification methods for camera-equipped mobile phones,...
  • M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman, The pascal visual object classes (VOC) challenge,...
  • G. Wang, D. Forsyth, Joint learning of visual attributes, object classes and visual saliency, in: The Proceedings of...
  • Y. Su, F. Jurie, Improving image classification using semantic attributes, Int. J. Comput. Vis. 100(1) (2012)...
  • A. Farhadi, I. Endres, D. Hoiem, D. Forsyth, Describing objects by their attributes, in: The Proceedings of...
  • N. Kumar, A.C. Berg, P.N. Belhumeur, S.K. Nayar, Attribute and simile classifiers for face verification, in: The...
  • J. Vogel et al.

    Semantic modeling of natural scenes for content-based image retrieval

    Int. J. Comput. Vis.

    (2007)
  • D.A. Vaquero, R.S. Feris, D. Tran, L. Brown, A. Hampapur, Attribute-based people search in surveillance environments,...
  • C.H. Lampert, H. Nickisch, S. Harmeling, Learning to detect unseen object classes by between-class attribute transfer,...
    Keyang Cheng is a member of CCF. He received the M.S. degree from the School of Computer Science and Telecommunication Engineering of Jiangsu University, in 2008. Now he is currently a Ph.D. student at the Department of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics. He has co-authored more than 20 journal and conference papers. He is currently a researcher and teaching assistant in the School of Computer Science and Telecommunications Engineering of Jiangsu University. His current research interests lie in the areas of pattern recognition, computational intelligence and computer vision.

    Xiaoyang Tan received his B.Sc. and M.Sc. degrees in computer applications from Nanjing University of Aeronautics and Astronautics (NUAA), in 1993 and 1996, respectively. He then joined NUAA in June 1996 as an assistant lecturer. He received a Ph.D. degree from the Department of Computer Science and Technology of Nanjing University, China, in 2005. From September 2006 to October 2007, he worked as a postdoctoral researcher in the LEAR (Learning and Recognition in Vision) team at INRIA Rhône-Alpes in Grenoble, France. His research interests are in face recognition, machine learning, pattern recognition, and computer vision. In these fields, he has authored or coauthored over 20 scientific papers.
