Elsevier

Neurocomputing

Volume 99, 1 January 2013, Pages 298-306

Multi-instance multi-label image classification: A neural approach

https://doi.org/10.1016/j.neucom.2012.08.001

Abstract

In this paper, a multi-instance multi-label algorithm based on neural networks is proposed for image classification. The proposed algorithm, termed multi-instance multi-label neural network (MIMLNN), consists of two stages of MultiLayer Perceptrons (MLP). For multi-instance multi-label image classification, all the regional features are fed to the first-stage MLP, with one MLP copy processing one image region. The MLP in the second stage then incorporates the outputs of the first-stage MLPs to produce the final labels for the input image. The first-stage MLP is expected to model the relationship between regions and labels, while the second-stage MLP aims at capturing the label correlation for classification refinement. The Error Back-Propagation (BP) approach is adopted to tune the parameters of MIMLNN. Since the traditional gradient descent algorithm suffers from the long-term dependency problem, a refined BP algorithm named Rprop is extended to effectively train MIMLNN. Experiments are conducted on a synthetic dataset and the Corel dataset. Experimental results demonstrate the superior performance of MIMLNN compared with state-of-the-art algorithms for multi-instance multi-label image classification.

Introduction

With the prevalence of digital imaging devices such as digital cameras and mobile phones, a huge volume of images is produced every day, and image classification consequently becomes increasingly important. Traditional image classification approaches assign an image to only one class, which may be impractical since in real applications an image can be associated with multiple labels. Furthermore, those traditional approaches usually treat an image as a single instance and represent it using global features only. The issue is that global features cannot characterize image contents well, in particular when the images are composed of complex objects [1], [2]. In view of this, many multi-instance learning (MIL) [3], [4], [5] and multi-label learning (MLL) [6], [7], [8] algorithms have been proposed to tackle image classification tasks.

In MIL, an image is a bag of instances, with each instance representing an image region. As informative regional features are utilized, this approach characterizes images with complex contents quite well. However, MIL targets binary classification: it assigns only one label to an image. By contrast, MLL deals with multi-label tasks and is able to associate an image with multiple labels. But MLL regards an image as a single instance, so regional features are not employed. The issue is that the labels of an image usually relate to different regions, and global features alone are not discriminative enough to separate the image labels. Therefore, some researchers have developed multi-instance multi-label learning (MIMLL) based on MIL and/or MLL for multi-label image classification [9], [10], [11].

For MIMLL image classification, each image is composed of a bag of regions and associated with multiple labels. In the learning phase, the relationship between the image regions and the labels is unknown. The aim of learning is to figure out this relationship from the training images; the learned relationship can then be used to classify unlabeled images. Zhou et al. [9] proposed MIMLBoost and MIMLSVM for multi-instance multi-label scene classification. MIMLBoost transforms a MIML learning task into an MIL problem and solves it using MIBoost [12], while MIMLSVM transforms a MIML learning task into an MLL problem and tackles it by adopting MLSVM [6]. The disadvantage of MIMLBoost and MIMLSVM is that neither takes label correlation into account.

Due to their powerful classification capability, neural networks have been widely applied to image classification and pattern recognition [13], [14], [15], [16], [17], [18], [19]. In this paper, we propose an approach based on neural networks for multi-instance multi-label image classification. The proposed approach is termed Multi-Instance Multi-Label Neural Network (MIMLNN). MIMLNN is composed of two stages of MultiLayer Perceptrons (MLP). The first-stage MLP receives the regional features of the input image, with one MLP copy handling one region. The second-stage MLP incorporates the outputs of the first-stage MLPs to generate the final labels for the input image. In the training phase, the first-stage MLP is expected to establish the relationship between the image regions and labels, and the second-stage MLP aims at capturing the label correlation for classification refinement. There are three reasons for choosing neural networks for label correlation extraction. First, a neural network models label correlation with a nonlinear function, unlike many other, linear, label correlation modeling methods; a nonlinear function is able to characterize more complex label co-occurrence relationships. Second, the label correlation information is stored in the network weights, and based on the weights it is possible to explicitly analyze the correlation values of different labels. Third, in future research we would like to build on this work to develop a system that feeds label correlation values back to the input for performance enhancement; neural networks are suitable for this task, as the outputs can easily be fed back simply by connecting them to the input. The Error Back-Propagation (BP) algorithm is adopted to train MIMLNN. A traditional BP algorithm is gradient descent, which updates the network weights according to the magnitude of the partial derivative of a predefined error function. Since the error back-propagates by multiplying by the derivative of the sigmoid function, whose value lies between 0 and 1, the gradient magnitude can become very small for deep layers when the error propagates through multiple layers. As a result, the weights at very deep layers are difficult to update. This is called the long-term dependency problem [20], [21]. To solve this problem, we extend the refined BP algorithm Rprop [22] for MIMLNN training. The Rprop algorithm updates weights depending on the sign rather than the magnitude of the derivative of the error function, so the long-term dependency problem is avoided. To demonstrate the superior performance of MIMLNN, experiments are conducted on a synthetic image dataset and the popular Corel image dataset. One main advantage of a synthetic dataset is that the contents of the synthetic images are controllable; for example, we can design specific label correlations for our research.
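The sign-based update that distinguishes Rprop from plain gradient descent can be sketched as follows. This is a minimal NumPy illustration of a common Rprop variant (step sizes grow when the gradient keeps its sign and shrink when it flips, and an update is skipped after a sign change); the factors and bounds shown are the conventional defaults from the Rprop literature, not values taken from this paper.

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One Rprop step: adapt a per-weight step size from the SIGN of the
    gradient, ignoring its magnitude, so that tiny deep-layer gradients
    still produce effective weight updates."""
    sign_change = grad * prev_grad  # > 0: same sign, < 0: sign flipped
    # Same sign: grow the step size (capped at step_max).
    step = np.where(sign_change > 0,
                    np.minimum(step * eta_plus, step_max), step)
    # Sign flipped: shrink the step size (floored at step_min).
    step = np.where(sign_change < 0,
                    np.maximum(step * eta_minus, step_min), step)
    # Where the sign flipped, skip this update (treat the gradient as 0).
    grad = np.where(sign_change < 0, 0.0, grad)
    # The update direction comes from the sign alone.
    w = w - np.sign(grad) * step
    return w, grad, step
```

For example, a weight whose gradient keeps the same sign across iterations sees its step size grow by the factor `eta_plus` each time, regardless of how small the raw gradient magnitude is.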

The rest of this paper is organized as follows. Section 2 describes the architecture and training algorithms of the proposed MIMLNN. The long-term dependency problem is also discussed. The experimental results on the synthetic dataset and Corel dataset are reported in Section 3. Section 4 concludes this paper with final remarks. Before ending this introductory section, it is worth mentioning the contributions of this paper as follows:

  1. We propose a unique structure based on classical MLPs for image classification, providing a neural network solution to the multi-instance multi-label learning problem.

  2. We find an effective way to train this unique network by extending a refined BP algorithm for the network training.

  3. We demonstrate the superior performance of the proposed neural structure by comparing it with state-of-the-art algorithms.

Section snippets

Multi-instance multi-label neural network (MIMLNN)

Fig. 1 shows the image classification flowchart of MIMLNN. MIMLNN consists of two stages of MLPs. The first-stage MLP is named MLP1, and the second-stage MLP is denoted as MLP2. Given an image, the classification is conducted by MIMLNN as follows. First of all, the image is divided into a number of regions by performing automatic segmentation or regular gridding. After that, the features extracted from the regions are fed to MLP1s, with one MLP copy processing one region's features. Finally,
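The two-stage flow just described can be sketched as follows. This is a minimal NumPy sketch under assumed layer sizes; the weight sharing across MLP1 copies and the concatenation of their outputs before MLP2 reflect our reading of the description above, not the paper's exact configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp(x, W1, b1, W2, b2):
    """A one-hidden-layer perceptron: input -> hidden -> output, sigmoid units."""
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)

def mimlnn_forward(regions, p1, p2):
    """MIMLNN forward pass: one shared MLP1 copy per region, then MLP2
    combines the concatenated MLP1 outputs into the final label scores."""
    region_scores = np.stack([mlp(r, *p1) for r in regions])  # (n_regions, n_labels)
    return mlp(region_scores.reshape(-1), *p2)                # (n_labels,)

rng = np.random.default_rng(0)
n_regions, d, h1, h2, n_labels = 4, 8, 16, 10, 5  # assumed sizes for illustration
p1 = (rng.normal(size=(d, h1)), np.zeros(h1),
      rng.normal(size=(h1, n_labels)), np.zeros(n_labels))
p2 = (rng.normal(size=(n_regions * n_labels, h2)), np.zeros(h2),
      rng.normal(size=(h2, n_labels)), np.zeros(n_labels))

regions = rng.normal(size=(n_regions, d))  # one feature vector per image region
scores = mimlnn_forward(regions, p1, p2)   # one score per label, each in (0, 1)
```

Thresholding the final scores (e.g. at 0.5) would then yield the multi-label prediction for the image.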

Experimental results on synthetic dataset

The first dataset used to evaluate the performance of MIMLNN is a synthetic image dataset proposed in [2]. The synthetic images are composed of nine objects: “round”, “triangle”, “rectangle”, “octagon”, “4-point star”, “5-point star”, “moon”, “heart”, and “lightning bolt”. Each object has six colors: red, green, blue, yellow, cyan, and magenta. Each synthetic image contains 1–4 non-overlapping objects and a white background. To create some label correlations, objects “round” and

Conclusions

In this paper, we propose a neural approach MIMLNN for multi-instance multi-label image classification. MIMLNN consists of two stages of MLPs. The first-stage MLP is used to establish the relationship between image regions and labels. The second-stage MLP aims at capturing label correlation for classification refinement. To solve the long-term dependency problem encountered in traditional gradient descent algorithm, a refined back-propagation algorithm Rprop is extended to train MIMLNN. The

Acknowledgments

The work reported in this paper was supported by a research grant from The Hong Kong Polytechnic University (Project Code: 4-ZZ7V). The work of Z. Chen was supported by The Hong Kong Polytechnic University for his Ph.D. study.

Zenghai Chen received his B.Eng. from the Department of Electronics and Communication Engineering, Sun Yat-sen University, Guangzhou, China, in 2009. Since 2009, he has been a Ph.D. candidate in the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong. His research interests include pattern recognition and computational intelligence.

References (32)

  • Z. Chen, H. Fu, Z. Chi, D. Feng, An adaptive recognition model for image annotation, IEEE Transactions on System, Man,...
  • Y. Chen et al., MILES: multiple-instance learning via embedded instance selection, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • Y. Chen et al., Image categorization by learning and reasoning with regions, J. Mach. Learn. Res. (2004)
  • F. Kang, R. Jin, R. Sukthankar, Correlated label propagation with application to multi-label learning, in: Proceedings...
  • Z.H. Zhou et al., Multi-instance multi-label learning with application to scene classification, Adv. Neural Inf. Process. Syst. (2007)
  • Z.J. Zha, X.S. Hua, T. Mei, J. Wang, G.J. Qi, Z. Wang, Joint multi-label multi-instance learning for image...


Zheru Chi received his B.Eng. and M.Eng. degrees from Zhejiang University in 1982 and 1985, respectively, and his Ph.D. degree from the University of Sydney in March 1994. Between 1985 and 1989, he was a faculty member of the Department of Scientific Instruments at Zhejiang University. He worked as a Senior Research Assistant/Research Fellow in the Laboratory for Imaging Science and Engineering at the University of Sydney from April 1993 to January 1995. In February 1995, he joined The Hong Kong Polytechnic University, where he is now an Associate Professor in the Department of Electronic and Information Engineering. In 1996, Dr. Chi co-authored the research monograph “Fuzzy Algorithms with Applications to Image Processing and Pattern Recognition”. He was also a contributor to the Comprehensive Dictionary of Electrical Engineering (CRC Press and IEEE Press, 1999). Since 1997, he has served as a Special Session Co-organizer, Session Chair, Area Moderator, and Program Committee Member for a number of international conferences. He was an Associate Editor for IEEE Transactions on Fuzzy Systems between 2008 and 2010 and has been a Technical Editor for the International Journal of Information Acquisition since the journal was launched in 2004. For the past two decades, he has reviewed a great number of papers for various prestigious international journals and conferences. Dr. Chi's research interests include image processing, pattern recognition, and computational intelligence techniques. He has published more than 190 technical papers since 1990.

    Hong Fu received her Bachelor and Master degrees from Xi'an Jiaotong University in 2000 and 2003, and Ph.D. degree from the Hong Kong Polytechnic University in 2007. She is now an Assistant Professor in Department of Computer Science, Chu Hai College of Higher Education, Hong Kong. Her research interests include image processing, pattern recognition, and artificial intelligence.

    Dagan Feng received his M.E. in Electrical Engineering & Computer Science (EECS) from Shanghai Jiao Tong University in 1982, M.Sc. in Biocybernetics and Ph.D. in Computer Science from the University of California, Los Angeles (UCLA) in 1985 and 1988 respectively, where he received the Crump Prize for Excellence in Medical Engineering. He is currently Head of School of Information Technologies and Director of the Institute of Biomedical Engineering and Technology at the University of Sydney, as well as Guest Professor of a number of Universities and Chair Professor of Hong Kong Polytechnic University. He has published over 500 scholarly research papers, pioneered several new research directions, and made a number of landmark contributions in his field. Prof. Feng is a Fellow of ACS, HKIE, IET, IEEE and Australian Academy of Technological Sciences and Engineering.
