Manifold optimal experimental design via dependence maximization for active learning
Introduction
During the past decades, massive volumes of data have emerged in an extensive range of fields and have been applied to numerous real-world tasks. Nevertheless, the majority of these data are unlabeled, and labeling them requires heavy workloads and costly expert knowledge. It is therefore crucial to select a much smaller subset of points that captures the most information in the data collection. In the machine learning community, this is treated as an active learning problem [7], [23], which has received much interest from both academia and industry. For example, the merits of active learning have been empirically demonstrated in multimedia annotation, image retrieval [28] and video indexing [31].
The popular principles adopted in active learning include uncertainty sampling, query by committee, error reduction and variance reduction. Typically, the uncertainty sampling rule has been applied to the support vector machine (SVM) [27], the nearest neighbor classifier [19], etc.; with this principle, the most uncertain samples are queried for labeling. The variance reduction criterion originates from optimal experimental design (OED) [1], which refers to the problem in statistics of selecting samples to label. In experimental design, a sample and its label are regarded as an experiment and a measurement, respectively. OED aims to minimize the variances of a parameterized model: minimizing the variance of the model parameters leads to A-, D- and E-optimal designs, while minimizing the variance of the estimated values leads to I- and G-optimal designs [1] (summarized below). However, these methods belong to the supervised paradigm and do not consider the unmeasured (i.e., unlabeled) samples. To overcome this drawback, some methods utilize both measured and unmeasured samples to actively select the most informative points; e.g., Transductive Experimental Design (TED) [30] evaluates the average prediction variance on pre-given unseen data based on I-optimal design. Nevertheless, TED does not consider the local manifold structure of the data space, which is of vital importance in active learning, since naturally occurring data often reside on a lower-dimensional sub-manifold of the ambient Euclidean space [3], [16], [18]. To remedy this deficiency, Laplacian regularized D-optimal design (LapRDD) [13] was proposed, where the loss function is defined on both labeled and unlabeled points with an imposed locality preserving regularizer, a device that has been adopted in several learning methods to improve performance [15], [17]. Overall, the above methods do not fully consider the correlation between the unlabeled data and their estimated predictions; i.e., existing models only respect the labeled data well, whereas the dependence between the unlabeled data and their predictions has not been explored in the context of OED.
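For concreteness, these alphabetic criteria can be summarized as follows. This is standard textbook material [1] stated in our own notation, not a formulation taken from this paper: for a linear model $y = \mathbf{w}^\top\mathbf{x} + \epsilon$ with i.i.d. noise of variance $\sigma^2$, the least-squares estimate over a selected design matrix $Z$ has covariance $\sigma^2 M^{-1}$ with information matrix $M = ZZ^\top$, and the criteria differ in how they scalarize $M^{-1}$:

```latex
\begin{aligned}
\text{A-optimal:} \quad & \min_Z\ \operatorname{tr}\big(M^{-1}\big)            && \text{(average parameter variance)}\\
\text{D-optimal:} \quad & \min_Z\ \det\big(M^{-1}\big)                         && \text{(volume of the confidence ellipsoid)}\\
\text{E-optimal:} \quad & \min_Z\ \lambda_{\max}\big(M^{-1}\big)               && \text{(worst-case parameter variance)}\\
\text{I-optimal:} \quad & \min_Z\ \tfrac{1}{n}\textstyle\sum_{i=1}^{n}\mathbf{x}_i^\top M^{-1}\mathbf{x}_i && \text{(average prediction variance)}\\
\text{G-optimal:} \quad & \min_Z\ \max_i\ \mathbf{x}_i^\top M^{-1}\mathbf{x}_i && \text{(worst-case prediction variance)}
\end{aligned}
```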
Recently, the Hilbert–Schmidt independence criterion (HSIC) [11], which measures the dependence between two random variables, has been successfully applied to many real-world applications, such as feature selection [24], dimensionality reduction [34], classification [2] and clustering [25]. These methods take advantage of HSIC to maximize the dependence between the input data (e.g., the features) and the output (e.g., the labels), leading to improved performance. However, they do not consider the local geometry that reflects the intrinsic structure of the data space, which can further refine the learning performance. Essentially, HSIC is an empirical estimate of the Hilbert–Schmidt norm of the cross-covariance operator and has several advantages. First, it has a simple formulation as the trace of a product of Gram matrices (see the sketch below). Second, its estimation error relative to the population value decays inversely proportionally to the square root of the number of samples. Third, if the sample size is large, any existing dependence between the random variables is guaranteed to be revealed with high probability [11]. Naturally, these merits can be exploited in OED to better model the correlation between unlabeled data and their estimated predictions. Such a correlation cannot be revealed by the existing LapRDD, which uses linear regression to model the relation only between the labeled data and their labels.
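To illustrate how simple the trace formulation is in practice, here is a minimal NumPy sketch of the standard biased empirical HSIC estimator from [11]; the Gaussian kernel, bandwidth and toy data below are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    # Gram matrix of the Gaussian kernel: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC: (m-1)^{-2} tr(K H L H), with H = I - (1/m) 1 1^T
    m = X.shape[0]
    K = rbf_gram(X, sigma)
    L = rbf_gram(Y, sigma)
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(K @ H @ L @ H) / (m - 1) ** 2

# Toy check: dependent pairs should score notably higher than independent ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y_dep = X + 0.1 * rng.normal(size=(200, 2))  # strongly dependent on X
Y_ind = rng.normal(size=(200, 2))            # independent of X
print(hsic(X, Y_dep), hsic(X, Y_ind))
```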
Motivated by this, we adopt HSIC in Laplacian regularized OED to improve the performance of active learning. In this way, we propose a novel active learning method named manifold optimal experimental design via dependence maximization (MODM). The central idea is to take advantage of HSIC to measure the dependence between the input data and their estimated outputs. In particular, by virtue of the regression model, we maximize the inherent dependence between the feature vectors and the corresponding predictions under the OED framework. In some sense, this dependence maximization reflects the relation between the inputs and the outputs globally. Furthermore, since MODM is developed upon LapRDD, it inherits the locality preserving property through the graph Laplacian. On the whole, a significant merit of MODM is that both the dependence maximization and the local geometrical structure of the data are well respected in a unified model (see the schematic below). In this way, the most informative data points can be better selected for labeling, thus yielding an improved model. To investigate the performance of MODM, we apply it to a natural application of active learning, i.e., relevance feedback in image retrieval [22], [35]. Empirical studies have demonstrated the superiority of the proposed method compared to other alternatives.
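To make the combination of ingredients concrete, an objective of this flavor can be sketched as follows. This is our schematic paraphrase of the three components named above (a least-squares loss on labeled data, a Laplacian smoothness term, and an HSIC dependence term on the unlabeled part), not the paper's exact formulation:

```latex
% L: labeled index set, U: unlabeled index set, L_G: graph Laplacian over all points
\min_{\mathbf{w}}\ \sum_{i \in \mathcal{L}} \big(\mathbf{w}^\top\mathbf{x}_i - y_i\big)^2
  \;+\; \gamma_1\, \mathbf{w}^\top X L_G X^\top \mathbf{w}
  \;+\; \gamma_2\, \|\mathbf{w}\|^2
  \;-\; \gamma_3\, \operatorname{HSIC}\!\big(X_{\mathcal{U}},\ X_{\mathcal{U}}^\top\mathbf{w}\big)
```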
It is worthwhile to highlight the main contributions of this work as follows:
- A novel active learning method named MODM is presented by incorporating an HSIC regularizer into manifold optimal experimental design, which enables maximizing the dependence between the unlabeled samples and their estimations. Thus, not only are the relations between labeled samples and their labels captured by a linear regression model, but the dependence between unlabeled samples and their estimations, together with the manifold structure of the data space, is also respected for OED in a unified framework.
- Detailed derivations of the proposed method, including a sequential optimization method, are given together with a time complexity analysis. Moreover, we generalize the method to the nonlinear case, i.e., to Reproducing Kernel Hilbert Spaces (RKHS), so that it can handle linearly nonseparable data points.
- We have applied our method to content-based image retrieval (CBIR) on two real-world databases to investigate its performance. Experimental results have demonstrated the superiority of the proposed approach in terms of several evaluation metrics.
The remainder of this paper is organized as follows. We briefly review related works in Section 2. Section 3 introduces the proposed manifold optimal experimental design via dependence maximization algorithm, and Section 4 describes its nonlinear extension. Section 5 reports the experimental results comprehensively, and Section 6 concludes the paper.
Section snippets
Related works
In this work, we focus on active learning, which is a hot topic in the machine learning community [8], [12], [23], [32]. Active learning and semi-supervised learning [5], [37] are like two sides of the same coin: they share the goal of relieving the tedious task of labeling unlabeled data. Here, we discuss active learning under the framework of OED. The problem setting can be stated as follows. Given a data collection with $n$ samples in $\mathbb{R}^d$, i.e., $\mathcal{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, we aim…
Manifold OED via dependence maximization
In this section, we introduce the proposed MODM approach. It is fundamentally motivated both by the successes of HSIC in many practical applications [34], [2], [24] and by the great popularity of Laplacian regularized least squares (LapRLS) [3] in OED [33], [6], [13].
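For reference, the LapRLS objective of [3], on which these OED methods build, reads as follows (with $l$ labeled and $u$ unlabeled points, $\mathbf{f} = (f(\mathbf{x}_1), \ldots, f(\mathbf{x}_{l+u}))^\top$, and $L$ the graph Laplacian over all points):

```latex
\min_{f \in \mathcal{H}_K}\ \frac{1}{l}\sum_{i=1}^{l}\big(y_i - f(\mathbf{x}_i)\big)^2
  \;+\; \gamma_A\,\|f\|_K^2
  \;+\; \frac{\gamma_I}{(u+l)^2}\,\mathbf{f}^\top L\,\mathbf{f}
```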
Extensions to nonlinear case
For many real-world tasks, nonlinear mappings are often utilized to handle data points that are linearly inseparable. Thus, in this section, we discuss the nonlinear extension of our method by performing experimental design in a Reproducing Kernel Hilbert Space (RKHS).
Consider an RKHS $\mathcal{H}$ induced by a feature mapping $\phi: \mathbb{R}^d \to \mathcal{H}$. Suppose that the selected point is $\mathbf{z}$; it can be mapped to $\phi(\mathbf{z})$. Given two data points $\mathbf{x}_i$ and $\mathbf{x}_j$, their inner product constructs a kernel $k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle$. The data matrices…
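As a generic illustration of how such a kernelization proceeds in practice, here is standard kernel ridge regression via the representer theorem; this is not the paper's exact kernelized design objective, and all function names and parameters below are our own:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows in A and B
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def fit_kernel_ridge(X_train, y_train, lam=1e-2, sigma=1.0):
    # Representer theorem: f(x) = sum_i alpha_i k(x, x_i),
    # with alpha = (K + lam * I)^{-1} y; only kernel evaluations are needed.
    K = rbf_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict_kernel_ridge(X_test, X_train, alpha, sigma=1.0):
    return rbf_kernel(X_test, X_train, sigma) @ alpha

# Toy usage on a nonlinear target
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
alpha = fit_kernel_ridge(X, y)
print(predict_kernel_ridge(X[:5], X, alpha))  # fitted values at first five points
```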
Experiments
To investigate the performance of the proposed MODM algorithm, we apply it to content-based image retrieval (CBIR), which has been extensively examined in experimental design [33], [6].
Conclusion
In this paper, we have presented a novel active learning algorithm named manifold optimal experimental design via dependence maximization (MODM). Our algorithm is developed upon Laplacian regularized least squares (LapRLS). In particular, we employ the Hilbert–Schmidt independence criterion (HSIC) to maximize the dependence between the input data points and their estimated predictions in the regression model, so as to further enhance and consolidate their inherent relationships. Moreover,…
Acknowledgments
This work was supported in part by National Natural Science Foundation of China under Grants 91120302, 61222207, 61173185 and 61173186, National Basic Research Program of China (973 Program) under Grant 2013CB336500, the Fundamental Research Funds for the Central Universities under Grant 2012FZA5017 and the Zhejiang Provincial Natural Science Foundation of China under Grant LZ13F020001.
References (37)
- et al., Supervised principal component analysis: visualization, classification and regression on subspaces and submanifolds, Pattern Recognit. (2011)
- et al., Active learning for social image retrieval using locally regressive optimal design, Neurocomputing (2012)
- et al., Active learning based intervertebral disk classification combining shape and texture similarities, Neurocomputing (2013)
- et al., Subspace learning via locally constrained A-optimal nonnegative projection, Neurocomputing (2013)
- et al., Locally discriminative spectral clustering with composite manifold, Neurocomputing (2013)
- et al., Discriminative orthogonal nonnegative matrix factorization with flexibility for data representation, Expert Syst. Appl. (2014)
- et al., Hessian optimal design for image retrieval, Pattern Recognit. (2011)
- et al., Multilabel dimensionality reduction via dependence maximization, ACM Trans. Knowl. Discov. Data (2010)
- et al., Optimum Experimental Designs, with SAS (2007)
- et al., Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, J. Mach. Learn. Res. (2006)
- Manifold adaptive experimental design for text categorization, IEEE Trans. Knowl. Data Eng.
- Active learning with statistical models, J. Artif. Intell. Res.
- Laplacian regularized D-optimal design for active learning and its application to image retrieval, IEEE Trans. Image Process.
Cited by (9)
- Local variational Probabilistic Minimax Active Learning, Expert Systems with Applications (2023)
- Learning with Hilbert–Schmidt independence criterion: A review and new perspectives, Knowledge-Based Systems (2021)
- Functional gradient approach to probabilistic minimax active learning, Engineering Applications of Artificial Intelligence (2019). Citation excerpt: "Representativeness methods usually consider the structure of data for the selection of examples. This structure can be either in the form of clusters or of manifolds in space (Dasgupta and Hsu, 2008; Wang et al., 2017; Chang and Liao, 2017; Li et al., 2014; Hu et al., 2009; Patra and Bruzzone, 2012). Authors in Dasgupta and Hsu (2008) propose a method which selects instances through clustering."
- Active learning based on minimization of the expected path-length of random walks on the learned manifold structure, Pattern Recognition (2017). Citation excerpt: "In [1], active learning is based on manifold-regularized D-optimal experimental design, and query samples are selected by minimization of the variance of a Laplacian regularized regression model. In [6], based on manifold-regularized D-optimal experimental design, the Hilbert-Schmidt independence criterion is also considered to strengthen the dependence between sample points and their predictions. In contrast to classic experimental design, which only evaluates the expected prediction error on selected samples, transductive experimental design (TED) [7] also takes into account the expected prediction error on unselected samples."
- Active learning for penalized logistic regression via sequential experimental design, Neurocomputing (2017). Citation excerpt: "Le Ly and Lipson [21] presented an active-learning method, based on Shannon information criterion, and used it for model disambiguation. Similarly, Li et al. [22] introduced manifold optimal experimental design via dependence maximization, selecting subjects that minimized the variance of the model parameters. Instead of focusing on variance, Pauwels et al. [23] proposed a criterion that attempted to control the contribution of both variance and bias jointly to the error of parameter estimation."
- An online generalized eigenvalue version of Laplacian Eigenmaps for visual big data, Neurocomputing (2016). Citation excerpt: "Multiple discriminative features from the face image patches are extracted by a multi-manifold based subspace learning criteria; which significantly improved the recognition performance in comparison to other techniques. Similarly in [12] the authors have presented a novel active learning algorithm named manifold optimal experimental design via dependence maximization (MODM) based on Laplacian regularized least squares. The objective function of the proposed learning algorithm is efficiently optimized by using the sequential optimization strategy and the results showed the effectiveness of the proposed method in comparison with other standard approaches."
Ping Li received the Ph.D. degree in Computer Science from Zhejiang University, China, in 2014. Prior to that, he received the M.S. degree from Central South University, China, in 2010. He is now an Assistant Professor in the School of Computer Science and Technology, Hangzhou Dianzi University. His research interests include machine learning, data mining and multimedia content analysis.
Jiajun Bu received the B.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1995 and 2000, respectively. He is a Professor in the College of Computer Science, Zhejiang University. His research interests include embedded system, data mining, information retrieval and mobile database.
Chun Chen received the B.S. degree in Mathematics from Xiamen University, China, in 1981, and his M.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1984 and 1990, respectively. He is a Professor in the College of Computer Science, Zhejiang University. His research interests include information retrieval, data mining, computer vision, computer graphics and embedded technology.
Deng Cai received the Ph.D. degree in Computer Science from the University of Illinois at Urbana Champaign in 2009. Before that, he received the B.S. and M.S. degrees from Tsinghua University, China, in 2000 and 2003, respectively, both in Automation. He is a Professor in the State Key Lab. of CAD&CG, College of Computer Science at Zhejiang University, China. His research interests include machine learning, data mining and computer vision.