1 Introduction

This paper deals with structured learning problems, in which we learn a function \(f:\mathcal{X} \rightarrow \mathcal{Y}\) whose inputs and outputs are structured objects such as sequences, trees, bounding boxes, or strings. Structured learning arises in many real-world applications, including multi-label classification, natural language parsing, and object detection. Conditional random fields [5, 6], maximum margin Markov networks [9], and structured output support vector machines (SSVM) [10] have been developed as powerful tools for predicting structured data. These methods share a common approach: they define a linear scoring function based on a joint feature map over inputs and outputs. However, they have some drawbacks. On the one hand, applying them requires accurately labeled training sets, and experiments show that incorrect or incomplete labels can degrade their performance. On the other hand, training these models is computationally costly, so solving large-scale problems is difficult or infeasible except for some special output structures.

To overcome these drawbacks, a method called Joint Kernel Support Estimation (JKSE) was proposed in [7]. JKSE is a generative method: it learns the support of the joint probability density of inputs and outputs, which makes it robust to mislabeled data. At the same time, its optimization problem reduces to a one-class SVM and is therefore convex and efficiently solvable. However, JKSE is not as powerful as SSVM [2]. We therefore focus on the following question: how can the performance of JKSE be improved? To answer it, we introduce privileged information into JKSE.

Privileged information [11] is useful high-level knowledge that is available only at training time. For example, in object detection, such information includes an object's parts, attributes, and segmentations. More reliable models can be learned by incorporating this high-level information into SVM, SSVM, and one-class SVM [3, 4, 8, 11].

In this paper, we propose a new method called JKSE+, which extends JKSE with privileged information, and apply it to the problem of object detection. Experiments show that JKSE+ performs better than JKSE.

The rest of this paper is organized as follows. We review JKSE in Sect. 2, introduce our new method JKSE+ in Sect. 3, present experimental results in Sect. 4, and conclude in Sect. 5.

2 Related Work

This section considers the following structured learning problem: given a training set \(\left\{ {\left( {{x_1},\ {y_1}}\right) \!,\ ...,\ \left( {{x_l},\ {y_l}} \right) } \right\} \), where \({x_i}\in \mathcal{X}\), \({y_i} \in \mathcal{Y}\), and \(\mathcal{X}\) and \(\mathcal{Y}\) are the structured input and output spaces respectively, assume that the input-output pairs \(\left( {x,y} \right) \) follow a joint probability distribution \(p\left( {x,y} \right) \). Our goal is to learn a mapping \(g:\mathcal{X} \rightarrow \mathcal{Y}\) such that for a new input \({x} \in \mathcal{X}\), the corresponding label \({y} \in \mathcal{Y}\) is determined by maximizing the posterior probability \(p\left( {y|x} \right) \).

A discriminative method directly models the conditional distribution \(p\left( {y|x} \right) \), whereas a generative method models the joint distribution \(p\left( {x,y} \right) \). For prediction the two are equivalent, i.e. \(\mathop {\arg \max }\limits _{y \in \mathcal{Y}} p\left( {y|x} \right) = \mathop {\arg \max }\limits _{y \in \mathcal{Y}} p\left( {x,y} \right) \) for any \(x \in \mathcal{X}\). JKSE is a generative method. Suppose that \(p\left( {x,y} \right) = \frac{1}{Z}\exp \left( {\left\langle {w,\varPhi \left( {x,y} \right) } \right\rangle } \right) \), where \(Z \equiv \sum \nolimits _{x,y} {\exp \left( {\left\langle {w,\varPhi \left( {x,y} \right) } \right\rangle } \right) }\) is a normalization constant that can be ignored during both training and testing. JKSE then turns the task of estimating the joint distribution \(p\left( {x,y} \right) \) into a one-class SVM problem.

In the training phase, JKSE solves the following problem:

$$\begin{aligned} \begin{array}{*{20}{l}} {\mathop {\min }\limits _{w,\xi ,\rho } \frac{1}{2}\parallel w{\parallel ^2} + \frac{1}{{vl}}\sum \limits _{i = 1}^l {{\xi _i} - \rho } }\\ \begin{array}{l} s.t. \; \left\langle {w,\varPhi \left( {{x_i},{y_i}} \right) } \right\rangle \ge \rho - {\xi _i},\quad i = 1,2,...,l,\quad \\ \qquad {\xi _i} \ge 0, \quad i = 1,2,...,l. \end{array} \end{array} \end{aligned}$$
(1)

To obtain its solution, JKSE solves the dual problem:

$$\begin{aligned} \begin{array}{*{20}{l}} {\mathop {\min }\limits _\alpha \sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {{\alpha _i}{\alpha _j}K\left( {\left( {{x_i},{y_i}} \right) ,\left( {{x_j},{y_j}} \right) } \right) } } }\\ \begin{array}{l} s.t.\quad \mathrm{{0}} \le {\alpha _i} \le \frac{1}{{vl}}, \quad i = 1,...,l,\\ \qquad \sum \limits _{i = 1}^l {{\alpha _i} = 1.} \end{array} \end{array} \end{aligned}$$
(2)

where \(K\left( {\left( {x,y} \right) ,\left( {x',y'} \right) } \right) \equiv \left\langle {\varPhi \left( {x,y} \right) ,\varPhi \left( {x',y'} \right) } \right\rangle \) is a joint feature kernel function. If \({\alpha ^*}\) is the solution to the above problem (2), then the solution to the primal problem (1) for w is given as follows:

$$\begin{aligned} {w^*} = \sum \limits _{i = 1}^l {{\alpha _i ^*}\varPhi \left( {{x_i},{y_i}} \right) }. \end{aligned}$$
(3)

Furthermore, in the inference step, for a new input \({x} \in \mathcal{X}\), the corresponding label y is given by:

$$\begin{aligned} y = \mathop {\arg \max }\limits _{y \in \mathcal {Y}} \sum \limits _{i = 1}^l {{\alpha _i ^*}K\left( {\left( {{x_i},{y_i}}\right) \!,\left( {x,y} \right) } \right) }. \end{aligned}$$
(4)
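For concreteness, the following sketch shows how the training problem (2) and the inference rule (4) could be implemented. It assumes a user-supplied joint kernel over input-output pairs and a finite list of candidate outputs for the arg max; the use of cvxpy and the helper names are illustrative assumptions, not part of the original method.

```python
# A minimal sketch of JKSE training via the dual (2) and inference via (4).
# Assumptions: `joint_kernel((x, y), (x2, y2)) -> float` is supplied by the
# user, and inference searches a finite list of candidate outputs.
import numpy as np
import cvxpy as cp

def train_jkse(pairs, joint_kernel, nu):
    """pairs: list of training pairs (x_i, y_i); returns the dual variables alpha."""
    l = len(pairs)
    K = np.array([[joint_kernel(p, q) for q in pairs] for p in pairs])
    alpha = cp.Variable(l)
    # psd_wrap tells cvxpy that the Gram matrix is positive semidefinite
    objective = cp.Minimize(cp.quad_form(alpha, cp.psd_wrap(K)))
    constraints = [alpha >= 0, alpha <= 1.0 / (nu * l), cp.sum(alpha) == 1]
    cp.Problem(objective, constraints).solve()
    return alpha.value

def predict_jkse(x, candidates, pairs, alpha, joint_kernel):
    """Return the candidate y maximizing sum_i alpha_i K((x_i, y_i), (x, y))."""
    scores = [sum(a * joint_kernel(p, (x, y)) for a, p in zip(alpha, pairs))
              for y in candidates]
    return candidates[int(np.argmax(scores))]
```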

3 JKSE+

Assume that we have privileged information \(\left( {x_1^*,x_2^*,...,x_l^*} \right) \in \mathcal{X^*}\) that is available only in the training phase and not in the test phase. We now consider the following privileged structured learning problem:

Given a training set \(T = \left\{ {\left( {{x_1},x_1^*,{y_1}} \right) ,...,\left( {{x_l},x_l^*,{y_l}} \right) } \right\} \) where \({x_i} \in \mathcal {X}\), \({x_i^*} \in \mathcal{X^*}\), \({y_i} \in \mathcal {Y}\), \(i = 1,...,l\), our goal is to find a mapping \(g:\mathcal{X} \rightarrow \mathcal{Y}\) such that the label y for any x can be predicted by \(y = g\left( x \right) \).

We now discuss how privileged information can be incorporated into the JKSE framework. Suppose that there exists a best but unknown decision rule \(y = \mathop {\arg \max }\limits _{y \in \mathcal Y} \left\langle {{w_0},\varPhi \left( {x,y} \right) } \right\rangle \). The function \(\xi \left( x \right) \) of the input x is defined as follows:

$$\begin{aligned} {\xi ^0} = \xi \left( x \right) = {\left[ {\rho - \left\langle {{w_0},\varPhi \left( {x,y} \right) } \right\rangle } \right] _ + } \end{aligned}$$

where \({\left[ \eta \right] _ + } = \left\{ {\begin{array}{*{20}{c}} {\eta ,\quad if \quad \eta \ge 0,}\\ {0,\quad otherwise.} \end{array}} \right. \) If we knew the value of \(\xi \left( x \right) \) on each input \({x_i}\left( {i = 1,...,l} \right) \), i.e. if we knew the triplets \(\left( {{x_i},\xi _i^0,{y_i}} \right) \) with \(\xi _i^0 = \xi \left( {{x_i}} \right) ,i = 1,...,l\), we could obtain an improved predictor. In reality, however, these values are unavailable, so we approximate \(\xi \left( x \right) \) by a correcting function. Similarly to the one-class SVM with privileged information in [3], we replace \({\xi _i}\) by a mixture of the values of the correcting function \(\psi \left( {x_i^*} \right) = \left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,y_i} \right) } \right\rangle + {b^*}\) and slack values \({\zeta _i}\), which yields the primal problem of JKSE+:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{w,\mathrm{{ }}{w^*}\mathrm{{, }}{b^*}\mathrm{{, }}\rho \mathrm{{,}}\zeta } \frac{{vl}}{2}\parallel w{\parallel ^2} + \frac{\gamma }{2}\parallel {w^*}{\parallel ^2} - vl\rho + \sum \limits _{i = 1}^l {\left[ {\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*} + {\zeta _i}} \right] } \\ s.t. \quad \left\langle {w,\varPhi \left( {{x_i},{y_i}} \right) } \right\rangle \ge \rho - \left( {\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*}} \right) , \quad i=1,...,l,\\ \qquad \;\,\mathrm{{ }}\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*} + {\zeta _i} \ge 0,\mathrm{{ }}{\zeta _i} \ge 0,\quad i=1,...,l. \end{array} \end{aligned}$$
(5)

The Lagrange function for this problem is:

$$\begin{aligned}&L\left( {w,{w^*},{b^*},\rho ,\zeta ,\mu ,\alpha ,\beta } \right) = \frac{{vl}}{2}\parallel w{\parallel ^2} + \frac{\gamma }{2}\parallel {w^*}{\parallel ^2} - vl\rho \nonumber \\&+ \sum \limits _{i = 1}^l {\left[ {\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*} + {\zeta _i}} \right] } \nonumber \\&{ - \sum \limits _{i = 1}^l {{\mu _i}{\zeta _i}} - \sum \limits _{i = 1}^l {{\alpha _i}\left[ {\left\langle {w,\varPhi \left( {{x_i},{y_i}} \right) } \right\rangle - \rho + \left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*}} \right] } }\nonumber \\&{ - \sum \limits _{i = 1}^l {{\beta _i}\left[ {\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*} + {\zeta _i}} \right] } } \end{aligned}$$
(6)

The KKT conditions are as follows:

$$\begin{aligned} {\nabla _w}L = vlw - \sum \limits _{i = 1}^l {{\alpha _i}\varPhi \left( {{x_i},{y_i}} \right) = 0},\qquad \qquad \qquad \quad \,\,\end{aligned}$$
(7)
$$\begin{aligned} {{\nabla _{{w^*}}}L = \gamma {w^*} + \sum \limits _{i = 1}^l {{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } - \sum \limits _{i = 1}^l {{\alpha _i}{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } - \sum \limits _{i = 1}^l {{\beta _i}{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } = 0},\end{aligned}$$
(8)
$$\begin{aligned} \frac{{\partial L}}{{\partial {b^*}}} = l - \sum \limits _{i = 1}^l {{\alpha _i} - \sum \limits _{i = 1}^l {{\beta _i} = 0} },\qquad \qquad \qquad \quad \quad \end{aligned}$$
(9)
$$\begin{aligned} \frac{{\partial L}}{{\partial \rho }} = - vl + \sum \limits _{i = 1}^l {{\alpha _i}} = 0,\qquad \qquad \qquad \qquad \quad \,\,\end{aligned}$$
(10)
$$\begin{aligned} \frac{{\partial L}}{{\partial {\zeta _i}}} = 1 - {\beta _i} - {\mu _i} = 0, i=1,...,l,\qquad \qquad \qquad \quad \,\,\,\end{aligned}$$
(11)
$$\begin{aligned} \rho - \left( {\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*}} \right) - \left\langle {w,\varPhi \left( {{x_i},{y_i}} \right) } \right\rangle \le 0, i=1,...,l,\,\,\qquad \end{aligned}$$
(12)
$$\begin{aligned} - \left( {\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*} + {\zeta _i}} \right) \le 0, i = 1,...,l,\qquad \qquad \quad \,\,\,\end{aligned}$$
(13)
$$\begin{aligned} - {\zeta _i} \le 0,i = 1,...,l,\qquad \qquad \qquad \qquad \quad \quad \quad \end{aligned}$$
(14)
$$\begin{aligned} {\alpha _i}\left[ {\rho - \left( {\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*}} \right) - \left\langle {w,\varPhi \left( {{x_i},{y_i}} \right) } \right\rangle } \right] = 0, i = 1,...,l,\quad \,\,\,\,\,\end{aligned}$$
(15)
$$\begin{aligned} {\beta _i}\left[ {\left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle + {b^*} + {\zeta _i}} \right] = 0, i = 1,...,l,\qquad \qquad \quad \end{aligned}$$
(16)
$$\begin{aligned} {\mu _i}{\zeta _i} = 0, i = 1,...,l,\qquad \qquad \qquad \qquad \qquad \,\,\,\end{aligned}$$
(17)
$$\begin{aligned} {\alpha _i} \ge 0,{\beta _i} \ge 0,{\mu _i} \ge 0, i = 1,...,l.\qquad \qquad \qquad \quad \,\,\,\, \end{aligned}$$
(18)

From the above KKT conditions, after setting \({\delta _i} = 1 - {\beta _i}\), we obtain

$$\begin{aligned} w = \frac{1}{{vl}}\sum \limits _{i = 1}^l {{\alpha _i}\varPhi \left( {{x_i},{y_i}} \right) } , \qquad \; \end{aligned}$$
(19)
$$\begin{aligned} {w^*} = \frac{1}{\gamma }\sum \limits _{i = 1}^l {\left( {{\alpha _i} - {\delta _i}} \right) {\varPhi ^*}\left( {x_i^*,{y_i}} \right) },\,\,\end{aligned}$$
(20)
$$\begin{aligned} \sum \limits _{i = 1}^l {{\delta _i} = \sum \limits _{i = 1}^l {{\alpha _i} = vl} }, \qquad \,\,\, \end{aligned}$$
(21)
$$\begin{aligned} \mathrm{{0}} \le {\delta _i} \le 1, i = 1,...,l. \qquad \,\; \end{aligned}$$
(22)
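For completeness, the step from the KKT conditions to the dual is the standard substitution: plugging (19) and (20) into the Lagrangian (6) collapses the quadratic terms to kernel expressions,

$$\begin{aligned} \frac{{vl}}{2}\parallel w{\parallel ^2} - \sum \limits _{i = 1}^l {\alpha _i}\left\langle {w,\varPhi \left( {{x_i},{y_i}} \right) } \right\rangle&= - \frac{1}{{2vl}}\sum \limits _{i = 1}^l \sum \limits _{j = 1}^l {\alpha _i}{\alpha _j}K\left( {\left( {{x_i},{y_i}} \right) ,\left( {{x_j},{y_j}} \right) } \right) ,\\ \frac{\gamma }{2}\parallel {w^*}{\parallel ^2} + \sum \limits _{i = 1}^l \left( {1 - {\alpha _i} - {\beta _i}} \right) \left\langle {{w^*},{\varPhi ^*}\left( {x_i^*,{y_i}} \right) } \right\rangle&= - \frac{1}{{2\gamma }}\sum \limits _{i = 1}^l \sum \limits _{j = 1}^l \left( {{\alpha _i} - {\delta _i}} \right) \left( {{\alpha _j} - {\delta _j}} \right) {K^*}\left( {\left( {x_i^*,{y_i}} \right) ,\left( {x_j^*,{y_j}} \right) } \right) , \end{aligned}$$

while the terms involving \(\rho \), \({b^*}\), and \(\zeta \) vanish by (9), (10), and (11).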

Therefore, the dual problem is as follows:

$$\begin{aligned} \begin{array}{l} \mathop {\max }\limits _{\alpha ,\delta } - \frac{1}{{2vl}}\sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {{\alpha _i}{\alpha _j}} K\left( {\left( {{x_i},{y_i}} \right) ,\left( {{x_j},{y_j}} \right) } \right) } \\ \qquad - \sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {\frac{1}{{2\gamma }}\left( {{\alpha _i} - {\delta _i}} \right) {K^*}\left( {\left( {x_i^*,{y_i}} \right) ,\left( {x_j^*,{y_j}} \right) } \right) \left( {{\alpha _j} - {\delta _j}} \right) } } \\ s.t.\quad \mathrm{{ }}\sum \limits _{i = 1}^l {{\alpha _i} = vl,\quad \mathrm{{ }}{\alpha _i} \ge 0},\\ \qquad \; \mathrm{{ }}\sum \limits _{i = 1}^l {{\delta _i}} = vl, \quad \mathrm{{ 0}} \le {\delta _i} \le 1. \end{array} \end{aligned}$$
(23)

Here \({K\left( {\left( {{x_i},{y_i}} \right) \!,\left( {{x_j},{y_j}} \right) } \right) }\) and \({{K^*}\left( {\left( {x_i^*,{y_i}} \right) \!,\left( {x_j^*,{y_j}} \right) } \right) }\) replace the inner products \(\left\langle {\varPhi \left( {{x_i},{y_i}} \right) \!,\varPhi \left( {{x_j},{y_j}} \right) } \right\rangle \) and \(\left\langle {{\varPhi ^*}\left( {x_i^*,{y_i}} \right) \!,{\varPhi ^*}\left( {x_j^*,{y_j}} \right) } \right\rangle \). The model's decision function is \(f\left( x,y \right) = \sum \limits _{i = 1}^l {{\alpha _i}K\left( {\left( {{x_i},{y_i}} \right) ,\left( {x,y} \right) } \right) }\).
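The dual (23) is a convex quadratic program in \(\left( \alpha ,\delta \right) \) and could be solved with an off-the-shelf solver; the following sketch uses cvxpy and assumes the two kernel matrices have already been precomputed (the names are illustrative, not from the original paper).

```python
# A minimal sketch of solving the JKSE+ dual (23), assuming precomputed
# kernel matrices K (over (x_i, y_i)) and K_star (over (x_i^*, y_i)).
import cvxpy as cp

def train_jkse_plus(K, K_star, nu, gamma):
    l = K.shape[0]
    alpha, delta = cp.Variable(l), cp.Variable(l)
    objective = cp.Maximize(
        -cp.quad_form(alpha, cp.psd_wrap(K)) / (2 * nu * l)
        - cp.quad_form(alpha - delta, cp.psd_wrap(K_star)) / (2 * gamma))
    constraints = [cp.sum(alpha) == nu * l, alpha >= 0,
                   cp.sum(delta) == nu * l, delta >= 0, delta <= 1]
    cp.Problem(objective, constraints).solve()
    return alpha.value, delta.value
```

Note that \(\delta \) and the privileged kernel \(K^*\) influence the solution only during training; only \(\alpha \) appears in the decision function, which is consistent with privileged information being unavailable at test time.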

The label of a new input x can then be predicted within the JKSE framework as

$$\begin{aligned} y=g\left( x \right) = \mathop {\arg \max }\limits _{y \in \mathcal{Y}} f\left( {x,y} \right) = \mathop {\arg \max }\limits _{y \in \mathcal{Y}} \sum \limits _{i = 1}^l {{\alpha _i}K\left( {\left( {{x_i},{y_i}} \right) ,\left( {x,y} \right) } \right) }. \end{aligned}$$
(24)

Here, the function \(f\left( {x,y} \right) \) acts as a matching score. In object detection, for example, the larger the overlap between the object and a candidate bounding box, the larger the value of the function. We therefore output the y that maximizes \(f\left( {x,y} \right) \).
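In object detection this arg max is typically carried out over a finite pool of candidate boxes. The sketch below scores each candidate with \(f\left( {x,y} \right) \); the helpers candidate_boxes, box_histogram, and chi2_kernel are assumptions introduced only for illustration.

```python
# Score candidate bounding boxes with f(x, y) from (24) and keep the best one.
import numpy as np

def detect(image, candidate_boxes, train_feats, alpha, box_histogram, chi2_kernel):
    """train_feats[i] is the joint feature of the i-th training pair (x_i, y_i)."""
    best_box, best_score = None, -np.inf
    for box in candidate_boxes:
        h = box_histogram(image, box)  # plays the role of Phi(x, y)
        score = float(np.dot(alpha, [chi2_kernel(t, h) for t in train_feats]))
        if score > best_score:
            best_box, best_score = box, score
    return best_box
```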

Our new algorithm JKSE+ is given as follows:

Algorithm 1

  1. Given a training set \(T = \left\{ {\left( {{x_1},x_1^*,{y_1}} \right) ,...,\left( {{x_l},x_l^*,{y_l}} \right) } \right\} \) where \({x_i} \in \mathcal {X}\), \({x_i^*} \in \mathcal{X^*}\), \({y_i} \in \mathcal {Y}\), \(i = 1,...,l\);

  2. Choose appropriate kernel functions \(K\left( {u,v} \right) \) and \({K^*}\left( {u',v'} \right) \), and penalty parameters \({v> 0,\gamma > 0}\);

  3. Construct and solve the convex quadratic programming problem:

    $$\begin{aligned} \begin{array}{l} \mathop {\max }\limits _{\alpha ,\delta } - \frac{1}{{2vl}}\sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {{\alpha _i}{\alpha _j}} K\left( {\left( {{x_i},{y_i}} \right) \!,\left( {{x_j},{y_j}} \right) } \right) } \\ \qquad - \sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {\frac{1}{{2\gamma }}\left( {{\alpha _i} - {\delta _i}} \right) {K^*}\left( {\left( {x_i^*,{y_i}} \right) \!,\left( {x_j^*,{y_j}} \right) } \right) \left( {{\alpha _j} - {\delta _j}} \right) } } \\ s.t.\quad \mathrm{{ }}\sum \limits _{i = 1}^l {{\alpha _i} = vl,\quad \mathrm{{ }}{\alpha _i} \ge 0}, \\ \qquad \; \mathrm{{ }}\sum \limits _{i = 1}^l {{\delta _i}} = vl, \quad \mathrm{{ 0}} \le {\delta _i} \le 1. \end{array} \end{aligned}$$

    and obtain the solution \({\left( {{\alpha ^*},{\delta ^*}} \right) = \left( {\alpha _1^*,...,\alpha _l^*,\delta _1^*,...,\delta _l^*} \right) }\);

  4. Construct the decision function:

    $$\begin{aligned} y = g\left( x \right) = \mathop {\arg \max }\limits _{y \in \mathcal Y} f\left( {x,y} \right) = \mathop {\arg \max }\limits _{y \in \mathcal Y} \sum \limits _{i = 1}^l {\alpha _i^*K\left( {\left( {{x_i},{y_i}} \right) \!,\left( {x,y} \right) } \right) }. \end{aligned}$$

4 Experiments

In this section, we apply our new method to the problem of object detection. Given a set of pictures, we want to learn a mapping \(g:\mathcal X \rightarrow \mathcal Y\) that, for an input picture, returns the object's position in it. This is a typical structured learning problem and can be solved by our new method.

4.1 Dataset

We use the Caltech-UCSD Birds 2011 (CUB-2011) dataset [12] to evaluate our algorithm. This dataset contains two hundred species of birds, each with sixty pictures. Each picture contains only one bird, whose position is indicated by a bounding box. In addition, the dataset provides privileged information: for each image, the bird's attributes described as a 312-dimensional vector, and a segmentation mask.

4.2 Features and Privileged Information

Our feature descriptor adopts the bag-of-visual-words model based on SURF descriptors [1]. We use the attribute information and the segmentation masks as privileged information. For the segmentation masks we apply the same feature extraction strategy as for the original images, i.e. a SURF-based bag-of-visual-words descriptor. The feature space of the privileged information carries more information than that of the original image, so the object's location in the image can be detected more accurately.

We select 50 pictures as the training set and 10 pictures as the test set. The original visual feature descriptor has 200 dimensions. The attribute information is a 312-dimensional vector whose entries are binary. From the segmentation masks we extract 500-dimensional feature descriptors with the same bag-of-visual-words model as for the original pictures, so the privileged information is an 812-dimensional vector.
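A rough sketch of this feature pipeline is given below. It assumes OpenCV's contrib SURF implementation and scikit-learn's KMeans are available; the vocabulary sizes (200 words for images, 500 for masks) follow the numbers above.

```python
# Bag-of-visual-words features from SURF descriptors (illustrative sketch).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def surf_descriptors(gray_image):
    surf = cv2.xfeatures2d.SURF_create()  # requires opencv-contrib-python
    _, desc = surf.detectAndCompute(gray_image, None)
    return desc if desc is not None else np.empty((0, 64))

def build_vocabulary(descriptor_list, n_words):
    # cluster all training descriptors into n_words visual words
    return KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(descriptor_list))

def bovw_histogram(descriptors, vocabulary):
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # L1-normalized histogram
```

Under these assumptions, the 812-dimensional privileged vector is the concatenation of the 312-dimensional attribute vector and the 500-dimensional mask histogram.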

As Fig. 1 shows, more feature descriptors can be extracted from the segmentation masks, which helps to improve the overlap ratio of object detection.

Fig. 1. The picture on the left shows the feature descriptors of the original picture; the picture on the right shows the feature descriptors of the segmentation mask, which is used as privileged information during training.

Table 1. Dataset
Table 2. Overlap ratio of Object Detection

4.3 Kernel Function

We use the following version of the chi-square kernel function \(\left( {{\chi ^2} - \mathrm{{kernel}}} \right) \):

$$\begin{aligned} K\left( {u,v} \right) = {K^*}\left( {u,v} \right) = {e^{ - \theta \sum \limits _{i = 1}^n {\frac{{{{\left( {{u_i} - {v_i}} \right) }^2}}}{{{u_i} + {v_i}}}} }},u \in {R^n},v \in {R^n}. \end{aligned}$$

This kernel is most commonly applied to histograms generated by the bag-of-visual-words model in computer vision [13].
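A direct implementation of this kernel is straightforward; the small epsilon below, which guards against empty histogram bins, is an added assumption rather than part of the formula.

```python
import numpy as np

def chi2_kernel(u, v, theta=1.0, eps=1e-10):
    """Exponential chi-square kernel exp(-theta * sum((u_i - v_i)^2 / (u_i + v_i)))."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.exp(-theta * np.sum((u - v) ** 2 / (u + v + eps))))
```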

4.4 Experimental Results

To evaluate JKSE+, we compare it with JKSE. During training, we tune the parameters v, \(\gamma \), and \(\theta \) of JKSE+ on an 8 \(\times \) 8 \(\times \) 8 grid spanning the values \(\left[ {{{10}^{ - 4}},{{10}^{ - 3}},...,{{10}^3}} \right] \). For JKSE, we tune v and \(\theta \) on an 8 \(\times \) 8 grid over the same values.
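The grid search could be organized as in the sketch below, where the evaluate callback (e.g. mean overlap on held-out data) is an assumed helper not specified in the paper.

```python
# Exhaustive search over the 8 x 8 x 8 parameter grid described above.
import itertools
import numpy as np

GRID = np.logspace(-4, 3, num=8)  # 10^-4, 10^-3, ..., 10^3

def tune(evaluate):
    """evaluate(nu, gamma, theta) -> validation score; supplied by the caller."""
    return max(itertools.product(GRID, GRID, GRID),
               key=lambda p: evaluate(nu=p[0], gamma=p[1], theta=p[2]))
```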

We chose ten different birds to compare the detection results of JKSE and JKSE+ (Tables 1 and 2).

The overlap ratio of JKSE+ is higher than that of JKSE on eight of the ten datasets.

5 Conclusion

We propose a new method for structured learning with privileged information based on JKSE. Firstly, compared with traditional structured learning methods such as SSVM and CRFs, the optimization problem in our new model JKSE+ is convex and can be solved easily. Secondly, compared with JKSE, prediction performance is improved by using privileged information. Lastly, we apply JKSE+ to object detection, and experimental results show that JKSE+ performs better than JKSE in most cases.

For future work, we will consider some extensions of JKSE+, for example settings where privileged information is provided only for a fraction of the training inputs, or where privileged information is described in several different spaces.