
Exploiting Privileged Information from Web Data for Action and Event Recognition

Published in: International Journal of Computer Vision

Abstract

In conventional approaches to action and event recognition, sufficient labelled training videos are generally required to learn robust classifiers that generalize well to new test videos. However, collecting labelled training videos is often time-consuming and expensive. In this work, we propose new learning frameworks that train robust classifiers for action and event recognition using freely available web videos as training data. We aim to address three challenging issues: (1) the training web videos are generally associated with rich textual descriptions, which are not available in test videos; (2) the labels of training web videos are noisy and may be inaccurate; (3) the data distributions of training and test videos are often considerably different. To address the first two issues, we propose a new framework called multi-instance learning with privileged information (MIL-PI), together with three new MIL methods, in which we not only take advantage of the additional textual descriptions of training web videos as privileged information but also explicitly cope with the noise in the loose labels of training web videos. When the training and test videos come from different data distributions, we further extend MIL-PI into a new framework called domain adaptive MIL-PI, with another three new domain adaptation methods that additionally reduce the data distribution mismatch between training and test videos. Comprehensive experiments for action and event recognition demonstrate the effectiveness of our proposed approaches.


Notes

  1. The work in Li et al. (2011) used both visual and textual features in the training process. However, it also requires the textual features in the testing process.

  2. The bias term \(\hat{b}\) and the scalar terms \(\rho \) and \(\frac{1}{\Vert \mathbf {v}\Vert }\) do not change the trend of the functions.

References

  • Aggarwal, J. K., & Ryoo, M. S. (2011). Human activity analysis: A review. ACM Computing Surveys (CSUR), 43(3), 16.

  • Andrews, S., Tsochantaridis, I., & Hofmann, T. (2003). Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems (NIPS) (pp. 561–568).

  • Baktashmotlagh, M., Harandi, M., Lovell, B. C., & Salzmann, M. (2013). Unsupervised domain adaptation by domain invariant projection. In IEEE International Conference on Computer Vision (ICCV) (pp. 769–776).

  • Bergamo, A., & Torresani, L. (2010). Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach. In Advances in Neural Information Processing Systems (NIPS) (pp. 181–189).

  • Bobick, A. F. (1997). Movement, activity and action: The role of knowledge in the perception of motion. Philosophical Transactions of the Royal Society B: Biological Sciences, 352(1358), 1257–1265.

  • Bootkrajang, J., & Kabán, A. (2014). Learning kernel logistic regression in the presence of class label noise. Pattern Recognition, 47(11), 3641–3655.

  • Bruzzone, L., & Marconcini, M. (2010). Domain adaptation problems: A DASVM classification technique and a circular validation strategy. T-PAMI, 32(5), 770–787.

  • Bunescu, R. C., & Mooney, R. J. (2007). Multiple instance learning for sparse positive bags. In International Conference on Machine learning (ICML) (pp. 105–112).

  • Chang, S. F., Ellis, D., Jiang, W., Lee, K., Yanagawa, A., Loui, A. C., & Luo, J. (2007). Large-scale multimodal semantic concept detection for consumer video. In International Workshop on Multimedia Information Retrieval (pp. 255–264).

  • Chen, L., Duan, L., & Xu, D. (2013a). Event recognition in videos by learning from heterogeneous web sources. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2666–2673).

  • Chen, X., Shrivastava, A., & Gupta, A. (2013b). NEIL: Extracting visual knowledge from web data. In IEEE International Conference on Computer Vision (ICCV) (pp. 1409–1416).

  • Chen, Y., Bi, J., & Wang, J. Z. (2006). MILES: Multiple-instance learning via embedded instance selection. T-PAMI, 28(12), 1931–1947.

  • Chu, W. S., De la Torre, F., & Cohn, J. (2013). Selective transfer machine for personalized facial action unit detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3515–3522).

  • Duan, L., Li, W., Tsang, I. W., & Xu, D. (2011). Improving web image search by bag-based re-ranking. T-IP, 20(11), 3280–3290.

  • Duan, L., Tsang, I. W., & Xu, D. (2012a). Domain transfer multiple kernel learning. T-PAMI, 34(3), 465–479.

  • Duan, L., Xu, D., & Chang, S. F. (2012b). Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1338–1345).

  • Duan, L., Xu, D., & Tsang, I. W. (2012c). Domain adaptation from multiple sources: A domain-dependent regularization approach. T-NNLS, 23(3), 504–518.

  • Duan, L., Xu, D., Tsang, I. W., & Luo, J. (2012d). Visual event recognition in videos by learning from web data. T-PAMI, 34(9), 1667–1680.

  • Farhadi, A., Endres, I., Hoiem, D., & Forsyth, D. (2009). Describing objects by their attributes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1778–1785).

  • Farquhar, J. D. R., Hardoon, D. R., Meng, H., Shawe-Taylor, J., & Szedmak, S. (2005). Two view learning: SVM-2K, theory and practice. In Advances in Neural Information Processing Systems (NIPS).

  • Fergus, R., Fei-Fei, L., Perona, P., & Zisserman, A. (2005). Learning object categories from Google’s image search. In IEEE International Conference on Computer Vision (ICCV).

  • Fernando, B., Habrard, A., Sebban, M., & Tuytelaars, T. (2013). Unsupervised visual domain adaptation using subspace alignment. In IEEE International Conference on Computer Vision (ICCV).

  • Ferrari, V., & Zisserman, A. (2007). Learning visual attributes. In Advances in Neural Information Processing Systems (NIPS) (pp. 433–440).

  • Fouad, S., Tino, P., Raychaudhury, S., & Schneider, P. (2013). Incorporating privileged information through metric learning. T-NNLS, 24(7), 1086–1098.

  • Gehler, P. V., & Nowozin, S. (2008). Infinite kernel learning. Tech. rep., Max Planck Institute for Biological Cybernetics. In NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels.

  • Gong, B., Shi, Y., Sha, F., & Grauman, K. (2012). Geodesic flow kernel for unsupervised domain adaptation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2066–2073).

  • Gopalan, R., Li, R., & Chellappa, R. (2011). Domain adaptation for object recognition: An unsupervised approach. In IEEE International Conference on Computer Vision (ICCV) (pp. 999–1006).

  • Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. JMLR, 13, 723–773.

  • Hardoon, D. R., Szedmak, S., & Shawe-taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), 2639–2664.

  • Hu, Y., Cao, L., Lv, F., Yan, S., Gong, Y., & Huang, T. S. (2009). Action detection in complex scenes with spatial and temporal ambiguities. In IEEE International Conference on Computer Vision (ICCV) (pp. 128–135).

  • Huang, J., Smola, A. J., Gretton, A., Borgwardt, K. M., & Schölkopf, B. (2007). Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems (NIPS) (pp. 601–608).

  • Hwang, S. J., & Grauman, K. (2012). Learning the relative importance of objects from tagged images for retrieval and cross-modal search. IJCV, 100(2), 134–153.

  • Jiang, Y. G., Ye, G., Chang, S. F., Ellis, D., & Loui, A. C. (2011). Consumer video understanding: A benchmark database and an evaluation of human and machine performance. In International Conference on Multimedia Retrieval (ICMR) (p. 29).

  • Jiang, Y. G., Bhattacharya, S., Chang, S. F., & Shah, M. (2013). High-level event recognition in unconstrained videos. International Journal of Multimedia Information Retrieval, 2(2), 73–101.

  • Kloft, M., Brefeld, U., Sonnenburg, S., & Zien, A. (2011). \(\ell _p\)-norm multiple kernel learning. JMLR, 12, 953–997.

  • Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., & Serre, T. (2011). HMDB: a large video database for human motion recognition. In IEEE International Conference on Computer Vision (ICCV) (pp. 2556–2563).

  • Kulis, B., Saenko, K., & Darrell, T. (2011). What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1785–1792).

  • Le, Q. V., Zou, W. Y., Yeung, S. Y., & Ng, A. Y. (2011). Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3361–3368).

  • Leung, T., Song, Y., & Zhang, J. (2011). Handling label noise in video classification via multiple instance learning. In IEEE International Conference on Computer Vision (ICCV) (pp. 2056–2063).

  • Li, Q., Wu, J., & Tu, Z. (2013). Harvesting mid-level visual concepts from large-scale Internet images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 851–858).

  • Li, W., Duan, L., Xu, D., & Tsang, I. W. (2011). Text-based image retrieval using progressive multi-instance learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2368–2375).

  • Li, W., Duan, L., Tsang, I. W., & Xu, D. (2012a). Batch mode adaptive multiple instance learning for computer vision tasks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2368–2375).

  • Li, W., Duan, L., Tsang, I. W., & Xu, D. (2012b). Co-labeling: A new multi-view learning approach for ambiguous problems. In IEEE International Conference on Data Mining (ICDM) (pp. 419–428).

  • Li, W., Duan, L., Xu, D., & Tsang, I. W. (2014a). Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation. T-PAMI, 36(6), 1134–1148.

  • Li, W., Niu, L., & Xu, D. (2014b). Exploiting privileged information from web data for image categorization. In European Conference on Computer Vision (ECCV) (pp. 437–452).

  • Li, Y.-F., Tsang, I. W., Kwok, J. T., & Zhou, Z.-H. (2009). Tighter and convex maximum margin clustering. In International Conference on Artificial Intelligence and Statistics (pp. 344–351).

  • Liang, L., Cai, F., & Cherkassky, V. (2009). Predictive learning with structured (grouped) data. Neural Networks, 22, 766–773.

  • Lin, Z., Jiang, Z., & Davis, L. S. (2009). Recognizing actions by shape-motion prototype trees. In IEEE International Conference on Computer Vision (ICCV) (pp. 444–451).

  • Loui, A., Luo, J., Chang, S. F., Ellis, D., Jiang, W., Kennedy, L., Lee, K., & Yanagawa, A. (2007). Kodak’s consumer video benchmark data set: concept definition and annotation. In International Workshop on Multimedia Information Retrieval (pp. 245–254).

  • Morariu, V. I., & Davis, L. S. (2011). Multi-agent event recognition in structured scenarios. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3289–3296).

  • Natarajan, N., Dhillon, I. S., Ravikumar, P. K., & Tewari, A. (2013). Learning with noisy labels. In Advances in Neural Information Processing Systems (NIPS) (pp. 1196–1204).

  • Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2011). Domain adaptation via transfer component analysis. T-NN, 22(2), 199–210.

  • Schroff, F., Criminisi, A., & Zisserman, A. (2011). Harvesting image databases from the web. T-PAMI, 33(4), 754–766.

  • Sharmanska, V., Quadrianto, N., & Lampert, C. H. (2013). Learning to rank using privileged information. In IEEE International Conference on Computer Vision (ICCV) (pp. 825–832).

  • Shi, Y., Huang, Y., Minnen, D., Bobick, A., & Essa, I. (2004). Propagation networks for recognition of partially ordered sequential action. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (vol. 2, pp. II-862–II-869).

  • Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1521–1528).

  • Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: A large data set for nonparametric object and scene recognition. T-PAMI, 30(11), 1958–1970.

  • Torresani, L., Szummer, M., & Fitzgibbon, A. (2010). Efficient object category recognition using classemes. In European Conference on Computer Vision (ECCV) (pp. 776–789).

  • Tran, S. D., & Davis, L. S. (2008). Event modeling and recognition using Markov logic networks. In European Conference on Computer Vision (ECCV) (pp. 610–623).

  • Vapnik, V., & Vashist, A. (2009). A new learning paradigm: Learning using privileged information. Neural Networks, 22, 544–557.

  • Vijayanarasimhan, S., & Grauman, K. (2008). Keywords to visual categories: Multiple-instance learning for weakly supervised object categorization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1–8).

  • Wang, H., & Schmid, C. (2013). Action recognition with improved trajectories. In IEEE International Conference on Computer Vision (ICCV) (pp. 3551–3558).

  • Wang, H., Klaser, A., Schmid, C., & Liu, C. L. (2011a). Action recognition by dense trajectories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3169–3176).

  • Wang, L., Wang, Y., & Gao, W. (2011b). Mining layered grammar rules for action recognition. International Journal of Computer Vision, 93(2), 162–182.

  • Xu, D., & Chang, S. F. (2008). Video event recognition using kernel methods with multilevel temporal alignment. T-PAMI, 30(11), 1985–1997.

  • Yu, T. H., Kim, T. K., & Cipolla, R. (2010). Real-time action recognition by spatiotemporal semantic and structural forests. In The British Machine Vision Conference (BMVC) (pp. 52.1–52.12).

  • Zeng, Z., & Ji, Q. (2010). Knowledge based activity recognition with dynamic Bayesian network. In European Conference on Computer Vision (ECCV) (pp. 532–546).

  • Zhou, Z., & Zhang, M. (2006). Multi-instance multi-label learning with application to scene classification. In Advances in Neural Information Processing Systems (NIPS) (pp. 1609–1616).

  • Zhu, G., Yang, M., Yu, K., Xu, W., & Gong, Y. (2009). Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor. In Proceedings of the 17th ACM international conference on Multimedia (pp. 165–174). ACM.

Acknowledgments

This research was supported by funding from the Faculty of Engineering & Information Technologies, The University of Sydney, under the Faculty Research Cluster Program. This work was also supported by the Singapore MoE Tier 2 Grant (ARC42/13).

Author information

Correspondence to Wen Li.

Additional information

Communicated by M. Hebert.

Appendices

Appendix 1: Detailed Derivations for (16)

We provide the complete derivations for (16). For ease of presentation, we define

$$\begin{aligned} F(\hat{\varvec{\alpha }}, \varvec{\beta }) = \frac{1}{2\gamma }(\hat{\varvec{\alpha }}+ \varvec{\beta }- C_1\mathbf {1})'\tilde{\mathbf {K}}(\hat{\varvec{\alpha }}+ \varvec{\beta }- C_1\mathbf {1}). \end{aligned}$$

Then, the problem in (15) can be rewritten as,

$$\begin{aligned}&\min _{\mathbf {d}} \max _{(\varvec{\alpha },\varvec{\beta }) \in \mathcal {S}} \quad \mathbf {1}'\varvec{\alpha }-\frac{1}{2} \varvec{\alpha }'\left( \sum _{t=1}^Td_t\mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t'\right) \varvec{\alpha }- F(\hat{\varvec{\alpha }}, \varvec{\beta })\nonumber \\&\quad \text{ s.t. } \quad \sum _{t=1}^Td_t = 1, \quad d_t \ge 0, \quad \forall t = 1,\ldots , T \end{aligned}$$
(44)

Introducing a dual variable \(\tau \) for the constraint \(\sum _{t=1}^Td_t = 1\) and a dual variable \(\nu _t\) for each constraint \(d_t \ge 0\) in problem (44), we arrive at the Lagrangian:

$$\begin{aligned} \mathcal {L}= & {} \mathbf {1}'\varvec{\alpha }-\frac{1}{2} \varvec{\alpha }'\left( \sum _{t=1}^Td_t\mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t'\right) \varvec{\alpha }-F(\hat{\varvec{\alpha }}, \varvec{\beta })\nonumber \\&+\, \tau \left( \sum _{t=1}^Td_t - 1\right) - \sum _{t=1}^T \nu _td_t. \end{aligned}$$
(45)

The derivative of the Lagrangian w.r.t. \(d_t\) can be written as,

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial d_t} = -\frac{1}{2} \varvec{\alpha }'\left( \mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t'\right) \varvec{\alpha }+ \tau - \nu _t, \quad \forall t = 1, \ldots , T. \end{aligned}$$

Setting \(\frac{\partial \mathcal {L}}{\partial d_t} = 0\) and using \(\nu _t \ge 0\), we have

$$\begin{aligned} \frac{1}{2} \varvec{\alpha }'\left( \mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t'\right) \varvec{\alpha }\le \tau , \quad \forall t = 1, \ldots , T. \end{aligned}$$

By substituting \(\frac{\partial \mathcal {L}}{\partial d_t} = 0\) into the Lagrangian, we obtain the dual of (44) as follows,

$$\begin{aligned}&\max _{\tau }\max _{(\varvec{\alpha },\varvec{\beta }) \in \mathcal {S}} \quad \mathbf {1}'\varvec{\alpha }- F(\hat{\varvec{\alpha }}, \varvec{\beta }) - \tau \nonumber \\&\quad \text{ s.t. } \quad \frac{1}{2} \varvec{\alpha }'\left( \mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t'\right) \varvec{\alpha }\le \tau , \quad \forall t = 1, \ldots , T, \end{aligned}$$
(46)

which is exactly (16). This completes the derivation.
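The key step above is that, for fixed \((\varvec{\alpha }, \varvec{\beta })\), the objective is linear in \(\mathbf {d}\), so the minimum over the simplex is attained at a vertex; this is exactly what the epigraph form in (46) encodes. The following minimal NumPy sketch (not from the paper; the values \(q_t\) are random placeholders standing in for the quadratic forms \(\varvec{\alpha }'(\mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t')\varvec{\alpha }\)) checks the equivalence numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5
# Placeholder values: q[t] stands in for the quadratic form
# alpha' (Q o y_t y_t') alpha evaluated at a fixed (alpha, beta).
q = rng.random(T)

# Epigraph form (46): max_tau { -tau : (1/2) q_t <= tau for all t }
# is attained at the smallest feasible tau, i.e. tau* = (1/2) max_t q_t.
tau_star = 0.5 * q.max()
epigraph_value = -tau_star

# Direct form (44): min over the simplex of -(1/2) sum_t d_t q_t.
# A linear function on the simplex attains its minimum at a vertex.
D = rng.dirichlet(np.ones(T), size=10000)  # random points on the simplex
vals = -0.5 * (D @ q)

# Every simplex point is lower-bounded by the epigraph value, and the
# bound is attained at the vertex putting all weight on argmax_t q_t.
vertex = np.zeros(T)
vertex[q.argmax()] = 1.0
assert vals.min() >= epigraph_value - 1e-12
assert np.isclose(-0.5 * (vertex @ q), epigraph_value)
```

In other words, sweeping the simplex never drops below \(-\tau ^*\), and the vertex concentrating on the largest quadratic form attains it, matching the epigraph reformulation.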

Appendix 2: Solution to the MKL Problem at Step 4 of Algorithm 3

At Step 4 of Algorithm 3, we solve the MKL problem in (15) with \(\mathcal {Y}= \mathcal {C}\). Since \(\mathcal {C}\) contains only a small number of label vectors, the number of base kernels is small. We now give the algorithm for solving the MKL problem in (15) with a few base kernels.

Let us denote \(\tilde{T} = |\mathcal {C}|\) as the number of label vectors in \(\mathcal {C}\). We also define \(\mathbf {d}= [d_1, \ldots , d_{\tilde{T}}]'\) as the vector of kernel coefficients, and \(\mathcal {D}= \{\mathbf {d}'\mathbf {1} = 1, \mathbf {d}\ge 0\}\). Note we use the same symbols \(\mathbf {d}\) and \(\mathcal {D}\) as in (15) for simplicity, but the dimensionality of \(\mathbf {d}\) (i.e., \(\tilde{T} = |\mathcal {C}|\)) is much smaller than that in (15) (i.e., \(T = |\mathcal {Y}|\)). Now, we write the primal form of (15) as follows,

$$\begin{aligned}&\min _{\mathop {\tilde{\mathbf {w}}, \tilde{b}, \mathbf {w}_t, \varvec{\eta }}\limits ^{\mathbf {d}\in \mathcal {D}}}\quad \frac{1}{2}\sum _{t=1}^{\tilde{T}}\frac{\Vert \mathbf {w}_t\Vert ^2}{d_t}+\frac{\gamma }{2}\Vert \tilde{\mathbf {w}}\Vert ^2\nonumber \\&\qquad + \,C_1\sum _{i=1}^{n^+}\xi (\tilde{\phi }(\tilde{\mathbf {x}}_i)) + \sum _{i=n^+ +1}^{n}\eta _i, \end{aligned}$$
(47)
$$\begin{aligned}&\quad \text{ s.t. } \quad \sum _{t=1}^{\tilde{T}}\mathbf {w}_t'\psi _t(\mathbf {x}_i)\ge 1 - \xi (\tilde{\phi }(\tilde{\mathbf {x}}_i)),\nonumber \\&\qquad \qquad \xi (\tilde{\phi }(\tilde{\mathbf {x}}_i)) \ge 0, \qquad i=1,\ldots ,n^+, \nonumber \\&\qquad \qquad \sum _{t=1}^{\tilde{T}}\mathbf {w}_t'\psi _t(\mathbf {x}_i) \ge 1 - \eta _i,\nonumber \\&\qquad \qquad \eta _i \ge 0, \qquad i=n^+ +1, \ldots , n, \end{aligned}$$
(48)

where \(\psi _t(\mathbf {x}_i)\) is the nonlinear feature of \(\mathbf {x}_i\) induced by the kernel \(\mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t'\), and \(\mathbf {w}_t = d_t\sum _{i=1}^{n}\alpha _iy^t_i\phi (\mathbf {x}_i)\) with \(y^t_i\) being the i-th entry of \(\mathbf {y}_t\).

The above problem is jointly convex w.r.t. \(\tilde{\mathbf {w}}, \tilde{b}\), \(\mathbf {w}_t, \varvec{\eta }\), and \(\mathbf {d}\), so we can reach the global optimum by alternately optimizing the two sets of variables \(\{\tilde{\mathbf {w}}, \tilde{b}, \mathbf {w}_t, \varvec{\eta }\}\) and \(\mathbf {d}\).

Fix \(\mathbf {d}\): When \(\mathbf {d}\) is fixed, we solve for \(\{\tilde{\mathbf {w}}, \tilde{b}, \mathbf {w}_t, \varvec{\eta }\}\) by optimizing the dual problem in (15), i.e.,

$$\begin{aligned}&\max _{(\varvec{\alpha },\varvec{\beta }) \in \mathcal {S}} \qquad \mathbf {1}'\varvec{\alpha }-\frac{1}{2} \varvec{\alpha }'\left( \sum _{t=1}^{\tilde{T}}d_t\mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t'\right) \varvec{\alpha }\nonumber \\&\quad -\,\frac{1}{2\gamma }(\hat{\varvec{\alpha }}+ \varvec{\beta }- C_1\mathbf {1})'\tilde{\mathbf {K}}(\hat{\varvec{\alpha }}+ \varvec{\beta }- C_1\mathbf {1}), \end{aligned}$$
(49)

which is a quadratic programming (QP) problem w.r.t. \((\varvec{\alpha }, \varvec{\beta })\) and can be solved with any QP solver.

Fix \(\{\tilde{\mathbf {w}}, \tilde{b}, \mathbf {w}_t, \varvec{\eta }\}\): The optimization problem w.r.t. \(\mathbf {d}\) can be written as,

$$\begin{aligned}&\min _{\mathbf {d}} \quad \frac{1}{2}\sum _{t=1}^{\tilde{T}}\frac{\Vert \mathbf {w}_t\Vert ^2}{d_t}\nonumber \\&\quad \text{ s.t. } \quad \mathbf {d}'\mathbf {1} = 1, \quad \mathbf {d}\ge 0, \end{aligned}$$
(50)

which is the same as solving for the kernel coefficients in \(\ell _p\)-norm MKL (Kloft et al. 2011) with \(p = 1\), and has the closed-form solution

$$\begin{aligned} d_t = \frac{\Vert \mathbf {w}_t\Vert }{\sum _{t=1}^{\tilde{T}}\Vert \mathbf {w}_t\Vert }, \end{aligned}$$
(51)

where \(\Vert \mathbf {w}_t\Vert \) can be calculated from \(\Vert \mathbf {w}_t\Vert ^2 = d_t^2\varvec{\alpha }'(\mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t')\varvec{\alpha }\). We repeat the above two steps until the objective value of (49) converges.
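As a concrete illustration of the \(\mathbf {d}\)-step, the sketch below (not the authors' implementation) runs the closed-form update (51) with the \(\varvec{\alpha }\)-step replaced by a fixed placeholder vector; in the actual algorithm, \(\varvec{\alpha }\) would be re-solved via the QP in (49) before every \(\mathbf {d}\)-update. The base kernels are random PSD matrices standing in for \(\mathbf {Q}\circ \mathbf {y}_t\mathbf {y}_t'\):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 4, 8

# Random PSD matrices standing in for the base kernels Q o y_t y_t'.
base_kernels = [(lambda A: A @ A.T)(rng.standard_normal((n, n)))
                for _ in range(T)]
# Fixed placeholder for the dual vector alpha (in the real algorithm,
# alpha is re-solved via the QP in Eq. (49) before each d-update).
alpha = rng.random(n)

d = np.full(T, 1.0 / T)  # uniform initialisation on the simplex
for _ in range(10):
    # ||w_t||^2 = d_t^2 * alpha' K_t alpha, then Eq. (51):
    # d_t <- ||w_t|| / sum_s ||w_s||
    norms = np.array([d_t * np.sqrt(alpha @ K @ alpha)
                      for d_t, K in zip(d, base_kernels)])
    d = norms / norms.sum()

# The update keeps d on the simplex at every iteration.
assert np.isclose(d.sum(), 1.0)
assert np.all(d >= 0)
```

The normalisation in (51) guarantees that \(\mathbf {d}\) remains a valid point of \(\mathcal {D}\) after each update, which is what the assertions check.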

Appendix 3: Proof of Proposition 1

Proof

By introducing the dual variables \(\hat{\varvec{\alpha }} = [\alpha _1, \ldots , \alpha _{L^+}]' \in {\mathbb R}^{L^+}\) for the constraints in (23), \(\bar{\varvec{\alpha }} = [\alpha _{L^++1}, \ldots , \alpha _m]' \in {\mathbb R}^{m-L^+}\) for the constraints in (24), \(\hat{\varvec{\beta }} = [\beta _{1}, \ldots , \beta _{L^+}]' \in {\mathbb R}^{L^+}\) for the constraints in (25), \(\bar{\varvec{\beta }} = [\beta _{L^++1}, \ldots , \beta _{m}]' \in {\mathbb R}^{m-L^+}\) for the constraints in (26), and \({\varvec{\nu }}= [\nu _1, \ldots , \nu _m]'\) for the constraints in (27), we arrive at the Lagrangian:

$$\begin{aligned} \mathcal {L}= & {} \frac{1}{2}\left( \Vert \mathbf {w}\Vert ^2+\gamma \Vert \tilde{\mathbf {w}}\Vert ^2\right) + C_1\sum _{i=1}^{L^+}{(\tilde{\mathbf {w}}'\tilde{\mathbf {z}}^s_i+\tilde{b})} \nonumber \\&+ \sum _{i=L^+ +1}^{m}{\eta _i} + \frac{\lambda }{2}\Vert \hat{\mathbf {w}} - \rho \mathbf {v}\Vert ^2 + C_2\sum _{i=1}^{m}(\hat{\mathbf {w}}'\mathbf {z}^s_i + \hat{b})\nonumber \\&- \sum _{i=1}^{L^+} \hat{\alpha _i}(\mathbf {w}'\mathbf {z}^s_i + b - p_i + \tilde{\mathbf {w}}'\tilde{\mathbf {z}}^s_i+\tilde{b} + \hat{\mathbf {w}}'\mathbf {z}^s_i + \hat{b}) \nonumber \\&- \sum _{i=L^+ +1}^{m} \bar{\alpha _i}(-\mathbf {w}'\mathbf {z}^s_i - b -1 + \eta _i + \hat{\mathbf {w}}'\mathbf {z}^s_i + \hat{b}) \nonumber \\&- \sum _{i=1}^{L^+} \hat{\beta _i}(\tilde{\mathbf {w}}'\tilde{\mathbf {z}}^s_i+\tilde{b}) - \sum _{i=L^+ +1}^{m} \bar{\beta _i}\eta _i - \sum _{i=1}^{m} \nu _i (\hat{\mathbf {w}}'\mathbf {z}^s_i + \hat{b}),\nonumber \\ \end{aligned}$$
(52)

Let us define \(\varvec{\alpha }= [\hat{\varvec{\alpha }}', \bar{\varvec{\alpha }}']'\), \(\varvec{\beta }= [\hat{\varvec{\beta }}', \bar{\varvec{\beta }}']'\), \(\mathbf {Z}= [\mathbf {z}^s_1, \ldots , \mathbf {z}^s_m]\), \(\tilde{\mathbf {Z}}= [\tilde{\mathbf {z}}^s_1, \ldots , \tilde{\mathbf {z}}^s_{L^+}]\), and \(\mathbf {y}= [\mathbf {1}_{L^+}', -\mathbf {1}_{m-L^+}']'\), then the derivatives of the Lagrangian w.r.t. \(\mathbf {w},b,\tilde{\mathbf {w}},\tilde{b},\hat{\mathbf {w}},\hat{b}, \varvec{\eta }\) can be obtained as follows:

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \mathbf {w}}= & {} \mathbf {w}-\mathbf {Z}(\varvec{\alpha }\circ \mathbf {y}),\nonumber \\ \frac{\partial \mathcal {L}}{\partial b}= & {} -\varvec{\alpha }'\mathbf {y},\nonumber \\ \frac{\partial \mathcal {L}}{\partial \tilde{\mathbf {w}}}= & {} \gamma \tilde{\mathbf {w}}-\tilde{\mathbf {Z}}(\hat{\varvec{\alpha }}+\hat{\varvec{\beta }}-C_1\mathbf {1}_{L^+}),\nonumber \\ \frac{\partial \mathcal {L}}{\partial \tilde{b}}= & {} -\mathbf {1}_{L^+}'(\hat{\varvec{\alpha }}+\hat{\varvec{\beta }}-C_1 \mathbf {1}_{L^+}),\nonumber \\ \frac{\partial \mathcal {L}}{\partial \hat{\mathbf {w}}}= & {} \lambda \hat{\mathbf {w}}-\lambda \rho \mathbf {v}-\mathbf {Z}(\varvec{\alpha }+{\varvec{\nu }}-C_2\mathbf {1}_m),\nonumber \\ \frac{\partial \mathcal {L}}{\partial \hat{b}}= & {} -\mathbf {1}_m'(\varvec{\alpha }+{\varvec{\nu }}-C_2\mathbf {1}),\nonumber \\ \frac{\partial \mathcal {L}}{\partial \varvec{\eta }}= & {} \mathbf {1}_{m-L^+}-\bar{\varvec{\alpha }} - \bar{\varvec{\beta }}\nonumber . \end{aligned}$$

By setting these derivatives to zero, we obtain the following equations:

$$\begin{aligned} \mathbf {w}= & {} \mathbf {Z}(\varvec{\alpha }\circ \mathbf {y}),\end{aligned}$$
(53)
$$\begin{aligned} \tilde{\mathbf {w}}= & {} \frac{1}{\gamma }\tilde{\mathbf {Z}}(\hat{\varvec{\alpha }}+\hat{\varvec{\beta }}-C_1\mathbf {1}_{L^+}),\end{aligned}$$
(54)
$$\begin{aligned} \hat{\mathbf {w}}= & {} \rho \mathbf {v}+ \frac{1}{\lambda }\mathbf {Z}(\varvec{\alpha }+{\varvec{\nu }}-C_2\mathbf {1}_m), \end{aligned}$$
(55)

as well as the following constraints, \(\varvec{\alpha }'\mathbf {y}= 0\), \( \mathbf {1}_{L^+}'(\hat{\varvec{\alpha }}+\hat{\varvec{\beta }}-C_1 \mathbf {1}_{L^+}) = 0\), \(\mathbf {1}_m'(\varvec{\alpha }+{\varvec{\nu }}-C_2\mathbf {1}_m) = 0\), \(\bar{\varvec{\alpha }} \le \mathbf {1}_{m-L^+}\). Substituting the equations (53), (54) and (55) into (52) and considering \(\varvec{\alpha }, \varvec{\beta }, {\varvec{\nu }}\ge \mathbf {0}\), we obtain the following dual form,

$$\begin{aligned}&\min _{\varvec{\alpha }, \varvec{\beta }, {\varvec{\nu }}} \quad -\mathbf {p}'\varvec{\alpha }+ \frac{1}{2} \varvec{\alpha }'(\mathbf {K}\circ \mathbf {y}\mathbf {y}')\varvec{\alpha }\nonumber \\&\quad + \frac{1}{2\gamma }(\hat{\varvec{\alpha }}+ \hat{\varvec{\beta }} - C_1\mathbf {1})'\tilde{\mathbf {K}}(\hat{\varvec{\alpha }}+ \hat{\varvec{\beta }} - C_1\mathbf {1})\nonumber \\&\quad +\frac{1}{2\lambda }(\varvec{\alpha }+ {\varvec{\nu }}-C_2\mathbf {1}_m)'\mathbf {K}(\varvec{\alpha }+{\varvec{\nu }}- C_2\mathbf {1}_m) \nonumber \\&\quad + \rho \mathbf {v}'\mathbf {Z}(\varvec{\alpha }+{\varvec{\nu }}-C_2\mathbf {1}_m)\end{aligned}$$
(56)
$$\begin{aligned}&\quad \text{ s.t. }\quad \varvec{\alpha }'\mathbf {y}= 0, \quad \mathbf {1}_{L^+}'(\hat{\varvec{\alpha }}+\hat{\varvec{\beta }}-C_1 \mathbf {1}_{L^+}) = 0, \nonumber \\&\qquad \qquad \bar{\varvec{\alpha }} \le \mathbf {1}_{m-L^+}, \nonumber \\&\qquad \qquad \mathbf {1}_m'(\varvec{\alpha }+{\varvec{\nu }}-C_2\mathbf {1}_m) = 0, \quad \varvec{\alpha }, \varvec{\beta }, {\varvec{\nu }}\ge \mathbf {0}, \end{aligned}$$
(57)

Let us define \(\varvec{\theta }= \frac{1}{C_2}(\varvec{\alpha }+ {\varvec{\nu }})\), then the constraint \(\mathbf {1}_m'(\varvec{\alpha }+{\varvec{\nu }}-C_2\mathbf {1}_m) = 0\) becomes \(\mathbf {1}_m'\varvec{\theta }= m\), and the constraint \({\varvec{\nu }}\ge \mathbf {0}\) becomes \(\varvec{\alpha }\le C_2\varvec{\theta }\). Let us define the feasible set for \((\varvec{\alpha }, \varvec{\beta }, {\varvec{\nu }})\) as \(\mathcal {A}= \{ \varvec{\alpha }'\mathbf {y}= 0, \mathbf {1}_{L^+}'(\hat{\varvec{\alpha }}+\hat{\varvec{\beta }}-C_1 \mathbf {1}_{L^+}) = 0, \bar{\varvec{\alpha }} \le \mathbf {1}_{m-L^+}, \mathbf {1}_m'\varvec{\theta }= m, \varvec{\alpha }\le C_2\varvec{\theta }, \varvec{\alpha }, \varvec{\beta }, \varvec{\theta }\ge \mathbf {0}\}\). Substituting \(\varvec{\theta }= \frac{1}{C_2}(\varvec{\alpha }+ {\varvec{\nu }})\) into (56), we arrive at,

$$\begin{aligned}&\min _{(\varvec{\alpha }, \varvec{\beta }, \varvec{\theta })\in \mathcal {A}} \quad -\mathbf {p}'\varvec{\alpha }+ \frac{1}{2} \varvec{\alpha }'(\mathbf {K}\circ \mathbf {y}\mathbf {y}')\varvec{\alpha }\nonumber \\&\quad + \frac{1}{2\gamma }(\hat{\varvec{\alpha }}+ \hat{\varvec{\beta }} - C_1\mathbf {1})'\tilde{\mathbf {K}}(\hat{\varvec{\alpha }}+ \hat{\varvec{\beta }} - C_1\mathbf {1})\nonumber \\&\quad +\frac{(C_2)^2}{2\lambda }(\varvec{\theta }- \mathbf {1}_m)'\mathbf {K}(\varvec{\theta }-\mathbf {1}_m) + \rho C_2\mathbf {v}'\mathbf {Z}(\varvec{\theta }- \mathbf {1}_m) \end{aligned}$$
(58)

Recall in the main text we have defined \(H(\varvec{\alpha },\varvec{\beta })=-\mathbf {p}'\varvec{\alpha }+ \frac{1}{2} \varvec{\alpha }'(\mathbf {K}\circ \mathbf {y}\mathbf {y}')\varvec{\alpha }+ \frac{1}{2\gamma }(\hat{\varvec{\alpha }}+ \hat{\varvec{\beta }} - C_1\mathbf {1})'\tilde{\mathbf {K}}(\hat{\varvec{\alpha }}+ \hat{\varvec{\beta }} - C_1\mathbf {1})\), then we simplify the objective function in (58) as follows,

$$\begin{aligned}&\min _{(\varvec{\alpha }, \varvec{\beta }, \varvec{\theta })\in \mathcal {A}} \quad H(\varvec{\alpha }, \varvec{\beta }) +\frac{(C_2)^2}{2\lambda }(\varvec{\theta }- \mathbf {1}_m)'\mathbf {K}(\varvec{\theta }-\mathbf {1}_m) \nonumber \\&\quad + \rho C_2\mathbf {v}'\mathbf {Z}(\varvec{\theta }- \mathbf {1}_m) \end{aligned}$$
(59)

Now, we derive the objective function as follows,

$$\begin{aligned}&\min _{(\varvec{\alpha }, \varvec{\beta }, \varvec{\theta })\in \mathcal {A}}\quad H(\varvec{\alpha }, \varvec{\beta }) +\frac{(C_2)^2}{2\lambda }(\varvec{\theta }- \mathbf {1}_m)'\mathbf {K}(\varvec{\theta }-\mathbf {1}_m)\nonumber \\&\quad + \rho C_2\mathbf {v}'\mathbf {Z}(\varvec{\theta }- \mathbf {1}_m)\end{aligned}$$
(60)
$$\begin{aligned}&\Leftrightarrow \min _{(\varvec{\alpha }, \varvec{\beta }, \varvec{\theta })\in \mathcal {A}}\quad H(\varvec{\alpha }, \varvec{\beta }) +\frac{(C_2)^2}{2\lambda }(\varvec{\theta }'\mathbf {K}\varvec{\theta }- 2\mathbf {1}_m'\mathbf {K}\varvec{\theta }) \nonumber \\&\quad + \rho C_2\mathbf {v}'\mathbf {Z}\varvec{\theta }\end{aligned}$$
(61)
$$\begin{aligned}&\Leftrightarrow \min _{(\varvec{\alpha }, \varvec{\beta }, \varvec{\theta })\in \mathcal {A}} \quad H(\varvec{\alpha }, \varvec{\beta }) +\frac{(C_2)^2}{2\lambda }\varvec{\theta }'\mathbf {K}\varvec{\theta }- \frac{(C_2)^2}{\lambda }\mathbf {1}_m'\mathbf {K}\varvec{\theta }\nonumber \\&\quad + \frac{\rho C_2}{m}\mathbf {1}_m'\mathbf {K}\varvec{\theta }- \frac{\rho C_2}{n_t}\mathbf {1}_{n_t}'\mathbf {K}_{ts}\varvec{\theta } \end{aligned}$$
(62)

where in (61) we omit the constant terms, and in (62) we use the identity \(\mathbf {v}'\mathbf {Z}= \frac{1}{m}\mathbf {1}_m'\mathbf {K} - \frac{1}{n_t}\mathbf {1}_{n_t}'\mathbf {K}_{ts}\), with \(\mathbf {K}_{ts} \in {\mathbb R}^{n_t\times m}\) being the kernel matrix between the target-domain samples and the source-domain samples. Setting \(\lambda = \frac{(C_2m)^2}{\mu }\) and \(\rho = \frac{C_2m}{\lambda } = \frac{\mu }{C_2m}\) and omitting the constant term, the problem in (62) becomes

$$\begin{aligned}&\min _{(\varvec{\alpha }, \varvec{\beta }, \varvec{\theta })\in \mathcal {A}} \quad H(\varvec{\alpha }, \varvec{\beta }) +\frac{\mu }{2m^2}\varvec{\theta }'\mathbf {K}\varvec{\theta }- \frac{\mu }{mn_t}\mathbf {1}_{n_t}'\mathbf {K}_{ts}\varvec{\theta }\nonumber \\&\Leftrightarrow \min _{(\varvec{\alpha }, \varvec{\beta }, \varvec{\theta })\in \mathcal {A}} \quad H(\varvec{\alpha }, \varvec{\beta }) +\frac{\mu }{2m^2}\varvec{\theta }'\mathbf {K}\varvec{\theta }- \frac{\mu }{mn_t}\mathbf {1}_{n_t}'\mathbf {K}_{ts}\varvec{\theta }\nonumber \\&\quad + \frac{\mu }{2n_t^2}\mathbf {1}_{n_t}'\mathbf {K}_t\mathbf {1}_{n_t}\end{aligned}$$
(63)
$$\begin{aligned}&\Leftrightarrow \min _{(\varvec{\alpha }, \varvec{\beta }, \varvec{\theta })\in \mathcal {A}} \quad H(\varvec{\alpha }, \varvec{\beta }) +\frac{\mu }{2}\Vert \frac{1}{m}\sum _{i=1}^m\theta _i\mathbf {z}^s_i - \frac{1}{n_t}\sum _{i=1}^{n_t}\mathbf {z}^t_i\Vert ^2,\nonumber \\ \end{aligned}$$
(64)

where in (63) we add the constant \(\frac{\mu }{2n_t^2}\mathbf {1}_{n_t}'\mathbf {K}_t\mathbf {1}_{n_t}\) to the objective function, with \(\mathbf {K}_t \in {\mathbb R}^{n_t\times n_t}\) being the kernel matrix on the target-domain samples. Note that the problem in (64) is exactly the problem in (18). This completes the proof. \(\square \)
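The final step rewrites the \(\varvec{\theta }\)-dependent terms as a squared distance between the weighted source mean and the target mean in feature space, evaluated purely through kernel matrices. A small NumPy sketch with a linear kernel (so the feature map is the identity; all data below are random placeholders) confirms that the kernel expression matches the explicit norm in (64):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_t, dim = 6, 4, 3
Xs = rng.standard_normal((m, dim))      # source samples z^s_i (placeholders)
Xt = rng.standard_normal((n_t, dim))    # target samples z^t_i (placeholders)
theta = rng.dirichlet(np.ones(m)) * m   # weights satisfying 1' theta = m

# Explicit form (64): || (1/m) sum_i theta_i z^s_i - (1/n_t) sum_i z^t_i ||^2
diff = (theta @ Xs) / m - Xt.mean(axis=0)
explicit = diff @ diff

# Kernel form, with a linear kernel:
# (1/m^2) theta' K theta - (2/(m n_t)) 1' K_ts theta + (1/n_t^2) 1' K_t 1
K = Xs @ Xs.T
K_ts = Xt @ Xs.T
K_t = Xt @ Xt.T
kernel_form = (theta @ K @ theta) / m**2 \
    - 2.0 * (np.ones(n_t) @ K_ts @ theta) / (m * n_t) \
    + (np.ones(n_t) @ K_t @ np.ones(n_t)) / n_t**2

assert np.isclose(explicit, kernel_form)
```

Expanding \(\Vert a - b\Vert ^2 = a'a - 2a'b + b'b\) term by term reproduces exactly the three kernel terms, which is why adding the constant \(\frac{\mu }{2n_t^2}\mathbf {1}_{n_t}'\mathbf {K}_t\mathbf {1}_{n_t}\) in (63) completes the square.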

Cite this article

Niu, L., Li, W. & Xu, D. Exploiting Privileged Information from Web Data for Action and Event Recognition. Int J Comput Vis 118, 130–150 (2016). https://doi.org/10.1007/s11263-015-0862-5
