Effective Multiclass Transfer for Hypothesis Transfer Learning

  • Conference paper

In: Advances in Knowledge Discovery and Data Mining (PAKDD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10235)

Abstract

In this paper, we investigate the visual domain adaptation problem under the setting of Hypothesis Transfer Learning (HTL), where only the source models, rather than the source data, are accessible. However, previous studies of HTL either can only leverage the knowledge of a specific type of source classifier or suffer from low transfer efficiency on small training sets. We target two important issues: the effectiveness of the transfer on a small target training set, and the compatibility of the transfer model with real-world HTL problems. To address these issues, we propose Effective Multiclass Transfer Learning (EMTLe). We demonstrate that EMTLe, which uses the predictions of the source models as the transferable knowledge, can exploit the knowledge of different types of source classifiers. We use a transfer parameter to weigh the importance of each source model's prediction as the auxiliary bias, and estimate this parameter with bi-level optimization, showing that our novel objective function allows us to obtain the optimal transfer parameter effectively. Empirical results show that EMTLe can effectively exploit the transferred knowledge and outperform other HTL baselines when the target training set is small.
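
To make the scheme concrete, here is a minimal, hypothetical sketch of the idea described in the abstract: the target learner only queries frozen source models for their predictions, combines them through a transfer parameter into an auxiliary bias, and fits the target model around that bias. The regression setup, the linear source models, and the coarse grid search standing in for the paper's bi-level optimization are all illustrative assumptions, not the authors' formulation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy target task with a small training set, as in the HTL setting.
n, d = 30, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
X_tr, y_tr, X_val, y_val = X[:20], y[:20], X[20:], y[20:]

# Pre-trained source hypotheses: only their predictions are ever used,
# never the source data. These linear sources are purely illustrative;
# the HTL setting is agnostic to the source classifiers' type.
w_srcs = [w_true + rng.normal(scale=s, size=d) for s in (0.1, 1.0, 5.0)]

def source_preds(X):
    return np.stack([X @ w for w in w_srcs], axis=1)  # shape (n, n_sources)

def fit_target(X, y, beta, lam=1.0):
    # Inner problem: ridge fit of the residual that remains after
    # subtracting the beta-weighted source predictions (the auxiliary bias).
    r = y - source_preds(X) @ beta
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ r)

def predict(X, w, beta):
    return X @ w + source_preds(X) @ beta

# Outer problem: choose the transfer parameter beta by held-out error.
# A coarse grid search stands in for the paper's bi-level optimization.
best = (np.inf, None)
for b in itertools.product((0.0, 0.5, 1.0), repeat=len(w_srcs)):
    beta = np.array(b)
    w = fit_target(X_tr, y_tr, beta)
    err = float(np.mean((predict(X_val, w, beta) - y_val) ** 2))
    best = min(best, (err, b))
print("best validation MSE %.4f with beta = %s" % best)
```

In the paper itself, the outer problem over the transfer parameter is instead solved by bi-level optimization, with the sub-gradient method analyzed in the Appendix.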

Notes

  1. The 13 classes are: backpack, bike, helmet, bottle, calculator, headphone, keyboard, laptop, monitor, mouse, mug, phone and projector.

References

  1. Aytar, Y., Zisserman, A.: Tabula rasa: model transfer for object category detection. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2252–2259. IEEE (2011)

  2. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Mach. Learn. 79(1–2), 151–175 (2010)

  3. Ben-David, S., Blitzer, J., Crammer, K., Pereira, F., et al.: Analysis of representations for domain adaptation. Adv. Neural Inf. Process. Syst. 19, 137 (2007)

  4. Cawley, G.C.: Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. In: International Joint Conference on Neural Networks, IJCNN 2006, pp. 1661–1668. IEEE (2006)

  5. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)

  6. Davis, J., Domingos, P.: Deep transfer via second-order Markov logic. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 217–224. ACM (2009)

  7. Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 594–611 (2006)

  8. Jie, L., Tommasi, T., Caputo, B.: Multiclass transfer learning from unconstrained priors. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1863–1870. IEEE (2011)

  9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  10. Kuzborskij, I., Orabona, F.: Stability and hypothesis transfer learning. In: Proceedings of the 30th International Conference on Machine Learning, pp. 942–950 (2013)

  11. Kuzborskij, I., Orabona, F., Caputo, B.: From n to n+1: multiclass transfer incremental learning. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3358–3365. IEEE (2013)

  12. Maclaurin, D., Duvenaud, D., Adams, R.P.: Gradient-based hyperparameter optimization through reversible learning. In: Proceedings of the 32nd International Conference on Machine Learning (2015)

  13. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)

  14. Pedregosa, F.: Hyperparameter optimization with approximate gradient. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, 19–24 June 2016, pp. 737–746 (2016)

  15. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (2015)

  16. Tommasi, T., Orabona, F., Caputo, B.: Learning categories from few examples with multi model knowledge transfer. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 928–941 (2014)

  17. Wang, X., Huang, T.K., Schneider, J.: Active transfer learning under model shift. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pp. 1305–1313 (2014)

  18. Yang, J., Yan, R., Hauptmann, A.G.: Adapting SVM classifiers to data with shifted distributions. In: Seventh IEEE International Conference on Data Mining Workshops, ICDM Workshop 2007, pp. 69–76. IEEE (2007)

  19. Yang, J., Yan, R., Hauptmann, A.G.: Cross-domain video concept detection using adaptive SVMs. In: Proceedings of the 15th International Conference on Multimedia, pp. 188–197. ACM (2007)

Acknowledgments

We thank the anonymous reviewers for their valuable comments, which helped improve this paper. This work is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Correspondence to Charles X. Ling.

Appendix

Theorem 1

Let \(L(\beta )\) be a \(\lambda \)-strongly convex function and let \(\beta ^*\) be its optimal solution. Let \(\beta _1,\ldots ,\beta _{T+1}\) be a sequence such that \(\beta _1 \in B\) and, for \(t \ge 1\), \(\beta _{t+1} = \beta _t - \eta _t \varDelta _t\), where \(\varDelta _t\) is a sub-gradient of \(L\) at \(\beta _t\) and \(\eta _t = 1/(\lambda t)\). Assume that \(||\varDelta _t|| \le G\) for all t. Then we have:

$$\begin{aligned} L(\beta _{T+1}) \le L(\beta ^*)+\frac{G^2(1+\ln (T))}{2\lambda T} \end{aligned}$$
(9)

Proof

As \(L(\beta )\) is strongly convex and \(\varDelta _t\) is in its sub-gradient set at \(\beta _t\), according to the definition of \(\lambda \)-strong convexity [15], the following inequality holds:

$$\begin{aligned} \left\langle {\beta _t - \beta ^*,\varDelta _t} \right\rangle \ge L(\beta _t)-L(\beta ^*)+\frac{\lambda }{2}||\beta _t - \beta ^*||^2 \end{aligned}$$
(10)

The term \(\left\langle {\beta _t - \beta ^*,\varDelta _t} \right\rangle \) can be written as:

$$\begin{aligned} \begin{aligned} \left\langle {\beta _t - \beta ^*,\varDelta _t} \right\rangle&= \left\langle {\beta _t - \frac{1}{2}\eta _t\varDelta _t + \frac{1}{2}\eta _t\varDelta _t- \beta ^*,\varDelta _t} \right\rangle =\frac{1}{2}\left\langle {{\beta _{t + 1}} + {\beta _t} - 2{\beta ^*},{\varDelta _t}} \right\rangle + \frac{1}{2}{\eta _t}||\varDelta _t||^2 \end{aligned} \end{aligned}$$
(11)

Then we have:

$$\begin{aligned} \begin{aligned} ||\beta _t-\beta ^*||^2-||\beta _{t+1}-\beta ^*||^2 =\left\langle {{\beta _{t + 1}} + {\beta _t} - 2{\beta ^*},{\eta _t\varDelta _t}} \right\rangle \end{aligned} \end{aligned}$$
(12)

Rearranging (10), plugging (11) and (12) into it, and using the assumption \(||\varDelta _t|| \le G\) together with \(\eta _t = 1/(\lambda t)\), we have:

$$\begin{aligned} \begin{aligned}&{Diff}_t = L(\beta _t)-L(\beta ^*) \le \frac{\lambda (t-1)}{2}{||{\beta _t} - {\beta ^*}||^2}- \frac{\lambda t}{2}{||{\beta _{t+1}} - {\beta ^*}||^2}+\frac{1}{2}{\eta _t} G^2 \end{aligned} \end{aligned}$$
(13)

By convexity, the objective values are non-increasing along the iterates, i.e., \(L(\beta ^*) \le L(\beta _{T+1}) \le L(\beta _T) \le \cdots \le L(\beta _1)\). Hence, for the sequence \(Diff_t\) with \(t=1,\ldots ,T\), we have:

$$\begin{aligned} \sum _{t=1}^{T} Diff_t = \sum _{t=1}^{T}L(\beta _t)-TL(\beta ^*) \ge T\left[ L(\beta _T)-L(\beta ^*)\right] \end{aligned}$$
(14)

Next, we show that

$$\begin{aligned} \begin{aligned}&\sum _{t=1}^{T} Diff_t = \sum _{t=1}^{T}\left\{ \frac{\lambda (t-1)}{2}{||{\beta _t} - {\beta ^*}||^2}- \frac{\lambda t}{2}{||{\beta _{t+1}} - {\beta ^*}||^2}+\frac{1}{2}{\eta _t} G^2\right\} \\&=-\frac{\lambda T}{2}{||{\beta _{T+1}-\beta ^*}||^2} + \frac{G^2}{2 \lambda }\sum _{t=1}^{T} \frac{1}{t}\le \frac{G^2}{2 \lambda }\sum _{t=1}^{T} \frac{1}{t} \le \frac{G^2}{2 \lambda }(1+\ln (T)) \end{aligned} \end{aligned}$$
(15)

Combining (14) and (15), using \(L(\beta _{T+1}) \le L(\beta _T)\), and rearranging, we have:

$$\begin{aligned} L(\beta _{T+1}) \le L(\beta ^*)+\frac{G^2(1+\ln (T))}{2\lambda T} \end{aligned}$$
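
Theorem 1 is a standard \(O(\ln T / T)\) optimality-gap guarantee for sub-gradient descent with step size \(\eta _t = 1/(\lambda t)\) on a \(\lambda \)-strongly convex objective, and it is easy to check numerically. The sketch below is an illustrative verification on a toy non-smooth strongly convex function (chosen for this sketch, not taken from the paper), taking G as the largest sub-gradient norm observed along the trajectory:

```python
import numpy as np

lam, d, T = 0.5, 5, 10000

def L(b):
    # lambda-strongly convex, non-smooth toy objective (illustrative only):
    # L(b) = (lam/2) ||b||^2 + |b_0 - 1|
    return 0.5 * lam * b @ b + abs(b[0] - 1.0)

def subgrad(b):
    g = lam * b.copy()
    g[0] += np.sign(b[0] - 1.0)  # sub-gradient of |b_0 - 1|; 0 is valid at b_0 = 1
    return g

beta = np.zeros(d)
G = 0.0
for t in range(1, T + 1):
    g = subgrad(beta)
    G = max(G, float(np.linalg.norm(g)))
    beta -= g / (lam * t)  # eta_t = 1 / (lambda * t)

# For lam <= 1 the minimiser is beta* = (1, 0, ..., 0), so L(beta*) = lam / 2.
gap = L(beta) - 0.5 * lam
bound = G ** 2 * (1.0 + np.log(T)) / (2.0 * lam * T)
print(f"L(beta_(T+1)) - L(beta*) = {gap:.6f} <= bound = {bound:.6f}")
```

As the theorem predicts, the printed optimality gap should fall below the bound of (9).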

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ao, S., Li, X., Ling, C.X. (2017). Effective Multiclass Transfer for Hypothesis Transfer Learning. In: Kim, J., Shim, K., Cao, L., Lee, J.-G., Lin, X., Moon, Y.-S. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol. 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_6

  • DOI: https://doi.org/10.1007/978-3-319-57529-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57528-5

  • Online ISBN: 978-3-319-57529-2

  • eBook Packages: Computer Science, Computer Science (R0)
