Effective Multiclass Transfer for Hypothesis Transfer Learning

  • Conference paper

In: Advances in Knowledge Discovery and Data Mining (PAKDD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10235)

Abstract

In this paper, we investigate the visual domain adaptation problem under the setting of Hypothesis Transfer Learning (HTL), where only the source models, rather than the source data, are accessible. However, previous studies of HTL either can only leverage the knowledge of a specific type of source classifier or suffer from low transfer efficiency on small training sets. We target two important issues: the effectiveness of the transfer on a small target training set, and the compatibility of the transfer model with real-world HTL problems. To address these issues, we propose Effective Multiclass Transfer Learning (EMTLe). We demonstrate that EMTLe, which uses the predictions of the source models as the transferable knowledge, can exploit the knowledge of different types of source classifiers. We use a transfer parameter to weigh the importance of each source model's prediction as the auxiliary bias, and estimate this parameter with bi-level optimization, showing that our novel objective function allows us to obtain the optimal transfer parameter effectively. Empirical results show that EMTLe can effectively exploit the transferred knowledge and outperform other HTL baselines when the target training set is small.
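
To make the scheme concrete, here is a minimal, hypothetical sketch of the idea described in the abstract: the target learner only queries frozen source models for their predictions, combines them through a transfer parameter into an auxiliary bias, and fits the target model around that bias. The regression setup, the linear source models, and the coarse grid search standing in for the paper's bi-level optimization are all illustrative assumptions, not the authors' formulation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy target task with a small training set, as in the HTL setting.
n, d = 30, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
X_tr, y_tr, X_val, y_val = X[:20], y[:20], X[20:], y[20:]

# Pre-trained source hypotheses: only their predictions are ever used,
# never the source data. These linear sources are purely illustrative;
# the HTL setting is agnostic to the source classifiers' type.
w_srcs = [w_true + rng.normal(scale=s, size=d) for s in (0.1, 1.0, 5.0)]

def source_preds(X):
    return np.stack([X @ w for w in w_srcs], axis=1)  # shape (n, n_sources)

def fit_target(X, y, beta, lam=1.0):
    # Inner problem: ridge fit of the residual that remains after
    # subtracting the beta-weighted source predictions (the auxiliary bias).
    r = y - source_preds(X) @ beta
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ r)

def predict(X, w, beta):
    return X @ w + source_preds(X) @ beta

# Outer problem: choose the transfer parameter beta by held-out error.
# A coarse grid search stands in for the paper's bi-level optimization.
best = (np.inf, None)
for b in itertools.product((0.0, 0.5, 1.0), repeat=len(w_srcs)):
    beta = np.array(b)
    w = fit_target(X_tr, y_tr, beta)
    err = float(np.mean((predict(X_val, w, beta) - y_val) ** 2))
    best = min(best, (err, b))
print("best validation MSE %.4f with beta = %s" % best)
```

In the paper itself, the outer problem over the transfer parameter is instead solved by bi-level optimization, with the sub-gradient method analyzed in the Appendix.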

Notes

  1. The 13 classes are: backpack, bike, helmet, bottle, calculator, headphone, keyboard, laptop, monitor, mouse, mug, phone and projector.

References

  1. Aytar, Y., Zisserman, A.: Tabula rasa: model transfer for object category detection. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2252–2259. IEEE (2011)

  2. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Mach. Learn. 79(1–2), 151–175 (2010)

  3. Ben-David, S., Blitzer, J., Crammer, K., Pereira, F., et al.: Analysis of representations for domain adaptation. Adv. Neural Inf. Process. Syst. 19, 137 (2007)

  4. Cawley, G.C.: Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. In: International Joint Conference on Neural Networks, IJCNN 2006, pp. 1661–1668. IEEE (2006)

  5. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)

  6. Davis, J., Domingos, P.: Deep transfer via second-order Markov logic. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 217–224. ACM (2009)

  7. Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 594–611 (2006)

  8. Jie, L., Tommasi, T., Caputo, B.: Multiclass transfer learning from unconstrained priors. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1863–1870. IEEE (2011)

  9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  10. Kuzborskij, I., Orabona, F.: Stability and hypothesis transfer learning. In: Proceedings of the 30th International Conference on Machine Learning, pp. 942–950 (2013)

  11. Kuzborskij, I., Orabona, F., Caputo, B.: From n to n+1: multiclass transfer incremental learning. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3358–3365. IEEE (2013)

  12. Maclaurin, D., Duvenaud, D., Adams, R.P.: Gradient-based hyperparameter optimization through reversible learning. In: Proceedings of the 32nd International Conference on Machine Learning (2015)

  13. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)

  14. Pedregosa, F.: Hyperparameter optimization with approximate gradient. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, 19–24 June 2016, pp. 737–746 (2016)

  15. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (2015)

  16. Tommasi, T., Orabona, F., Caputo, B.: Learning categories from few examples with multi model knowledge transfer. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 928–941 (2014)

  17. Wang, X., Huang, T.K., Schneider, J.: Active transfer learning under model shift. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pp. 1305–1313 (2014)

  18. Yang, J., Yan, R., Hauptmann, A.G.: Adapting SVM classifiers to data with shifted distributions. In: Seventh IEEE International Conference on Data Mining Workshops, ICDM Workshop 2007, pp. 69–76. IEEE (2007)

  19. Yang, J., Yan, R., Hauptmann, A.G.: Cross-domain video concept detection using adaptive SVMs. In: Proceedings of the 15th International Conference on Multimedia, pp. 188–197. ACM (2007)

Acknowledgments

We thank the anonymous reviewers for their valuable comments, which helped improve this paper. This work is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Correspondence to Charles X. Ling.

Appendix

Theorem 1

Let \(L(\beta )\) be a \(\lambda \)-strongly convex function and let \(\beta ^*\) be its optimal solution. Let \(\beta _1,\ldots ,\beta _{T+1}\) be a sequence such that \(\beta _1 \in B\) and, for \(t \ge 1\), \(\beta _{t+1} = \beta _t - \eta _t \varDelta _t\), where \(\varDelta _t\) is a sub-gradient of \(L\) at \(\beta _t\) and \(\eta _t = 1/(\lambda t)\). Assume that \(||\varDelta _t|| \le G\) for all t. Then we have:

$$\begin{aligned} L(\beta _{T+1}) \le L(\beta ^*)+\frac{G^2(1+\ln (T))}{2\lambda T} \end{aligned}$$
(9)

Proof

As \(L(\beta )\) is strongly convex and \(\varDelta _t\) is in its sub-gradient set at \(\beta _t\), according to the definition of \(\lambda \)-strong convexity [15], the following inequality holds:

$$\begin{aligned} \left\langle {\beta _t - \beta ^*,\varDelta _t} \right\rangle \ge L(\beta _t)-L(\beta ^*)+\frac{\lambda }{2}||\beta _t - \beta ^*||^2 \end{aligned}$$
(10)

The term \(\left\langle {\beta _t - \beta ^*,\varDelta _t} \right\rangle \) can be written as:

$$\begin{aligned} \begin{aligned} \left\langle {\beta _t - \beta ^*,\varDelta _t} \right\rangle&= \left\langle {\beta _t - \frac{1}{2}\eta _t\varDelta _t + \frac{1}{2}\eta _t\varDelta _t- \beta ^*,\varDelta _t} \right\rangle =\frac{1}{2}\left\langle {{\beta _{t + 1}} + {\beta _t} - 2{\beta ^*},{\varDelta _t}} \right\rangle + \frac{1}{2}{\eta _t}||\varDelta _t||^2 \end{aligned} \end{aligned}$$
(11)

Then we have:

$$\begin{aligned} \begin{aligned} ||\beta _t-\beta ^*||^2-||\beta _{t+1}-\beta ^*||^2 =\left\langle {{\beta _{t + 1}} + {\beta _t} - 2{\beta ^*},{\eta _t\varDelta _t}} \right\rangle \end{aligned} \end{aligned}$$
(12)

Rearranging (10), plugging (11) and (12) into it, and using the assumption \(||\varDelta _t|| \le G\) together with \(\eta _t = 1/(\lambda t)\), we have:

$$\begin{aligned} \begin{aligned}&{Diff}_t = L(\beta _t)-L(\beta ^*) \le \frac{\lambda (t-1)}{2}{||{\beta _t} - {\beta ^*}||^2}- \frac{\lambda t}{2}{||{\beta _{t+1}} - {\beta ^*}||^2}+\frac{1}{2}{\eta _t} G^2 \end{aligned} \end{aligned}$$
(13)

By convexity, the objective values are non-increasing along the iterates, i.e., \(L(\beta ^*) \le L(\beta _{T+1}) \le L(\beta _T) \le \cdots \le L(\beta _1)\). Hence, for the sequence \(Diff_t\) with \(t=1,\ldots ,T\), we have:

$$\begin{aligned} \sum _{t=1}^{T} Diff_t = \sum _{t=1}^{T}L(\beta _t)-TL(\beta ^*) \ge T\left[ L(\beta _T)-L(\beta ^*)\right] \end{aligned}$$
(14)

Next, we show that

$$\begin{aligned} \begin{aligned}&\sum _{t=1}^{T} Diff_t = \sum _{t=1}^{T}\left\{ \frac{\lambda (t-1)}{2}{||{\beta _t} - {\beta ^*}||^2}- \frac{\lambda t}{2}{||{\beta _{t+1}} - {\beta ^*}||^2}+\frac{1}{2}{\eta _t} G^2\right\} \\&=-\frac{\lambda T}{2}{||{\beta _{T+1}-\beta ^*}||^2} + \frac{G^2}{2 \lambda }\sum _{t=1}^{T} \frac{1}{t}\le \frac{G^2}{2 \lambda }\sum _{t=1}^{T} \frac{1}{t} \le \frac{G^2}{2 \lambda }(1+\ln (T)) \end{aligned} \end{aligned}$$
(15)

Combining (14) and (15), using \(L(\beta _{T+1}) \le L(\beta _T)\), and rearranging, we have:

$$\begin{aligned} L(\beta _{T+1}) \le L(\beta ^*)+\frac{G^2(1+\ln (T))}{2\lambda T} \end{aligned}$$
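
Theorem 1 is a standard \(O(\ln T / T)\) optimality-gap guarantee for sub-gradient descent with step size \(\eta _t = 1/(\lambda t)\) on a \(\lambda \)-strongly convex objective, and it is easy to check numerically. The sketch below is an illustrative verification on a toy non-smooth strongly convex function (chosen for this sketch, not taken from the paper), taking G as the largest sub-gradient norm observed along the trajectory:

```python
import numpy as np

lam, d, T = 0.5, 5, 10000

def L(b):
    # lambda-strongly convex, non-smooth toy objective (illustrative only):
    # L(b) = (lam/2) ||b||^2 + |b_0 - 1|
    return 0.5 * lam * b @ b + abs(b[0] - 1.0)

def subgrad(b):
    g = lam * b.copy()
    g[0] += np.sign(b[0] - 1.0)  # sub-gradient of |b_0 - 1|; 0 is valid at b_0 = 1
    return g

beta = np.zeros(d)
G = 0.0
for t in range(1, T + 1):
    g = subgrad(beta)
    G = max(G, float(np.linalg.norm(g)))
    beta -= g / (lam * t)  # eta_t = 1 / (lambda * t)

# For lam <= 1 the minimiser is beta* = (1, 0, ..., 0), so L(beta*) = lam / 2.
gap = L(beta) - 0.5 * lam
bound = G ** 2 * (1.0 + np.log(T)) / (2.0 * lam * T)
print(f"L(beta_(T+1)) - L(beta*) = {gap:.6f} <= bound = {bound:.6f}")
```

As the theorem predicts, the printed optimality gap should fall below the bound of (9).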

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ao, S., Li, X., Ling, C.X. (2017). Effective Multiclass Transfer for Hypothesis Transfer Learning. In: Kim, J., Shim, K., Cao, L., Lee, J.-G., Lin, X., Moon, Y.-S. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol. 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_6

  • DOI: https://doi.org/10.1007/978-3-319-57529-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57528-5

  • Online ISBN: 978-3-319-57529-2

  • eBook Packages: Computer Science, Computer Science (R0)
