Abstract
In this paper, we investigate the visual domain adaptation problem under the setting of Hypothesis Transfer Learning (HTL), where we can access only the source models instead of the source data. However, previous studies of HTL are limited either to leveraging the knowledge of a certain type of source classifier or by low transfer efficiency on a small training set. In this paper, we address two important issues: the effectiveness of the transfer on a small target training set, and the compatibility of the transfer model for real-world HTL problems. To solve these two issues, we propose our method, Effective Multiclass Transfer Learning (EMTLe). We demonstrate that EMTLe, which uses the predictions of the source models as the transferable knowledge, can exploit the knowledge of different types of source classifiers. We use transfer parameters to weigh the importance of the prediction of each source model as the auxiliary bias. We then use bi-level optimization to estimate the transfer parameters and demonstrate that we can effectively obtain the optimal transfer parameters with our novel objective function. Empirical results show that EMTLe can effectively exploit the transferred knowledge and outperform other HTL baselines when the size of the target training set is small.
Notes
- 1. The 13 classes include: backpack, bike, helmet, bottle, calculator, headphone, keyboard, laptop, monitor, mouse, mug, phone and projector.
References
Aytar, Y., Zisserman, A.: Tabula rasa: model transfer for object category detection. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2252–2259. IEEE (2011)
Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Mach. Learn. 79(1–2), 151–175 (2010)
Ben-David, S., Blitzer, J., Crammer, K., Pereira, F., et al.: Analysis of representations for domain adaptation. Adv. Neural Inf. Process. Syst. 19, 137 (2007)
Cawley, G.C.: Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. In: International Joint Conference on Neural Networks, IJCNN 2006, pp. 1661–1668. IEEE (2006)
Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)
Davis, J., Domingos, P.: Deep transfer via second-order Markov logic. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 217–224. ACM (2009)
Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 594–611 (2006)
Jie, L., Tommasi, T., Caputo, B.: Multiclass transfer learning from unconstrained priors. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 1863–1870. IEEE (2011)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Kuzborskij, I., Orabona, F.: Stability and hypothesis transfer learning. In: Proceedings of the 30th International Conference on Machine Learning, pp. 942–950 (2013)
Kuzborskij, I., Orabona, F., Caputo, B.: From n to n+1: multiclass transfer incremental learning. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3358–3365. IEEE (2013)
Maclaurin, D., Duvenaud, D., Adams, R.P.: Gradient-based hyperparameter optimization through reversible learning. In: Proceedings of the 32nd International Conference on Machine Learning (2015)
Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
Pedregosa, F.: Hyperparameter optimization with approximate gradient. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, 19–24 June 2016, pp. 737–746 (2016)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (2015)
Tommasi, T., Orabona, F., Caputo, B.: Learning categories from few examples with multi model knowledge transfer. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 928–941 (2014)
Wang, X., Huang, T.K., Schneider, J.: Active transfer learning under model shift. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pp. 1305–1313 (2014)
Yang, J., Yan, R., Hauptmann, A.G.: Adapting SVM classifiers to data with shifted distributions. In: 2007 Seventh IEEE International Conference on Data Mining Workshops, ICDM Workshop 2007, pp. 69–76. IEEE (2007)
Yang, J., Yan, R., Hauptmann, A.G.: Cross-domain video concept detection using adaptive SVMs. In: Proceedings of the 15th International Conference on Multimedia, pp. 188–197. ACM (2007)
Acknowledgments
We thank the anonymous reviewers for their valuable comments, which helped improve this paper. This work is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Appendix
Theorem 1
Let \(L(\beta )\) be a \(\lambda \)-strongly convex function and \(\beta ^*\) be its optimal solution. Let \(\beta _1,\ldots ,\beta _{T+1}\) be a sequence such that \(\beta _1 \in B\) and for \(t \ge 1\) we have \(\beta _{t+1} = \beta _t - \eta _t \varDelta _t\), where \(\varDelta _t\) is a sub-gradient of \(L(\beta _t)\) and \(\eta _t = 1/(\lambda t)\). Assume we have \(||\varDelta _t|| \le G\) for all t. Then we have:

\(L(\beta _T) - L(\beta ^*) \le \frac{G^2(1+\ln T)}{2\lambda T}\)
Proof
As \(L(\beta )\) is strongly convex and \(\varDelta _t\) is in its sub-gradient set at \(\beta _t\), according to the definition of \(\lambda \)-strong convexity [15], the following inequality holds:

\(\left\langle {\beta _t - \beta ^*, \varDelta _t} \right\rangle \ge L(\beta _t) - L(\beta ^*) + \frac{\lambda }{2}||\beta _t - \beta ^*||^2 \qquad (10)\)
For the term \(\left\langle {\beta _t - \beta ^*,\varDelta _t} \right\rangle \), using the update rule \(\beta _{t+1} = \beta _t - \eta _t \varDelta _t\), it can be written as:

\(\left\langle {\beta _t - \beta ^*, \varDelta _t} \right\rangle = \frac{||\beta _t - \beta ^*||^2 - ||\beta _{t+1} - \beta ^*||^2}{2\eta _t} + \frac{\eta _t}{2}||\varDelta _t||^2 \qquad (11)\)
Then, substituting \(\eta _t = 1/(\lambda t)\), we have:

\(\frac{1}{2\eta _t} = \frac{\lambda t}{2}, \qquad \frac{\eta _t}{2} = \frac{1}{2\lambda t} \qquad (12)\)
Using the assumption \(||\varDelta _t|| \le G\), we can rearrange (10), plug (11) and (12) into it, and sum over \(t = 1,\ldots ,T\); the first terms telescope, and we have:

\(\sum _{t=1}^{T}\left( L(\beta _t) - L(\beta ^*)\right) \le \frac{G^2}{2\lambda }\sum _{t=1}^{T}\frac{1}{t} \qquad (13)\)
Due to the convexity, for each pair of \(L(\beta _t)\) and \(L(\beta _{t+1})\) for \(t=1,\ldots ,T\), we have the following sequence: \(L(\beta ^*) \le L(\beta _T) \le L(\beta _{T-1}) \le \ldots \le L(\beta _1)\). Hence, for the sequence \(Diff_t = L(\beta _t) - L(\beta ^*)\) for \(t=1,\ldots ,T\), we have:

\(Diff_T \le \frac{1}{T}\sum _{t=1}^{T} Diff_t \qquad (14)\)
Next, we show that

\(\sum _{t=1}^{T}\frac{1}{t} \le 1 + \ln T \qquad (15)\)

which follows from bounding the sum by the integral \(\int _1^T \frac{1}{t}\,dt = \ln T\).
Combining (14) with the bounds above and rearranging the result, we have:

\(L(\beta _T) - L(\beta ^*) = Diff_T \le \frac{G^2(1+\ln T)}{2\lambda T}\)

which completes the proof. \(\square \)
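The guarantee above can be checked numerically with a minimal sketch: we run sub-gradient descent with the step size \(\eta _t = 1/(\lambda t)\) on a simple \(\lambda \)-strongly convex function. The objective \(L(\beta ) = \frac{\lambda }{2}\beta ^2 + |\beta |\) below is an illustrative choice (not the paper's actual objective), with minimizer \(\beta ^* = 0\) and \(L(\beta ^*) = 0\).

```python
import math

lam = 1.0  # strong-convexity parameter lambda

def L(beta):
    # Illustrative lambda-strongly convex objective, minimized at beta* = 0.
    return 0.5 * lam * beta ** 2 + abs(beta)

def subgradient(beta):
    # lam*beta + sign(beta) is a valid sub-gradient of L at beta
    # (at beta = 0 we pick 0, which lies in the sub-gradient set).
    return lam * beta + (1.0 if beta > 0 else -1.0 if beta < 0 else 0.0)

def subgradient_descent(beta1, T):
    # beta_{t+1} = beta_t - eta_t * Delta_t with eta_t = 1/(lam * t).
    beta = beta1
    for t in range(1, T + 1):
        eta = 1.0 / (lam * t)
        beta = beta - eta * subgradient(beta)
    return beta

beta1, G = 5.0, 6.0  # G bounds |subgradient| along this trajectory
for T in (10, 100, 1000):
    gap = L(subgradient_descent(beta1, T)) - L(0.0)
    bound = G * G * (1.0 + math.log(T)) / (2.0 * lam * T)
    print(T, gap, bound)  # the gap stays below the theorem's bound
```

Running this shows the optimality gap \(L(\beta _T) - L(\beta ^*)\) falling well inside the \(G^2(1+\ln T)/(2\lambda T)\) envelope, matching the rate stated in Theorem 1.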
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ao, S., Li, X., Ling, C.X. (2017). Effective Multiclass Transfer for Hypothesis Transfer Learning. In: Kim, J., Shim, K., Cao, L., Lee, J.-G., Lin, X., Moon, Y.-S. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_6
DOI: https://doi.org/10.1007/978-3-319-57529-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57528-5
Online ISBN: 978-3-319-57529-2
eBook Packages: Computer Science (R0)