
Asymmetric and Category Invariant Feature Transformations for Domain Adaptation

Published in: International Journal of Computer Vision

Abstract

We address the problem of visual domain adaptation for transferring object models from one dataset or visual domain to another. We introduce a unified flexible model for both supervised and semi-supervised learning that allows us to learn transformations between domains. Additionally, we present two instantiations of the model, one for general feature adaptation/alignment, and one specifically designed for classification. First, we show how to extend metric learning methods for domain adaptation, allowing for learning metrics independent of the domain shift and the final classifier used. Furthermore, we go beyond classical metric learning by extending the method to asymmetric, category independent transformations. Our framework can adapt features even when the target domain does not have any labeled examples for some categories, and when the target and source features have different dimensions. Finally, we develop a joint learning framework for adaptive classifiers, which outperforms competing methods in terms of multi-class accuracy and scalability. We demonstrate the ability of our approach to adapt object recognition models under a variety of situations, such as differing imaging conditions, feature types, and codebooks. The experiments show its strong performance compared to previous approaches and its applicability to large-scale scenarios.


Notes

  1. Note that, in general, we could equally optimize a second loss function between the source and target data that considers instance-level constraints. However, to distinguish our approach from prior work focused on learning a metric that requires instance constraints, we present our algorithms assuming only category-level information, to demonstrate the effectiveness of this coarser level of supervision.

  2. Note that we present this result for the specific case of the Frobenius norm regularizer, though in fact our analysis holds for the class of regularizers \(r({\varvec{W}})\) that can be written in terms of the singular values of \({\varvec{W}}\); that is, if \(\sigma _1, \ldots , \sigma _p\) are the singular values of \({\varvec{W}}\), then \(r({\varvec{W}})\) is of the form \(\sum _{j=1}^p r_j(\sigma _j)\) for some scalar functions \(r_j\), each of which is globally minimized at zero. For example, the squared Frobenius norm \(r({\varvec{W}}) = \frac{1}{2} \Vert {\varvec{W}}\Vert _F^2\) is a special case with \(r_j(\sigma _j) = \frac{1}{2} \sigma _j^2\).
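The Frobenius special case in this note is easy to check numerically. The following is a minimal illustrative sketch (not code from the paper) verifying that \(\frac{1}{2}\Vert {\varvec{W}}\Vert _F^2\) equals \(\sum _j \frac{1}{2}\sigma _j^2\) for a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))

# r(W) = (1/2) ||W||_F^2 computed directly from the entries of W ...
r_direct = 0.5 * np.linalg.norm(W, "fro") ** 2

# ... and spectrally, as sum_j r_j(sigma_j) with r_j(s) = s^2 / 2
sigma = np.linalg.svd(W, compute_uv=False)
r_spectral = 0.5 * np.sum(sigma ** 2)

assert np.isclose(r_direct, r_spectral)
```

Any regularizer built this way from scalar functions of the singular values is invariant to orthogonal transformations of \({\varvec{W}}\), which is the property the proofs below exploit.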

  3. The assumption that the kernel matrices are strictly positive definite is not a severe limitation. For the Gaussian RBF kernel, strict positive definiteness can always be assured; for other kernel functions, the matrices can be regularized by adding a scaled identity matrix.
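The scaled-identity regularization mentioned here can be sketched in a few lines of NumPy (an illustrative example with a linear kernel, not the paper's code): a rank-deficient Gram matrix becomes strictly positive definite after adding \(\epsilon \varvec{I}\).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 10))  # 4-dim features, 10 samples -> rank-deficient Gram
K = X.T @ X                       # 10x10 linear-kernel matrix: PSD but singular

eps = 1e-3
K_reg = K + eps * np.eye(K.shape[0])  # add a scaled identity matrix

# every eigenvalue is shifted up by eps, so K_reg is strictly positive definite
min_eig = np.linalg.eigvalsh(K_reg).min()
assert min_eig >= eps - 1e-8
```

The shift changes each eigenvalue by exactly \(\epsilon\), so a small \(\epsilon\) guarantees invertibility (and a well-defined \(\varvec{K}^{-1/2}\)) while perturbing the kernel only slightly.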


Author information


Correspondence to Judy Hoffman.

Additional information

Communicated by Hal Daumé.

Appendix: Proofs


Proof of Lemma 1

Let \({\varvec{W}}\) have singular value decomposition \(\varvec{U} \varvec{\Sigma } \varvec{\tilde{U}}^T\). We can therefore write \({\varvec{W}}\) as \({\varvec{W}}= \sum _{j=1}^p \sigma _j \varvec{u}_j \tilde{\varvec{u}}_j^T\), where \(p\) is the rank of \({\varvec{W}}\).

Claim: \(\varvec{u_j}\in \mathcal {C}(\mathbf{X})\) and \(\tilde{\varvec{u_j}}\in \mathcal {C}(\mathbf{Z})\); that is, there exist vectors \(\varvec{v_j}, \tilde{\varvec{v_j}}\) such that \(\varvec{u_j} = \mathbf{X}\varvec{v_j}\) and \(\varvec{\tilde{u_j}} = \mathbf{Z}\varvec{\tilde{v_j}}\).

Proof: Suppose the claim were false. By definition, a vector lying outside the column space of a matrix lies in the left null space of that matrix: if \(\varvec{u_j} \notin \mathcal {C}(\mathbf{X})\) then \({\mathbf{X}}^T\varvec{u_j} = 0\), and similarly if \(\varvec{\tilde{u_j}} \notin \mathcal {C}(\mathbf{Z})\) then \({\mathbf{Z}}^T\varvec{\tilde{u_j}} = 0\). Now observe that the constraints in our optimization problem involve \({\varvec{W}}\) only through the similarity function \(\hbox {sim}({\varvec{W}}, \mathbf{X}, \mathbf{Z}) = {\mathbf{X}}^T {\varvec{W}}\mathbf{Z}= \sum _{j=1}^p \sigma _j {\mathbf{X}}^T\varvec{u_j} \tilde{\varvec{u_j}}^T \mathbf{Z}\). If either \(\varvec{u_j} \notin \mathcal {C}(\mathbf{X})\) or \(\varvec{\tilde{u_j}} \notin \mathcal {C}(\mathbf{Z})\), the corresponding term in the sum vanishes, so the \(j\)th singular value is left unconstrained and is driven to zero by the regularizer (since the global minimizer of each \(r_j(\sigma _j)\) is assumed to be zero). Therefore, whenever a singular value \(\sigma _j \ne 0\), the corresponding singular vectors lie in the column spaces of the source and target data.

Following the above claim, let \(\varvec{v_j}, \varvec{\tilde{v_j}}\) be the vectors such that \(\varvec{u_j} = \mathbf{X}\varvec{v_j}\) and \(\varvec{\tilde{u_j}} = \mathbf{Z}\varvec{\tilde{v_j}}\). Then we can re-write \({\varvec{W}}\) as follows:

$$\begin{aligned} {\varvec{W}}&= \sum _{j=1}^p \sigma _j \varvec{u}_j \tilde{\varvec{u}}_j^T = \sum _{j=1}^p \sigma _j \mathbf{X}\varvec{v}_j \tilde{\varvec{v}}_j^T {\mathbf{Z}}^T\\&= \mathbf{X}\bigg (\sum _{j=1}^p \sigma _j \varvec{v}_j \tilde{\varvec{v}}_j^T \bigg ) {\mathbf{Z}}^T = \mathbf{X}\varvec{\tilde{L}} {\mathbf{Z}}^T, \end{aligned}$$

where \(\varvec{\tilde{L}} = \sum _{j=1}^p \sigma _j \varvec{v}_j \tilde{\varvec{v}}_j^T\). With the transformation \(\varvec{L} = \varvec{K}_{\mathcal {X}}^{1/2} \varvec{\tilde{L}} \varvec{K}_{\mathcal {Z}}^{1/2}\), we can equivalently write

\({\varvec{W}}= \mathbf{X}\varvec{K}_{\mathcal {X}}^{-1/2} \varvec{L} \varvec{K}_{\mathcal {Z}}^{-1/2} {\mathbf{Z}}^T\), which proves the lemma and will simplify the theorem proof.

Proof of Theorem 1

Denote \(\varvec{V}_{\mathcal {X}} = \mathbf{X}\varvec{K}_{\mathcal {X}}^{-1/2}\) and \(\varvec{V}_{\mathcal {Z}} = \mathbf{Z}\varvec{K}_{\mathcal {Z}}^{-1/2}\). Note that \(\varvec{V}_{\mathcal {X}}\) and \(\varvec{V}_{\mathcal {Z}}\) have orthonormal columns. From the lemma, \({\varvec{W}}= \varvec{V}_{\mathcal {X}} \varvec{L} \varvec{V}_{\mathcal {Z}}^T\); let \(\varvec{V}^{\perp }_{\mathcal {X}} \) and \(\varvec{V}^{\perp }_{\mathcal {Z}}\) be the orthogonal complements to \(\varvec{V}_{\mathcal {X}}\) and \(\varvec{V}_{\mathcal {Z}}\), and let \(\bar{\varvec{V}}_{\mathcal {X}} = [\varvec{V}_{\mathcal {X}} ~~\varvec{V}^{\perp }_{\mathcal {X}} ]\) and \(\bar{\varvec{V}}_{\mathcal {Z}} = [\varvec{V}_{\mathcal {Z}} ~~\varvec{V}^{\perp }_{\mathcal {Z}} ]\). Then

$$\begin{aligned} r \bigg ( \bar{\varvec{V}}_{\mathcal {X}} \left[ \begin{array}{l@{\quad }l} \varvec{L} &{} 0\\ 0 &{} 0 \end{array}\right] \bar{\varvec{V}}_{\mathcal {Z}}^T \bigg )&= r \bigg ( \left[ \begin{array}{l@{\quad }l} {\varvec{W}}&{} 0\\ 0 &{} 0 \end{array}\right] \bigg ) = r(\varvec{W}) + r(0)\\&= r({\varvec{W}}) + \text {const}. \end{aligned}$$

One can easily verify that, given two orthogonal matrices \(\varvec{V}_1\) and \(\varvec{V}_2\) and an arbitrary matrix \(\varvec{M}\), we have \(r(\varvec{V}_1 \varvec{M} \varvec{V}_2) = \sum _j r_j(\sigma _j)\), where \(\sigma _j\) are the singular values of \(\varvec{M}\). So

$$\begin{aligned} r \bigg ( \bar{\varvec{V}}_{\mathcal {X}} \left[ \begin{array}{ll} \varvec{L} &{} 0\\ 0 &{} 0 \end{array}\right] \bar{\varvec{V}}_{\mathcal {Z}}^T \bigg ) = \sum _j r_j(\bar{\sigma _j}) + \text {const} = r(\varvec{L}) + \text {const}, \end{aligned}$$

where \(\bar{\sigma _j}\) are the singular values of \(\varvec{L}\). Thus, \(r(\varvec{W}) = r(\varvec{L}) + \text {const}\).
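The key invariance used here, that \({\varvec{W}}= \varvec{V}_{\mathcal {X}} \varvec{L} \varvec{V}_{\mathcal {Z}}^T\) shares its nonzero singular values (hence its spectral regularizer value) with \(\varvec{L}\) when the factors have orthonormal columns, can be confirmed with a small NumPy sketch (illustrative only; dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
# factors with orthonormal columns, playing the roles of V_X and V_Z
V_x, _ = np.linalg.qr(rng.standard_normal((8, 3)))
V_z, _ = np.linalg.qr(rng.standard_normal((6, 3)))
L = rng.standard_normal((3, 3))

W = V_x @ L @ V_z.T

sW = np.linalg.svd(W, compute_uv=False)  # sorted descending
sL = np.linalg.svd(L, compute_uv=False)
assert np.allclose(sW[:3], sL)      # nonzero spectrum of W matches that of L
assert np.allclose(sW[3:], 0.0)     # remaining singular values vanish

# hence any spectral regularizer, e.g. the Frobenius norm, agrees on W and L
assert np.isclose(np.linalg.norm(W, "fro"), np.linalg.norm(L, "fro"))
```

This is exactly why \(r(\varvec{W})\) and \(r(\varvec{L})\) differ only by a constant contributed by the zero-padded block.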

Finally, rewrite the similarity values using the previously derived kernel representation of the transformation matrix \({\varvec{W}}= \mathbf{X}\varvec{K}_{\mathcal {X}}^{-1/2} \varvec{L} \varvec{K}_{\mathcal {Z}}^{-1/2} {\mathbf{Z}}^T\):

$$\begin{aligned} \hbox {sim}({\varvec{W}}, \mathbf{X}, \mathbf{Z})&= {\mathbf{X}}^T {\varvec{W}}\mathbf{Z}= \varvec{K}_{\mathcal {X}} \varvec{K}_{\mathcal {X}}^{-1/2} \varvec{L} \varvec{K}_{\mathcal {Z}}^{-1/2} \varvec{K}_{\mathcal {Z}}\\&= \varvec{K}_{\mathcal {X}}^{1/2} \varvec{L} \varvec{K}_{\mathcal {Z}}^{1/2} =\hbox {sim}({\varvec{L}}, { \varvec{K}_{\mathcal {X}}^{1/2}}^T, \varvec{K}_{\mathcal {Z}}^{1/2}) \end{aligned}$$

The theorem follows by rewriting \(r\) and the constraints \(c_{{\varvec{W}}}\) in terms of \(\varvec{L}\) using the above derivations. Note that both \(r({\varvec{W}})\) and the constraints \(c_{{\varvec{W}}}\) can be computed independently of the dimension of \({\varvec{W}}\), so simple arguments show that the optimization may be solved in polynomial time independent of the dimension when the \(r_j\) functions are convex.
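The kernelized similarity identity at the heart of this proof can likewise be checked numerically. Below is a minimal sketch assuming a linear kernel and full-rank Gram matrices; `psd_power` is a helper defined here for matrix square roots, not part of the paper:

```python
import numpy as np

def psd_power(K, p):
    """Symmetric PSD matrix power via eigendecomposition (e.g. p = 1/2 or -1/2)."""
    w, V = np.linalg.eigh(K)
    return (V * w ** p) @ V.T

rng = np.random.default_rng(3)
d, n_x, n_z = 12, 4, 5
X = rng.standard_normal((d, n_x))   # source data, one column per example
Z = rng.standard_normal((d, n_z))   # target data
K_x, K_z = X.T @ X, Z.T @ Z         # linear-kernel Gram matrices (full rank here)

# W in the kernelized form of the lemma: W = X K_x^{-1/2} L K_z^{-1/2} Z^T
L = rng.standard_normal((n_x, n_z))
W = X @ psd_power(K_x, -0.5) @ L @ psd_power(K_z, -0.5) @ Z.T

# sim(W, X, Z) = X^T W Z should equal K_x^{1/2} L K_z^{1/2}
lhs = X.T @ W @ Z
rhs = psd_power(K_x, 0.5) @ L @ psd_power(K_z, 0.5)
assert np.allclose(lhs, rhs)
```

Because both sides depend on the data only through the kernel matrices, the transformation can be learned in kernel space regardless of the (possibly differing) feature dimensions of \(\mathbf{X}\) and \(\mathbf{Z}\).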


About this article


Cite this article

Hoffman, J., Rodner, E., Donahue, J. et al. Asymmetric and Category Invariant Feature Transformations for Domain Adaptation. Int J Comput Vis 109, 28–41 (2014). https://doi.org/10.1007/s11263-014-0719-3
