
Deep CockTail Networks

A Universal Framework for Visual Multi-source Domain Adaptation


Abstract

Transferable deep representations for visual domain adaptation (DA) provide a route to learning from labeled source images how to recognize target images without target-domain supervision. This line of research has attracted increasing interest owing to its promise of label-free annotation and strong generalization. However, DA presumes that the source images are identically sampled from a single source, whereas Multi-Source DA (MSDA) is ubiquitous in the real world. In MSDA, domain shifts exist not only between the source and target domains but also among the sources; in particular, the multiple source domains and the target domain may disagree on their semantics (e.g., category shifts). These issues challenge existing MSDA solutions. In this paper, we propose the Deep CockTail Network (DCTN), a universal and flexibly deployed framework to address these problems. DCTN uses a multi-way adversarial learning pipeline to minimize the domain discrepancy between the target and each of the multiple sources in order to learn domain-invariant features. The derived source-specific perplexity scores measure how similar each target feature appears to features from each source domain. The multi-source category classifiers are integrated with the perplexity scores to categorize target images. We further derive a theoretical analysis of DCTN, including an interpretation of why DCTN can succeed without precisely crafting source-specific hyper-parameters, and upper bounds on the target expected loss in terms of domain and category shifts. In our experiments, DCTN is evaluated on four benchmarks, whose empirical studies involve the vanilla setting and three challenging category-shift transfer problems in MSDA, i.e., the source-shift, target-shift and source-target-shift scenarios. The results show that DCTN significantly boosts classification accuracy in MSDA and is remarkably resistant to negative transfer across different MSDA scenarios.



Notes

  1. More precisely, Saito et al. (2018) and Busto and Gall (2017) consider two different open-set problems.

  2. Since the domain discriminator has not yet been trained, we take uniform simplex weights as the perplexity scores.

  3. Since each sample x corresponds to a unique class y, \(\{{\mathscr {P}}_{j}\}^M_{j=1}\) and \({\mathscr {P}}_t\) can be viewed as equivalent embeddings of the \(\{P_{j}(x,y)\}^M_{j=1}\) and \(P_{t}(x,y)\) that we have discussed.

  4. http://www.sysu-hcp.net/deep-cocktail-network/.

References

  • Baktashmotlagh, M., Harandi, M., & Salzmann, M. (2016). Distribution-matching embedding for visual domain adaptation. The Journal of Machine Learning Research, 17(1), 3760–3789.

  • Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Vaughan, J. W. (2010). A theory of learning from different domains. Machine Learning, 79(1), 151–175.

  • Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Wortman, J. (2008). Learning bounds for domain adaptation. In Advances in neural information processing systems (pp. 129–136).

  • Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., & Krishnan, D. (2017). Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 95–104).

  • Busto, P. P., & Gall, J. (2017). Open set domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 754–763).

  • Cao, Z., Long, M., Wang, J., & Jordan, M. I. (2018). Partial transfer learning with selective adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2724–2732).

  • Cao, Z., Ma, L., Long, M., & Wang, J. (2018). Partial adversarial domain adaptation. In Proceedings of the European conference on computer vision (pp. 139–155).

  • Cordts, M., Omran, M., Ramos, S., Scharwächter, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2015). The cityscapes dataset. In CVPR workshop on the future of datasets in vision.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 248–255).

  • Duan, L., Xu, D., & Tsang, I. W. H. (2012). Domain adaptation from multiple sources: A domain-dependent regularization approach. IEEE Transactions on Neural Networks and Learning Systems, 23(3), 504–518.

  • Fernando, B., Habrard, A., Sebban, M., & Tuytelaars, T. (2013). Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE international conference on computer vision (pp. 2960–2967).

  • Ganin, Y., & Lempitsky, V. (2015). Unsupervised domain adaptation by backpropagation. In International conference on machine learning (pp. 1180–1189).

  • Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., & Lempitsky, V. (2017). Domain-adversarial training of neural networks. In Domain adaptation in computer vision applications (p. 189).

  • Gebru, T., Hoffman, J., & Fei-Fei, L. (2017). Fine-grained recognition in the wild: A multi-task domain adaptation approach. In Proceedings of the IEEE international conference on computer vision (pp. 1358–1367).

  • Ghifary, M., Kleijn, W. B., Zhang, M., Balduzzi, D., & Li, W. (2016). Deep reconstruction-classification networks for unsupervised domain adaptation. In Proceedings of the European conference on computer vision (pp. 597–613).

  • Gong, B., Grauman, K., & Sha, F. (2014). Learning kernels for unsupervised domain adaptation with applications to visual object recognition. International Journal of Computer Vision, 109(1–2), 3–27.

  • Gong, B., Shi, Y., Sha, F., & Grauman, K. (2012). Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2066–2073).

  • Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).

  • Gopalan, R., Li, R., & Chellappa, R. (2011). Domain adaptation for object recognition: An unsupervised approach. In Proceedings of the IEEE international conference on computer vision (pp. 999–1006).

  • Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., & Smola, A. J. (2007). A kernel method for the two-sample-problem. In Advances in neural information processing systems (pp. 513–520).

  • Gretton, A., Smola, A. J., Huang, J., Schmittfull, M., Borgwardt, K. M., & Schölkopf, B. (2009). Covariate shift by kernel mean matching. Dataset Shift in Machine Learning, 3(4), 5.

  • Haeusser, P., Frerix, T., Mordvintsev, A., & Cremers, D. (2017). Associative domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 2784–2792).

  • Ho, H. T., & Gopalan, R. (2014). Model-driven domain adaptation on product manifolds for unconstrained face recognition. International Journal of Computer Vision, 109(1–2), 110–125.

  • Hoffman, J., Wang, D., Yu, F., & Darrell, T. (2016). Fcns in the wild: Pixel-level adversarial and constraint-based adaptation. arXiv preprint arXiv:1612.02649

  • Jhuo, I. H., Liu, D., Lee, D., & Chang, S. F. (2013a). Robust visual domain adaptation with low-rank reconstruction. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2168–2175).

  • Jhuo, I. H., Liu, D., Lee, D. T., & Chang, S. F. (2013b). Robust visual domain adaptation with low-rank reconstruction. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2168–2175).

  • Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Zitnick, C. L., & Girshick, R. B. (2017). CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1988–1997).

  • Kan, M., Wu, J., Shan, S., & Chen, X. (2014). Domain adaptation for face recognition: Targetize source domain bridged by common subspace. International Journal of Computer Vision, 109(1–2), 94–109.

  • Kim, Y., Cho, D., & Hong, S. (2020). Towards privacy-preserving domain adaptation. IEEE Signal Processing Letters, 27, 1675–1679.

  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations.

  • Koniusz, P., Tas, Y., & Porikli, F. (2017). Domain adaptation by mixture of alignments of second-or higher-order scatter tensors. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7139–7148).

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • Liang, X., Xu, C., Shen, X., Yang, J., Tang, J., Lin, L., et al. (2016). Human parsing with contextualized convolutional neural network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1), 115–127.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440).

  • Long, M., Cao, Y., Wang, J., & Jordan, M. (2015). Learning transferable features with deep adaptation networks. In International conference on machine learning (pp. 97–105).

  • Long, M., Zhu, H., Wang, J., & Jordan, M. I. (2016). Unsupervised domain adaptation with residual transfer networks. In Advances in neural information processing systems (pp. 136–144).

  • Long, M., Zhu, H., Wang, J., & Jordan, M. I. (2017). Deep transfer learning with joint adaptation networks. In Proceedings of the international conference on machine learning (pp. 2208–2217).

  • Lu, H., Zhang, L., Cao, Z., Wei, W., Xian, K., Shen, C., & van den Hengel, A. (2017). When unsupervised domain adaptation meets tensor representations. In Proceedings of the IEEE international conference on computer vision (pp. 599–608).

  • Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-sne. Journal of Machine Learning Research, 9, 2579–2605.

  • Mancini, M., Porzi, L., Bulò, S. R., Caputo, B., & Ricci, E. (2018). Boosting domain adaptation by discovering latent domains. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3771–3780).

  • Mansour, Y., Mohri, M., & Rostamizadeh, A. (2009). Domain adaptation with multiple sources. In Advances in neural information processing systems (pp. 1041–1048).

  • Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., & Smolley, S. P. (2017). Least squares generative adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2794–2802).

  • Motiian, S., Jones, Q., Iranmanesh, S. M., & Doretto, G. (2017). Few-shot adversarial domain adaptation. In Advances in neural information processing systems (pp. 6670–6680).

  • Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In Nips workshop on deep learning and unsupervised feature learning.

  • Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2011). Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2), 199–210.

  • Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.

  • Peng, X., Bai, Q., Xia, X., Huang, Z., Saenko, K., & Wang, B. (2019). Moment matching for multi-source domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 1406–1415).

  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91–99).

  • Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010). Adapting visual category models to new domains. In Proceedings of the European conference on computer vision (pp. 213–226).

  • Saito, K., Ushiku, Y., & Harada, T. (2017). Asymmetric tri-training for unsupervised domain adaptation. In Proceedings of the international conference on machine learning (pp. 2988–2997).

  • Saito, K., Yamamoto, S., Ushiku, Y., & Harada, T. (2018). Open set domain adaptation by backpropagation. In Proceedings of the European conference on computer vision (pp. 156–171).

  • Shao, M., Kit, D., & Fu, Y. (2014). Generalized transfer subspace learning through low-rank constraint. International Journal of Computer Vision, 109(1–2), 74–93.

  • Sun, B., Feng, J., & Saenko, K. (2016). Return of frustratingly easy domain adaptation. In AAAI conference on artificial intelligence (pp. 2058–2065).

  • Tzeng, E., Hoffman, J., Darrell, T., & Saenko, K. (2015). Simultaneous deep transfer across domains and tasks. In Proceedings of the IEEE international conference on computer vision (pp. 4068–4076).

  • Tzeng, E., Hoffman, J., Saenko, K., & Darrell, T. (2017). Adversarial discriminative domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2962–2971).

  • Xie, J., Hu, W., Zhu, S. C., & Wu, Y. N. (2015). Learning sparse frame models for natural image patterns. International Journal of Computer Vision, 114(2–3), 91–112.

  • Xu, J., Ramos, S., Vázquez, D., & López, A. M. (2016). Hierarchical adaptive structural SVM for domain adaptation. International Journal of Computer Vision, 119(2), 159–178.

  • Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (pp. 2048–2057).

  • Xu, R., Chen, Z., Zuo, W., Yan, J., & Lin, L. (2018). Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3964–3973).

  • Xu, R., Li, G., Yang, J., & Lin, L. (2019). Larger norm more transferable: An adaptive feature norm approach for unsupervised domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 1426–1435).

  • Yan, H., Ding, Y., Li, P., Wang, Q., Xu, Y., & Zuo, W. (2017). Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 945–954).

  • Yang, J., Yan, R., & Hauptmann, A. G. (2007). Cross-domain video concept detection using adaptive svms. In Proceedings of the ACM international conference on multimedia (pp. 188–197).

  • Yao, Y., Zhang, Y., Li, X., & Ye, Y. (2019). Heterogeneous domain adaptation via soft transfer network. In Proceedings of the 27th ACM international conference on multimedia (pp. 1578–1586).

  • You, K., Long, M., Cao, Z., Wang, J., & Jordan, M. I. (2019). Universal domain adaptation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2720–2729).

  • Zellinger, W., Grubinger, T., Lughofer, E., Natschläger, T., & Samingerplatz, S. (2017). Central moment discrepancy (cmd) for domain-invariant representation learning. In International conference on learning representations.

  • Zhang, J., Li, W., & Ogunbona, P. (2017). Joint geometrical and statistical alignment for visual domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5150–5158).

  • Zhang, S., Huang, J. B., Lim, J., Gong, Y., Wang, J., Ahuja, N., & Yang, M. H. (2019). Tracking persons-of-interest via unsupervised representation adaptation. International Journal of Computer Vision, 1–25.

  • Zhao, H., Zhang, S., Wu, G., Costeira, J. P., Moura, J. M. F., & Gordon, G. J. (2018). Multiple source domain adaptation with adversarial learning. In International conference on learning representations

Acknowledgements

This work was supported in part by NSFC (Nos. 62006253, U181146, 61836012, 61976233), the State Key Development Program (No. 2018YFC0830103), the Fundamental Research Funds for the Central Universities (No. 19lgpy228), and the Major Project of Guangzhou Science and Technology of Collaborative Innovation and Industry under Grant 201605122151511. We also thank Ruijia Xu for his valuable suggestions on the revision.

Author information

Corresponding author

Correspondence to Pengxu Wei.

Additional information

Communicated by Minsu Cho.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proofs

Proof

(Lemma 1) Suppose the feature extractor has reached its optimum \(F^*\). The M source-target adversarial learning pairs can then be considered separately, each corresponding to the optimization objective w.r.t. one of \(\{D_j\}^M_{j=1}\). Writing x for \(F^*({\varvec{x}})\), the optimal discriminators are

$$\begin{aligned} D^{*}_{j}(x) = \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{j}(x)+{\mathscr {P}}_{t}(x)} \end{aligned}$$
(22)

which follows directly from Theorem 1 in Goodfellow et al. (2014). Substituting (22) into the perplexity-score-weighted combination of the source classifiers gives

$$\begin{aligned} \begin{aligned} h_t(x)&=\sum ^{M}_{j=1}\frac{-\log \left( \frac{{\mathscr {P}}_{t}(x)}{{\mathscr {P}}_{j}(x)+{\mathscr {P}}_{t}(x)}\right) h_j(x)}{\sum ^{M}_{k=1}-\log \left( \frac{{\mathscr {P}}_{t}(x)}{{\mathscr {P}}_{k}(x)+{\mathscr {P}}_{t}(x)}\right) }\\&=\sum ^{M}_{j=1}\frac{\log \left( 1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)}\right) h_j(x)}{\sum ^{M}_{k=1}\log \left( 1+ \frac{{\mathscr {P}}_{k}(x)}{{\mathscr {P}}_{t}(x)}\right) }. \end{aligned} \end{aligned}$$
(23)

\(\square \)
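
To make the weighting in (23) concrete, consider a small numerical instance (the density ratios below are assumed for illustration only): with \(M=2\) sources, \({\mathscr {P}}_{1}(x)/{\mathscr {P}}_{t}(x)=3\) and \({\mathscr {P}}_{2}(x)/{\mathscr {P}}_{t}(x)=1\),

$$\begin{aligned} h_t(x) = \frac{\log (1+3)}{\log (1+3)+\log (1+1)}\,h_1(x) + \frac{\log (1+1)}{\log (1+3)+\log (1+1)}\,h_2(x) = \tfrac{2}{3}h_1(x)+\tfrac{1}{3}h_2(x), \end{aligned}$$

since \(\log 4 = 2\log 2\); the source whose density dominates at x receives the larger weight.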

Proof

(Proposition 2) Given Lemma 1, it holds that

$$\begin{aligned} \begin{aligned} h_t(x)&=\sum ^{M}_{j=1}\frac{\log \left( 1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)}\right) }{\sum ^{M}_{k=1}\log \left( 1+ \frac{{\mathscr {P}}_{k}(x)}{{\mathscr {P}}_{t}(x)}\right) }h_j(x)\\&\le \sum ^{M}_{j=1}\frac{\log \left( 1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)}\right) \ h_j(x)}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\le \sum ^{M}_{j=1}\frac{{\mathscr {P}}_{j}(x) \ h_j(x)}{{\mathscr {P}}_{t}(x)\sum ^{M}_{k=1}\log (1+ \alpha _k)}, \end{aligned} \end{aligned}$$
(24)

where the first inequality lower-bounds each term in the denominator by \(\log (1+\alpha _k)\), and the second uses \(\log (1+z)\le z\). Hence, given a target feature x,

$$\begin{aligned}&{\mathscr {L}}({\mathscr {P}}_t,h_t,f)(x)\nonumber \\&\quad =L\left( \sum ^{M}_{j=1}\frac{\log (1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)})}{\sum ^{M}_{k=1}\log \left( 1+ \frac{{\mathscr {P}}_{k}(x)}{{\mathscr {P}}_{t}(x)}\right) }h_j(x),f(x)\right) {\mathscr {P}}_t(x)\nonumber \\&\quad \le \sum ^{M}_{j=1}\frac{\log \left( 1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)}\right) }{\sum ^{M}_{k=1}\log (1+ \frac{{\mathscr {P}}_{k}(x)}{{\mathscr {P}}_{t}(x)})} {\mathscr {P}}_t(x)L\left( h_j(x),f(x)\right) \nonumber \\&\quad \le \sum ^{M}_{j=1}\frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)\sum ^{M}_{k=1}\log (1+ \alpha _k)} {\mathscr {P}}_t(x)L\left( h_j(x),f(x)\right) \nonumber \\&\quad = \sum ^{M}_{j=1}\frac{{\mathscr {P}}_{j}(x)}{\sum ^{M}_{k=1}\log (1+ \alpha _k)} L\left( h_j(x),f(x)\right) \end{aligned}$$
(25)

in which the first inequality follows from the convexity of the loss function \(L(\cdot ,\cdot )\) and the remaining inequalities follow (24). \(\square \)

Proof

(Proposition 3) Using the pointwise bound established in (25), we derive the upper bound of \({\mathscr {L}}({\mathscr {P}}_t,h_t,f)\). Specifically,

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&=\int _{x\in {\mathscr {X}}}{\mathscr {L}}({\mathscr {P}}_t,h_t,f)(x)dx\\&\le \int _{x\in {\mathscr {X}}}\sum ^{M}_{j=1}\frac{{\mathscr {P}}_{j}(x)}{\sum ^{M}_{k=1}\log (1+ \alpha _k)} L\big (h_j(x),f(x)\big )dx\\&=\frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\sum ^{M}_{j=1}\\&\quad \int _{x\in {\mathscr {X}}}{\mathscr {P}}_{j}(x) L\big (h_j(x),f(x)\big )dx \end{aligned} \end{aligned}$$
(26)

Here \(\rho \) denotes the proportion of target data wrongly labeled by the auto-annotating strategy in the discriminative adaptation phase, and \(f'(x)\) denotes a wrong target labeling function, namely, \(\forall x\in {\mathscr {X}}\), \(f'(x)\ne f(x)\) and \(L\big (h_j(x),f(x)\big )\le L\big (h_j(x),f'(x)\big )\). Therefore,

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \sum ^{M}_{j=1}\int _{x\in {\mathscr {X}}}{\mathscr {P}}_{j}(x) L\big (h_j(x),f(x)\big )dx\\&= \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \sum ^{M}_{j=1}\Bigg (\int _{x\in {\mathscr {X}}}{\mathscr {P}}_{j}(x)\bigg ((1-\rho ) L\big (h_j(x),f(x)\big )\\&\quad +\rho L\big (h_j(x),f(x)\big )\bigg )dx\Bigg )\\&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \sum ^{M}_{j=1}\Bigg (\int _{x\in {\mathscr {X}}}{\mathscr {P}}_{j}(x)\bigg ((1-\rho ) L\big (h_j(x),f(x)\big )\\&\quad +\rho L\big (h_j(x),f'(x)\big )\bigg )dx\Bigg ), \end{aligned} \end{aligned}$$
(27)

Since L is assumed to be the 0–1 loss, following the analysis in Saito et al. (2017) we have

$$\begin{aligned} \begin{aligned}&\int _{x\in {\mathscr {X}}} {\mathscr {P}}_j(x)L\big (h_j(x),f(x)\big )dx\\&\quad \le \int _{x\in {\mathscr {X}}} {\mathscr {P}}_j(x)L\big (h_j(x),f'(x)\big )dx\\&\quad \le \int _{x\in {\mathscr {X}}} {\mathscr {P}}_j(x)dx = 1. \end{aligned} \end{aligned}$$
(28)

so that

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\big ((1-\rho )\\&\quad {\sum _{j\in [M]}\epsilon _j}+M\rho \big ) \end{aligned} \end{aligned}$$
(29)

This concludes the proof. \(\square \)
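
To illustrate the behavior of the bound (29), take an assumed numerical instance (values chosen for illustration only): \(M=2\), \(\alpha _1=\alpha _2=1\), \(\rho =0.1\) and \(\epsilon _1=\epsilon _2=0.2\). Then

$$\begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)\le \frac{(1-0.1)\times (0.2+0.2)+2\times 0.1}{2\log (1+1)}=\frac{0.28}{\log 2}, \end{aligned}$$

and as \(\rho \) grows the numerator drifts towards \(M\rho \), i.e., the pseudo-labeling error dominates the bound.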

Proof

(Proposition 4) According to the pointwise bound (25), we have

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)(x)&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \sum ^{M}_{j=1}{\mathscr {P}}_{j}(x) L\big (h_j(x),f(x)\big ). \end{aligned} \end{aligned}$$

Therefore

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \int _{x\in {\mathscr {X}}}\sum ^{M}_{j=1}{\mathscr {P}}_{j}(x) L\big (h_j(x),f(x)\big )dx\\&\le \frac{1}{\sum ^{M}_{k=1} \log (1+ \alpha _k)}\\&\quad \sum _{j\in [M]}\Bigg ((1-\rho ')\underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_j(x)L\big (h_j(x),f(x)\big )dx\\&\quad +\rho '\underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_j(x)L\big (h_j(x),f'(x)\big )dx\Bigg )\\&\le \frac{1}{\sum ^{M}_{k=1} \log (1+ \alpha _k)}\\&\quad \sum _{j\in [M]} \Bigg ( (1-\rho ')\bigg ( (1-\rho )\underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_t(x)L\big (h_j(x),f(x)\big )dx\\&\quad +\rho \underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_t(x)L\big (h_j(x),f'(x)\big )dx \bigg )\\&\quad +\rho '\underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_t(x)L\big (h_j(x),f'(x)\big )dx\Bigg ), \end{aligned} \end{aligned}$$
(30)

where the first, third and fifth inequalities are derived from the proof of Proposition 2; the second inequality follows from the entropy-based unknown-category discovery strategy, which is adopted in the target-category-shift and source-target-category-shift scenarios (notice that, since the unknown-class discovery is executed ahead of the pseudo-labeling strategy, the inequality w.r.t. \(\rho \) is nested inside the inequality w.r.t. \(\rho '\)); the fourth inequality is derived from the 0–1 loss upper bound discussed in Saito et al. (2017).

Since L is assumed to be the 0–1 loss, following Saito et al. (2017) we obtain

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&\le \sum _{j\in [M]}\frac{1}{\sum ^{M}_{k=1} \log (1+ \alpha _k)}\Big ((1-\rho ')(1-\rho )\\&\quad \underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_t(x)L\big (h_j(x),f(x)\big )dx+\rho (1-\rho ')+\rho '\Big )\\&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\Big ((1-\rho ')(1-\rho ){\sum _{j\in [M]}\epsilon _j}\\&\quad +\big ((1-\rho ')\rho +\rho '\big )M\Big ) \end{aligned} \end{aligned}$$
(31)

\(\square \)

Appendix B: Implementation details

The setup of \(\gamma \) and \(\zeta \) The pseudo-labeling strategies of DCTN rely on the hyper-parameters \(\gamma \) and \(\zeta \). The threshold \(\gamma \) is used to select a subset of target candidates, which are annotated as “high-confident” and combined with the multi-source examples to train the multi-source classifiers. We set its value above 90% to ensure the quality of the selected target samples. Instead of choosing a specific threshold \(\zeta \), we rank the target examples by their entropy values on the source-specific classifiers in decreasing order and choose the top 15% as the “unknown” candidates: 300/120/140 “unknown” candidates in the A/D/W domains, respectively. This guarantees adequate “unknown” examples to train a reliable classifier for each source domain. The same scheme is adopted in the DomainNet experiments (see Table 7 for details).
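
A minimal sketch of the two selection rules described above follows; the function name, array shapes and the use of averaged multi-source class probabilities are illustrative assumptions, not details of the released implementation.

```python
import numpy as np

def select_target_candidates(probs, gamma=0.9, unknown_ratio=0.15):
    """Illustrative selection over target class probabilities `probs`
    of shape [n_target, n_classes].

    - Samples whose top class probability exceeds `gamma` are kept as
      "high-confident" pseudo-labeled candidates.
    - The top `unknown_ratio` fraction ranked by prediction entropy
      (highest entropy first) is flagged as "unknown" candidates.
    """
    confidence = probs.max(axis=1)
    pseudo_labels = probs.argmax(axis=1)
    confident_idx = np.where(confidence > gamma)[0]

    # Prediction entropy; higher entropy suggests an unknown category.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    n_unknown = int(unknown_ratio * len(probs))
    unknown_idx = np.argsort(-entropy)[:n_unknown]   # top 15% by entropy

    return confident_idx, pseudo_labels[confident_idx], unknown_idx
```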

Network implementation details For recognition on Office-31 and ImageCLEF-DA, existing deep DA approaches (Long et al. 2015, 2016) routinely employ AlexNet (Krizhevsky et al. 2012) as their backbone. For a fair comparison, we choose a DCTN architecture derived from the AlexNet pipeline. As Fig. 11 illustrates, the representation module F is a five-layer fully-convolutional network with three max-pooling operators, and the (multi-source) category classifier C is a three-layer fully-connected multi-task network. Stacked together, they form an AlexNet-like pipeline for categorizing examples. We adopt a CNN with a two-head classifier as the domain discriminator D.
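
For orientation, the following PyTorch skeleton sketches the three modules; the layer widths and the use of flattened features in D are illustrative assumptions and do not reproduce the exact configuration in Fig. 11.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """AlexNet-style fully-convolutional extractor F (sizes illustrative)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(64, 192, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
        )
    def forward(self, x):
        return torch.flatten(self.net(x), 1)

class MultiSourceClassifier(nn.Module):
    """Three FC layers with one classification head per source domain (C)."""
    def __init__(self, in_dim, n_classes, n_sources):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, 4096), nn.ReLU(),
                                    nn.Linear(4096, 4096), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(4096, n_classes) for _ in range(n_sources)])
    def forward(self, f):
        h = self.shared(f)
        return [head(h) for head in self.heads]

class DomainDiscriminator(nn.Module):
    """Shared trunk with one source-vs-target output per source domain (D)."""
    def __init__(self, in_dim, n_sources):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(1024, 1) for _ in range(n_sources)])
    def forward(self, f):
        h = self.trunk(f)
        return [torch.sigmoid(head(h)) for head in self.heads]
```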

For legibility, we use the sigmoid cross-entropy loss to present the multi-way adversarial learning and the induced perplexity scores in the paper. In the M-way adversarial adaptation context, however, this loss suffers from vanishing gradients and behaves very unstably during training. To overcome this issue, we replace it with the least-squares measure (Mao et al. 2017) in practice to ensure robust adversarial learning:

$$\begin{aligned} \begin{aligned} {\mathscr {L}}^{(ls)}_{adv}(F, D)&= \frac{1}{M}\sum _{j}^{M}{\mathbb {E}}_{{\varvec{x}}\sim {\mathbf {X}}_{j}}[(D_{j}(F({\varvec{x}})))^2] \\&\quad + {\mathbb {E}}_{{\varvec{x}}^{(t)}\sim {\mathbf {X}}_{t}}[(1- D_{j}(F({\varvec{x}}^{(t)})))^2].\\ \end{aligned} \end{aligned}$$
(32)

Accordingly, the confusion loss has been revised as

$$\begin{aligned} \ \begin{aligned} {\mathscr {L}}^{(ls)}_{cf}({\varvec{x}};F,D_{j}) = \left( D_{j}(F({\varvec{x}}))-\frac{1}{2}\right) ^2. \end{aligned} \end{aligned}$$
(33)

Then given a target instance \({\varvec{x}}^{(t)}\), a least square perplexity score is

$$\begin{aligned} \begin{aligned} s({\varvec{x}}^{(t)};F,D_{j}) = (D_{j}(F({\varvec{x}}^{(t)})))^2. \end{aligned} \end{aligned}$$
(34)
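
The least-squares objectives (32)–(34) translate directly into a few lines of PyTorch; the sketch below operates on the per-source discriminator outputs \(D_j(F(\cdot ))\), the function names are ours, and the overall discriminator loss is obtained by averaging the per-source terms as in (32).

```python
import torch

def ls_adversarial_loss(d_src, d_tgt):
    """Per-source least-squares adversarial term of Eq. (32):
    d_src = D_j(F(x)) on source-j features, d_tgt = D_j(F(x_t)) on target
    features. The full loss averages this term over the M discriminators."""
    return (d_src ** 2).mean() + ((1.0 - d_tgt) ** 2).mean()

def ls_confusion_loss(d_out):
    """Confusion loss of Eq. (33): push the discriminator output towards 1/2
    so that source-j and target features become indistinguishable."""
    return ((d_out - 0.5) ** 2).mean()

def ls_perplexity_score(d_tgt):
    """Least-squares perplexity score of a target instance for source j, Eq. (34)."""
    return d_tgt ** 2
```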
Fig. 11 The representation module, domain discriminator and category classifier used in the object recognition experiments (best viewed in color)

Fig. 12 The representation module, domain discriminator and category classifier used in the digit recognition experiments (best viewed in color)

The implementation is consistent with the analysis in the paper. Both during training and at test time, the perplexity-score weighting scheme is needed to predict the class of a target instance. In the adversarial learning process, however, the domain discriminator D is trained gradually to keep pace with the feature extractor F, so in early epochs the perplexity scores cannot provide reliable probabilistic relations between the target and each source. This hurts the pseudo-labeling scheme and further spoils the adversary at the next alternating step. Empirically, this negative effect is mostly attributable to unstable predictions on target instances. Hence, we use a moving average to calculate the perplexity score for each target instance:

$$\begin{aligned} \begin{aligned} s({\varvec{x}}_{N_T}^{(t)};F,D_{j}) = \frac{1}{N_{T}}\sum _{i}^{N_T}(D_{j}(F({\varvec{x}}_i^{(t)})))^2, \end{aligned} \end{aligned}$$
(35)

where \(N_T\) denotes the number of times that the target samples have been visited to train our model (one target mini-batch as the measurement unit); \(x_{N_T}^t\) denotes the current target instance being considered.
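
Equation (35) is a running mean over the target mini-batches visited so far; a minimal sketch (one tracker per source-specific discriminator head, naming ours) is:

```python
class RunningPerplexity:
    """Running average of least-squares perplexity scores, Eq. (35)."""

    def __init__(self):
        self.count = 0      # N_T: number of target mini-batches visited so far
        self.mean = 0.0     # current smoothed perplexity score

    def update(self, score):
        """Fold in the score from the current target mini-batch and return
        the smoothed value used for weighting the source classifiers."""
        self.count += 1
        self.mean += (score - self.mean) / self.count   # incremental mean
        return self.mean
```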

Hyper-parameter setting of training In the visual object recognition experiments (Office-31 and ImageCLEF), we initialize DCTN in the same way as DAN (Long et al. 2015). For digit recognition, DCTN is trained from scratch. In order to perform online hard domain mining, we construct each mini-batch by sampling an equal number of images per domain. For instance, consider two-source domain adaptation with a domain batch size of 32; the mini-batch size is then \(96 = 32 \times (2+1)\), where 2 and 1 denote the two source domains and the one target domain. In this setting, the length of one epoch is determined by the size of the domain containing the most instances. Finally, we adopt the Adam (Kingma and Ba 2015) solver with \(momentum = (0.9, 0.99) \) in all experiments to update our networks (Fig. 12).
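
The per-domain balanced mini-batch and the Adam configuration can be sketched as follows; the dataset sizes, image resolution and learning rate are placeholders (the paper specifies only the per-domain batch size and the momentum pair), and cycling the smaller loaders is one way to let the largest domain pace the epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from itertools import cycle

# Toy stand-ins for two source datasets and one target dataset (sizes/shapes illustrative).
source_a = TensorDataset(torch.randn(500, 3, 32, 32), torch.randint(0, 31, (500,)))
source_b = TensorDataset(torch.randn(800, 3, 32, 32), torch.randint(0, 31, (800,)))
target   = TensorDataset(torch.randn(600, 3, 32, 32))

domain_batch = 32    # per-domain batch size; joint mini-batch is 32 x (2 + 1) = 96
loader_a = DataLoader(source_a, batch_size=domain_batch, shuffle=True, drop_last=True)
loader_b = DataLoader(source_b, batch_size=domain_batch, shuffle=True, drop_last=True)
loader_t = DataLoader(target,   batch_size=domain_batch, shuffle=True, drop_last=True)

model = torch.nn.Conv2d(3, 8, 3)   # placeholder for the parameters of F, C and D
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))

# One epoch is paced by the largest domain (source_b here); the smaller loaders
# are cycled so that every joint mini-batch stays balanced across domains.
for (xa, ya), (xb, yb), (xt,) in zip(cycle(loader_a), loader_b, cycle(loader_t)):
    joint_batch = torch.cat([xa, xb, xt], dim=0)   # 96 images per optimization step
    # ... forward pass through F, C and D; compute losses; optimizer.step() ...
```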

Table 7 The hyper-parameter settings in our experiments

More hyper-parameter details are shown in Table 7.


About this article

Cite this article

Chen, Z., Wei, P., Zhuang, J. et al. Deep CockTail Networks. Int J Comput Vis 129, 2328–2351 (2021). https://doi.org/10.1007/s11263-021-01463-x

