
Deep CockTail Networks

A Universal Framework for Visual Multi-source Domain Adaptation


Abstract

Transferable deep representations for visual domain adaptation (DA) provide a route to learning from labeled source images how to recognize target images without target-domain supervision. This line of research has attracted increasing interest owing to its promise of label-free annotation and strong generalization. However, DA presumes that the source images are identically sampled from a single source, whereas Multi-Source DA (MSDA) is ubiquitous in the real world. In MSDA, domain shifts exist not only between the source and target domains but also among the sources; in particular, the multiple source domains and the target domain may disagree on their semantics (e.g., category shifts). These issues challenge existing MSDA solutions. In this paper, we propose the Deep CockTail Network (DCTN), a universal and flexibly deployed framework to address these problems. DCTN uses a multi-way adversarial learning pipeline to minimize the domain discrepancy between the target and each of the multiple sources in order to learn domain-invariant features. The derived source-specific perplexity scores measure how similar each target feature appears to features from each source domain. The multi-source category classifiers are integrated with the perplexity scores to categorize target images. We further derive a theoretical analysis of DCTN, including an interpretation of why DCTN can succeed without precisely crafting source-specific hyper-parameters, and upper bounds on the target expected loss in terms of domain and category shifts. In our experiments, DCTN is evaluated on four benchmarks, whose empirical studies involve the vanilla setting and three challenging category-shift transfer problems in MSDA, i.e., the source-shift, target-shift and source-target-shift scenarios. The results show that DCTN significantly boosts classification accuracy in MSDA and is remarkably resistant to negative transfer across different MSDA scenarios.



Notes

  1. More precisely, Saito et al. (2018) and Busto and Gall (2017) consider two different open-set problems.

  2. Since the domain discriminator has not yet been trained, we take uniform simplex weights as the perplexity scores.

  3. Since each sample x corresponds to a unique class y, \(\{{\mathscr {P}}_{j}\}^M_{j=1}\) and \({\mathscr {P}}_t\) can be viewed as equivalent embeddings of the \(\{P_{j}(x,y)\}^M_{j=1}\) and \(P_{t}(x,y)\) that we have discussed.

  4. http://www.sysu-hcp.net/deep-cocktail-network/.

References

  • Baktashmotlagh, M., Harandi, M., & Salzmann, M. (2016). Distribution-matching embedding for visual domain adaptation. The Journal of Machine Learning Research, 17(1), 3760–3789.

  • Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Vaughan, J. W. (2010). A theory of learning from different domains. Machine Learning, 79(1), 151–175.

  • Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Wortman, J. (2008). Learning bounds for domain adaptation. In Advances in neural information processing systems (pp. 129–136).

  • Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., & Krishnan, D. (2017). Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 95–104).

  • Busto, P. P., & Gall, J. (2017). Open set domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 754–763).

  • Cao, Z., Long, M., Wang, J., & Jordan, M. I. (2018). Partial transfer learning with selective adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2724–2732).

  • Cao, Z., Ma, L., Long, M., & Wang, J. (2018). Partial adversarial domain adaptation. In Proceedings of the European conference on computer vision (pp. 139–155).

  • Cordts, M., Omran, M., Ramos, S., Scharwächter, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2015). The cityscapes dataset. In CVPR workshop on the future of datasets in vision.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 248–255).

  • Duan, L., Xu, D., & Tsang, I. W. H. (2012). Domain adaptation from multiple sources: A domain-dependent regularization approach. IEEE Transactions on Neural Networks and Learning Systems, 23(3), 504–518.

  • Fernando, B., Habrard, A., Sebban, M., & Tuytelaars, T. (2013). Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE international conference on computer vision (pp. 2960–2967).

  • Ganin, Y., & Lempitsky, V. (2015). Unsupervised domain adaptation by backpropagation. In International conference on machine learning (pp. 1180–1189).

  • Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., & Lempitsky, V. (2017). Domain-adversarial training of neural networks. In Domain adaptation in computer vision applications (p. 189).

  • Gebru, T., Hoffman, J., & Fei-Fei, L. (2017). Fine-grained recognition in the wild: A multi-task domain adaptation approach. In Proceedings of the IEEE international conference on computer vision (pp. 1358–1367).

  • Ghifary, M., Kleijn, W. B., Zhang, M., Balduzzi, D., & Li, W. (2016). Deep reconstruction-classification networks for unsupervised domain adaptation. In Proceedings of the European conference on computer vision (pp. 597–613).

  • Gong, B., Grauman, K., & Sha, F. (2014). Learning kernels for unsupervised domain adaptation with applications to visual object recognition. International Journal of Computer Vision, 109(1–2), 3–27.

  • Gong, B., Shi, Y., Sha, F., & Grauman, K. (2012). Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2066–2073).

  • Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).

  • Gopalan, R., Li, R., & Chellappa, R. (2011). Domain adaptation for object recognition: An unsupervised approach. In Proceedings of the IEEE international conference on computer vision (pp. 999–1006).

  • Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., & Smola, A. J. (2007). A kernel method for the two-sample-problem. In Advances in neural information processing systems (pp. 513–520).

  • Gretton, A., Smola, A. J., Huang, J., Schmittfull, M., Borgwardt, K. M., & Schölkopf, B. (2009). Covariate shift by kernel mean matching. Dataset Shift in Machine Learning, 3(4), 5.

  • Haeusser, P., Frerix, T., Mordvintsev, A., & Cremers, D. (2017). Associative domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 2784–2792).

  • Ho, H. T., & Gopalan, R. (2014). Model-driven domain adaptation on product manifolds for unconstrained face recognition. International Journal of Computer Vision, 109(1–2), 110–125.

  • Hoffman, J., Wang, D., Yu, F., & Darrell, T. (2016). Fcns in the wild: Pixel-level adversarial and constraint-based adaptation. arXiv preprint arXiv:1612.02649

  • Jhuo, I. H., Liu, D., Lee, D., & Chang, S. F. (2013a). Robust visual domain adaptation with low-rank reconstruction. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2168–2175).

  • Jhuo, I. H., Liu, D., Lee, D. T., & Chang, S. F. (2013b). Robust visual domain adaptation with low-rank reconstruction. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2168–2175).

  • Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Zitnick, C. L., & Girshick, R. B. (2017). CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1988–1997).

  • Kan, M., Wu, J., Shan, S., & Chen, X. (2014). Domain adaptation for face recognition: Targetize source domain bridged by common subspace. International Journal of Computer Vision, 109(1–2), 94–109.

  • Kim, Y., Cho, D., & Hong, S. (2020). Towards privacy-preserving domain adaptation. IEEE Signal Processing Letters, 27, 1675–1679.

  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations.

  • Koniusz, P., Tas, Y., & Porikli, F. (2017). Domain adaptation by mixture of alignments of second-or higher-order scatter tensors. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7139–7148).

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • Liang, X., Xu, C., Shen, X., Yang, J., Tang, J., Lin, L., et al. (2016). Human parsing with contextualized convolutional neural network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1), 115–127.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440).

  • Long, M., Cao, Y., Wang, J., & Jordan, M. (2015). Learning transferable features with deep adaptation networks. In International conference on machine learning (pp. 97–105).

  • Long, M., Zhu, H., Wang, J., & Jordan, M. I. (2016). Unsupervised domain adaptation with residual transfer networks. In Advances in neural information processing systems (pp. 136–144).

  • Long, M., Zhu, H., Wang, J., & Jordan, M. I. (2017). Deep transfer learning with joint adaptation networks. In Proceedings of the international conference on machine learning (pp. 2208–2217).

  • Lu, H., Zhang, L., Cao, Z., Wei, W., Xian, K., Shen, C., & van den Hengel, A. (2017). When unsupervised domain adaptation meets tensor representations. In Proceedings of the IEEE international conference on computer vision (pp. 599–608).

  • Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-sne. Journal of Machine Learning Research, 9, 2579–2605.

  • Mancini, M., Porzi, L., Bulò, S. R., Caputo, B., & Ricci, E. (2018). Boosting domain adaptation by discovering latent domains. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3771–3780).

  • Mansour, Y., Mohri, M., & Rostamizadeh, A. (2009). Domain adaptation with multiple sources. In Advances in neural information processing systems (pp. 1041–1048).

  • Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., & Smolley, S. P. (2017). Least squares generative adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2794–2802).

  • Motiian, S., Jones, Q., Iranmanesh, S. M., & Doretto, G. (2017). Few-shot adversarial domain adaptation. In Advances in neural information processing systems (pp. 6670–6680).

  • Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In Nips workshop on deep learning and unsupervised feature learning.

  • Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2011). Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2), 199–210.

  • Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.

  • Peng, X., Bai, Q., Xia, X., Huang, Z., Saenko, K., & Wang, B. (2019). Moment matching for multi-source domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 1406–1415).

  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91–99).

  • Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010). Adapting visual category models to new domains. In Proceedings of the European conference on computer vision (pp. 213–226).

  • Saito, K., Ushiku, Y., & Harada, T. (2017). Asymmetric tri-training for unsupervised domain adaptation. In Proceedings of the international conference on machine learning (pp. 2988–2997).

  • Saito, K., Yamamoto, S., Ushiku, Y., & Harada, T. (2018). Open set domain adaptation by backpropagation. In Proceedings of the European conference on computer vision (pp. 156–171).

  • Shao, M., Kit, D., & Fu, Y. (2014). Generalized transfer subspace learning through low-rank constraint. International Journal of Computer Vision, 109(1–2), 74–93.

  • Sun, B., Feng, J., & Saenko, K. (2016). Return of frustratingly easy domain adaptation. In AAAI conference on artificial intelligence (pp. 2058–2065).

  • Tzeng, E., Hoffman, J., Darrell, T., & Saenko, K. (2015). Simultaneous deep transfer across domains and tasks. In Proceedings of the IEEE international conference on computer vision (pp. 4068–4076).

  • Tzeng, E., Hoffman, J., Saenko, K., & Darrell, T. (2017). Adversarial discriminative domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2962–2971).

  • Xie, J., Hu, W., Zhu, S. C., & Wu, Y. N. (2015). Learning sparse frame models for natural image patterns. International Journal of Computer Vision, 114(2–3), 91–112.

  • Xu, J., Ramos, S., Vázquez, D., & López, A. M. (2016). Hierarchical adaptive structural SVM for domain adaptation. International Journal of Computer Vision, 119(2), 159–178.

  • Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (pp. 2048–2057).

  • Xu, R., Chen, Z., Zuo, W., Yan, J., & Lin, L. (2018). Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3964–3973).

  • Xu, R., Li, G., Yang, J., & Lin, L. (2019). Larger norm more transferable: An adaptive feature norm approach for unsupervised domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 1426–1435).

  • Yan, H., Ding, Y., Li, P., Wang, Q., Xu, Y., & Zuo, W. (2017). Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 945–954).

  • Yang, J., Yan, R., & Hauptmann, A. G. (2007). Cross-domain video concept detection using adaptive svms. In Proceedings of the ACM international conference on multimedia (pp. 188–197).

  • Yao, Y., Zhang, Y., Li, X., & Ye, Y. (2019). Heterogeneous domain adaptation via soft transfer network. In Proceedings of the 27th ACM international conference on multimedia (pp. 1578–1586).

  • You, K., Long, M., Cao, Z., Wang, J., & Jordan, M. I. (2019). Universal domain adaptation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2720–2729).

  • Zellinger, W., Grubinger, T., Lughofer, E., Natschläger, T., & Samingerplatz, S. (2017). Central moment discrepancy (cmd) for domain-invariant representation learning. In International conference on learning representations.

  • Zhang, J., Li, W., & Ogunbona, P. (2017). Joint geometrical and statistical alignment for visual domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5150–5158).

  • Zhang, S., Huang, J. B., Lim, J., Gong, Y., Wang, J., Ahuja, N., & Yang, M. H. (2019). Tracking persons-of-interest via unsupervised representation adaptation. International Journal of Computer Vision, 1–25.

  • Zhao, H., Zhang, S., Wu, G., Costeira, J. P., Moura, J. M. F., & Gordon, G. J. (2018). Multiple source domain adaptation with adversarial learning. In International conference on learning representations

Acknowledgements

This work was supported in part by NSFC (Nos. 62006253, U181146, 61836012, 61976233), the State Key Development Program (No. 2018YFC0830103), the Fundamental Research Funds for the Central Universities (No. 19lgpy228), and the Major Project of Guangzhou Science and Technology of Collaborative Innovation and Industry under Grant 201605122151511. We also thank Ruijia Xu for his valuable suggestions on the revision.

Author information

Corresponding author

Correspondence to Pengxu Wei.

Additional information

Communicated by Minsu Cho.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proofs

Proof

(Lemma 1) Suppose the feature extractor has reached its optimum \(F^*\). The M source-target adversarial learning pairs can then be considered separately, each corresponding to the optimization objective w.r.t. one of \(\{D_j\}^M_{j=1}\). Writing x for \(F^*({\varvec{x}})\), the optimal discriminators are

$$\begin{aligned} D^{*}_{j}(x) = \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{j}(x)+{\mathscr {P}}_{t}(x)} \end{aligned}$$
(22)

which follows directly from Theorem 1 in Goodfellow et al. (2014). Substituting (22) into the perplexity-score-weighted combination of the source classifiers gives

$$\begin{aligned} \begin{aligned} h_t(x)&=\sum ^{M}_{j=1}\frac{-\log \left( \frac{{\mathscr {P}}_{t}(x)}{{\mathscr {P}}_{j}(x)+{\mathscr {P}}_{t}(x)}\right) h_j(x)}{\sum ^{M}_{k=1}-\log \left( \frac{{\mathscr {P}}_{t}(x)}{{\mathscr {P}}_{k}(x)+{\mathscr {P}}_{t}(x)}\right) }\\&=\sum ^{M}_{j=1}\frac{\log \left( 1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)}\right) h_j(x)}{\sum ^{M}_{k=1}\log \left( 1+ \frac{{\mathscr {P}}_{k}(x)}{{\mathscr {P}}_{t}(x)}\right) }. \end{aligned} \end{aligned}$$
(23)

\(\square \)
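
To make the weighting in (23) concrete, consider a small numerical instance (the density ratios below are assumed for illustration only): with \(M=2\) sources, \({\mathscr {P}}_{1}(x)/{\mathscr {P}}_{t}(x)=3\) and \({\mathscr {P}}_{2}(x)/{\mathscr {P}}_{t}(x)=1\),

$$\begin{aligned} h_t(x) = \frac{\log (1+3)}{\log (1+3)+\log (1+1)}\,h_1(x) + \frac{\log (1+1)}{\log (1+3)+\log (1+1)}\,h_2(x) = \tfrac{2}{3}h_1(x)+\tfrac{1}{3}h_2(x), \end{aligned}$$

since \(\log 4 = 2\log 2\); the source whose density dominates at x receives the larger weight.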

Proof

(Proposition 2) Given Lemma 1, it holds that

$$\begin{aligned} \begin{aligned} h_t(x)&=\sum ^{M}_{j=1}\frac{\log \left( 1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)}\right) }{\sum ^{M}_{k=1}\log \left( 1+ \frac{{\mathscr {P}}_{k}(x)}{{\mathscr {P}}_{t}(x)}\right) }h_j(x)\\&\le \sum ^{M}_{j=1}\frac{\log \left( 1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)}\right) \ h_j(x)}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\le \sum ^{M}_{j=1}\frac{{\mathscr {P}}_{j}(x) \ h_j(x)}{{\mathscr {P}}_{t}(x)\sum ^{M}_{k=1}\log (1+ \alpha _k)}, \end{aligned} \end{aligned}$$
(24)

where the first inequality lower-bounds each term in the denominator by \(\log (1+\alpha _k)\), and the second uses \(\log (1+z)\le z\). Hence, given a target feature x,

$$\begin{aligned}&{\mathscr {L}}({\mathscr {P}}_t,h_t,f)(x)\nonumber \\&\quad =L\left( \sum ^{M}_{j=1}\frac{\log (1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)})}{\sum ^{M}_{k=1}\log \left( 1+ \frac{{\mathscr {P}}_{k}(x)}{{\mathscr {P}}_{t}(x)}\right) }h_j(x),f(x)\right) {\mathscr {P}}_t(x)\nonumber \\&\quad \le \sum ^{M}_{j=1}\frac{\log \left( 1+ \frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)}\right) }{\sum ^{M}_{k=1}\log (1+ \frac{{\mathscr {P}}_{k}(x)}{{\mathscr {P}}_{t}(x)})} {\mathscr {P}}_t(x)L\left( h_j(x),f(x)\right) \nonumber \\&\quad \le \sum ^{M}_{j=1}\frac{{\mathscr {P}}_{j}(x)}{{\mathscr {P}}_{t}(x)\sum ^{M}_{k=1}\log (1+ \alpha _k)} {\mathscr {P}}_t(x)L\left( h_j(x),f(x)\right) \nonumber \\&\quad = \sum ^{M}_{j=1}\frac{{\mathscr {P}}_{j}(x)}{\sum ^{M}_{k=1}\log (1+ \alpha _k)} L\left( h_j(x),f(x)\right) \end{aligned}$$
(25)

in which the first inequality follows from the convexity of the loss function \(L(\cdot ,\cdot )\) and the remaining inequalities follow (24). \(\square \)

Proof

(Proposition 3) Using the pointwise bound established in (25), we derive the upper bound of \({\mathscr {L}}({\mathscr {P}}_t,h_t,f)\). Specifically,

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&=\int _{x\in {\mathscr {X}}}{\mathscr {L}}({\mathscr {P}}_t,h_t,f)(x)dx\\&\le \int _{x\in {\mathscr {X}}}\sum ^{M}_{j=1}\frac{{\mathscr {P}}_{j}(x)}{\sum ^{M}_{k=1}\log (1+ \alpha _k)} L\big (h_j(x),f(x)\big )dx\\&=\frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\sum ^{M}_{j=1}\\&\quad \int _{x\in {\mathscr {X}}}{\mathscr {P}}_{j}(x) L\big (h_j(x),f(x)\big )dx \end{aligned} \end{aligned}$$
(26)

Here \(\rho \) denotes the proportion of target data wrongly labeled by the auto-annotating strategy in the discriminative adaptation phase, and \(f'(x)\) denotes a wrong target labeling function, namely, \(\forall x\in {\mathscr {X}}\), \(f'(x)\ne f(x)\) and \(L\big (h_j(x),f(x)\big )\le L\big (h_j(x),f'(x)\big )\). Therefore,

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \sum ^{M}_{j=1}\int _{x\in {\mathscr {X}}}{\mathscr {P}}_{j}(x) L\big (h_j(x),f(x)\big )dx\\&= \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \sum ^{M}_{j=1}\Bigg (\int _{x\in {\mathscr {X}}}{\mathscr {P}}_{j}(x)\bigg ((1-\rho ) L\big (h_j(x),f(x)\big )\\&\quad +\rho L\big (h_j(x),f(x)\big )\bigg )dx\Bigg )\\&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \sum ^{M}_{j=1}\Bigg (\int _{x\in {\mathscr {X}}}{\mathscr {P}}_{j}(x)\bigg ((1-\rho ) L\big (h_j(x),f(x)\big )\\&\quad +\rho L\big (h_j(x),f'(x)\big )\bigg )dx\Bigg ), \end{aligned} \end{aligned}$$
(27)

Since L is assumed to be the 0–1 loss, following the analysis in Saito et al. (2017) we have

$$\begin{aligned} \begin{aligned}&\int _{x\in {\mathscr {X}}} {\mathscr {P}}_j(x)L\big (h_j(x),f(x)\big )dx\\&\quad \le \int _{x\in {\mathscr {X}}} {\mathscr {P}}_j(x)L\big (h_j(x),f'(x)\big )dx\\&\quad \le \int _{x\in {\mathscr {X}}} {\mathscr {P}}_j(x)dx = 1. \end{aligned} \end{aligned}$$
(28)

so that

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\big ((1-\rho )\\&\quad {\sum _{j\in [M]}\epsilon _j}+M\rho \big ) \end{aligned} \end{aligned}$$
(29)

This concludes the proof. \(\square \)
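
To illustrate the behavior of the bound (29), take an assumed numerical instance (values chosen for illustration only): \(M=2\), \(\alpha _1=\alpha _2=1\), \(\rho =0.1\) and \(\epsilon _1=\epsilon _2=0.2\). Then

$$\begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)\le \frac{(1-0.1)\times (0.2+0.2)+2\times 0.1}{2\log (1+1)}=\frac{0.28}{\log 2}, \end{aligned}$$

and as \(\rho \) grows the numerator drifts towards \(M\rho \), i.e., the pseudo-labeling error dominates the bound.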

Proof

(Proposition 4) According to the pointwise bound (25), we have

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)(x)&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \sum ^{M}_{j=1}{\mathscr {P}}_{j}(x) L\big (h_j(x),f(x)\big ). \end{aligned} \end{aligned}$$

Therefore

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\\&\quad \int _{x\in {\mathscr {X}}}\sum ^{M}_{j=1}{\mathscr {P}}_{j}(x) L\big (h_j(x),f(x)\big )dx\\&\le \frac{1}{\sum ^{M}_{k=1} \log (1+ \alpha _k)}\\&\quad \sum _{j\in [M]}\Bigg ((1-\rho ')\underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_j(x)L\big (h_j(x),f(x)\big )dx\\&\quad +\rho '\underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_j(x)L\big (h_j(x),f'(x)\big )dx\Bigg )\\&\le \frac{1}{\sum ^{M}_{k=1} \log (1+ \alpha _k)}\\&\quad \sum _{j\in [M]} \Bigg ( (1-\rho ')\bigg ( (1-\rho )\underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_t(x)L\big (h_j(x),f(x)\big )dx\\&\quad +\rho \underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_t(x)L\big (h_j(x),f'(x)\big )dx \bigg )\\&\quad +\rho '\underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_t(x)L\big (h_j(x),f'(x)\big )dx\Bigg ), \end{aligned} \end{aligned}$$
(30)

where the first, third and fifth inequalities are derived from the proof of Proposition 2; the second inequality follows from the entropy-based unknown-category discovery strategy, which is adopted in the target-category-shift and source-target-category-shift scenarios (notice that, since the unknown-class discovery is executed ahead of the pseudo-labeling strategy, the inequality w.r.t. \(\rho \) is nested inside the inequality w.r.t. \(\rho '\)); the fourth inequality is derived from the 0–1 loss upper bound discussed in Saito et al. (2017).

Since L is assumed to be the 0–1 loss, following Saito et al. (2017) we obtain

$$\begin{aligned} \begin{aligned} {\mathscr {L}}({\mathscr {P}}_t,h_t,f)&\le \sum _{j\in [M]}\frac{1}{\sum ^{M}_{k=1} \log (1+ \alpha _k)}\Big ((1-\rho ')(1-\rho )\\&\quad \underset{x\in {\mathscr {X}}}{\int } {\mathscr {P}}_t(x)L\big (h_j(x),f(x)\big )dx+\rho (1-\rho ')+\rho '\Big )\\&\le \frac{1}{\sum ^{M}_{k=1}\log (1+ \alpha _k)}\Big ((1-\rho ')(1-\rho ){\sum _{j\in [M]}\epsilon _j}\\&\quad +\big ((1-\rho ')\rho +\rho '\big )M\Big ) \end{aligned} \end{aligned}$$
(31)

\(\square \)

Appendix B: Implementation details

The setup of \(\gamma \) and \(\zeta \) The pseudo-labeling strategies of DCTN rely on the hyper-parameters \(\gamma \) and \(\zeta \). The threshold \(\gamma \) is used to select a subset of target candidates, which are annotated as “high-confident” and combined with the multi-source examples to train the multi-source classifiers. We set its value above 90% to ensure the quality of the selected target samples. Instead of choosing a specific threshold \(\zeta \), we rank the target examples by their entropy values on the source-specific classifiers in decreasing order and choose the top 15% as the “unknown” candidates: 300/120/140 “unknown” candidates in the A/D/W domains, respectively. This guarantees adequate “unknown” examples to train a reliable classifier for each source domain. The same scheme is adopted in the DomainNet experiments (see Table 7 for details).
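
A minimal sketch of the two selection rules described above follows; the function name, array shapes and the use of averaged multi-source class probabilities are illustrative assumptions, not details of the released implementation.

```python
import numpy as np

def select_target_candidates(probs, gamma=0.9, unknown_ratio=0.15):
    """Illustrative selection over target class probabilities `probs`
    of shape [n_target, n_classes].

    - Samples whose top class probability exceeds `gamma` are kept as
      "high-confident" pseudo-labeled candidates.
    - The top `unknown_ratio` fraction ranked by prediction entropy
      (highest entropy first) is flagged as "unknown" candidates.
    """
    confidence = probs.max(axis=1)
    pseudo_labels = probs.argmax(axis=1)
    confident_idx = np.where(confidence > gamma)[0]

    # Prediction entropy; higher entropy suggests an unknown category.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    n_unknown = int(unknown_ratio * len(probs))
    unknown_idx = np.argsort(-entropy)[:n_unknown]   # top 15% by entropy

    return confident_idx, pseudo_labels[confident_idx], unknown_idx
```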

Network implementation details For recognition on Office-31 and ImageCLEF-DA, existing deep DA approaches (Long et al. 2015, 2016) routinely employ AlexNet (Krizhevsky et al. 2012) as their backbone. For a fair comparison, we choose a DCTN architecture derived from the AlexNet pipeline. As Fig. 11 illustrates, the representation module F is a five-layer fully-convolutional network with three max-pooling operators, and the (multi-source) category classifier C is a three-layer fully-connected multi-task network. Stacked together, they form an AlexNet-like pipeline for categorizing examples. We adopt a CNN with a two-head classifier as the domain discriminator D.
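
For orientation, the following PyTorch skeleton sketches the three modules; the layer widths and the use of flattened features in D are illustrative assumptions and do not reproduce the exact configuration in Fig. 11.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """AlexNet-style fully-convolutional extractor F (sizes illustrative)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(64, 192, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
        )
    def forward(self, x):
        return torch.flatten(self.net(x), 1)

class MultiSourceClassifier(nn.Module):
    """Three FC layers with one classification head per source domain (C)."""
    def __init__(self, in_dim, n_classes, n_sources):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, 4096), nn.ReLU(),
                                    nn.Linear(4096, 4096), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(4096, n_classes) for _ in range(n_sources)])
    def forward(self, f):
        h = self.shared(f)
        return [head(h) for head in self.heads]

class DomainDiscriminator(nn.Module):
    """Shared trunk with one source-vs-target output per source domain (D)."""
    def __init__(self, in_dim, n_sources):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(1024, 1) for _ in range(n_sources)])
    def forward(self, f):
        h = self.trunk(f)
        return [torch.sigmoid(head(h)) for head in self.heads]
```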

For legibility, we use the sigmoid cross-entropy loss to present the multi-way adversarial learning and the induced perplexity scores in the paper. In the M-way adversarial adaptation context, however, this loss suffers from vanishing gradients and behaves very unstably during training. To overcome this issue, we replace it with the least-squares measure (Mao et al. 2017) in practice to ensure robust adversarial learning:

$$\begin{aligned} \begin{aligned} {\mathscr {L}}^{(ls)}_{adv}(F, D)&= \frac{1}{M}\sum _{j}^{M}{\mathbb {E}}_{{\varvec{x}}\sim {\mathbf {X}}_{j}}[(D_{j}(F({\varvec{x}})))^2] \\&\quad + {\mathbb {E}}_{{\varvec{x}}^{(t)}\sim {\mathbf {X}}_{t}}[(1- D_{j}(F({\varvec{x}}^{(t)})))^2].\\ \end{aligned} \end{aligned}$$
(32)

Accordingly, the confusion loss has been revised as

$$\begin{aligned} \ \begin{aligned} {\mathscr {L}}^{(ls)}_{cf}({\varvec{x}};F,D_{j}) = \left( D_{j}(F({\varvec{x}}))-\frac{1}{2}\right) ^2. \end{aligned} \end{aligned}$$
(33)

Then given a target instance \({\varvec{x}}^{(t)}\), a least square perplexity score is

$$\begin{aligned} \begin{aligned} s({\varvec{x}}^{(t)};F,D_{j}) = (D_{j}(F({\varvec{x}}^{(t)})))^2. \end{aligned} \end{aligned}$$
(34)
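
The least-squares objectives (32)–(34) translate directly into a few lines of PyTorch; the sketch below operates on the per-source discriminator outputs \(D_j(F(\cdot ))\), the function names are ours, and the overall discriminator loss is obtained by averaging the per-source terms as in (32).

```python
import torch

def ls_adversarial_loss(d_src, d_tgt):
    """Per-source least-squares adversarial term of Eq. (32):
    d_src = D_j(F(x)) on source-j features, d_tgt = D_j(F(x_t)) on target
    features. The full loss averages this term over the M discriminators."""
    return (d_src ** 2).mean() + ((1.0 - d_tgt) ** 2).mean()

def ls_confusion_loss(d_out):
    """Confusion loss of Eq. (33): push the discriminator output towards 1/2
    so that source-j and target features become indistinguishable."""
    return ((d_out - 0.5) ** 2).mean()

def ls_perplexity_score(d_tgt):
    """Least-squares perplexity score of a target instance for source j, Eq. (34)."""
    return d_tgt ** 2
```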
Fig. 11 The representation module, domain discriminator and category classifier used in the object recognition experiments (best viewed in color)

Fig. 12 The representation module, domain discriminator and category classifier used in the digit recognition experiments (best viewed in color)

The implementation is consistent with the analysis in the paper. Both during training and at test time, the perplexity-score weighting scheme is needed to predict the class of a target instance. In the adversarial learning process, however, the domain discriminator D is trained gradually to keep pace with the feature extractor F, so in early epochs the perplexity scores cannot provide reliable probabilistic relations between the target and each source. This hurts the pseudo-labeling scheme and further spoils the adversary at the next alternating step. Empirically, this negative effect is mostly attributable to unstable predictions on target instances. Hence, we use a moving average to calculate the perplexity score for each target instance:

$$\begin{aligned} \begin{aligned} s({\varvec{x}}_{N_T}^{(t)};F,D_{j}) = \frac{1}{N_{T}}\sum _{i}^{N_T}(D_{j}(F({\varvec{x}}_i^{(t)})))^2, \end{aligned} \end{aligned}$$
(35)

where \(N_T\) denotes the number of times that the target samples have been visited to train our model (one target mini-batch as the measurement unit); \(x_{N_T}^t\) denotes the current target instance being considered.
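
Equation (35) is a running mean over the target mini-batches visited so far; a minimal sketch (one tracker per source-specific discriminator head, naming ours) is:

```python
class RunningPerplexity:
    """Running average of least-squares perplexity scores, Eq. (35)."""

    def __init__(self):
        self.count = 0      # N_T: number of target mini-batches visited so far
        self.mean = 0.0     # current smoothed perplexity score

    def update(self, score):
        """Fold in the score from the current target mini-batch and return
        the smoothed value used for weighting the source classifiers."""
        self.count += 1
        self.mean += (score - self.mean) / self.count   # incremental mean
        return self.mean
```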

Hyper-parameter setting of training In the visual object recognition experiments (Office-31 and ImageCLEF), we initialize DCTN in the same way as DAN (Long et al. 2015). For digit recognition, DCTN is trained from scratch. In order to perform online hard domain mining, we construct each mini-batch by sampling an equal number of images per domain. For instance, consider two-source domain adaptation with a domain batch size of 32; the mini-batch size is then \(96 = 32 \times (2+1)\), where 2 and 1 denote the two source domains and the one target domain. In this setting, the length of one epoch is determined by the size of the domain containing the most instances. Finally, we adopt the Adam (Kingma and Ba 2015) solver with \(momentum = (0.9, 0.99) \) in all experiments to update our networks (Fig. 12).
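
The per-domain balanced mini-batch and the Adam configuration can be sketched as follows; the dataset sizes, image resolution and learning rate are placeholders (the paper specifies only the per-domain batch size and the momentum pair), and cycling the smaller loaders is one way to let the largest domain pace the epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from itertools import cycle

# Toy stand-ins for two source datasets and one target dataset (sizes/shapes illustrative).
source_a = TensorDataset(torch.randn(500, 3, 32, 32), torch.randint(0, 31, (500,)))
source_b = TensorDataset(torch.randn(800, 3, 32, 32), torch.randint(0, 31, (800,)))
target   = TensorDataset(torch.randn(600, 3, 32, 32))

domain_batch = 32    # per-domain batch size; joint mini-batch is 32 x (2 + 1) = 96
loader_a = DataLoader(source_a, batch_size=domain_batch, shuffle=True, drop_last=True)
loader_b = DataLoader(source_b, batch_size=domain_batch, shuffle=True, drop_last=True)
loader_t = DataLoader(target,   batch_size=domain_batch, shuffle=True, drop_last=True)

model = torch.nn.Conv2d(3, 8, 3)   # placeholder for the parameters of F, C and D
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))

# One epoch is paced by the largest domain (source_b here); the smaller loaders
# are cycled so that every joint mini-batch stays balanced across domains.
for (xa, ya), (xb, yb), (xt,) in zip(cycle(loader_a), loader_b, cycle(loader_t)):
    joint_batch = torch.cat([xa, xb, xt], dim=0)   # 96 images per optimization step
    # ... forward pass through F, C and D; compute losses; optimizer.step() ...
```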

Table 7 The hyper-parameter settings in our experiments

More hyper-parameter details are shown in Table 7.


About this article

Cite this article

Chen, Z., Wei, P., Zhuang, J. et al. Deep CockTail Networks. Int J Comput Vis 129, 2328–2351 (2021). https://doi.org/10.1007/s11263-021-01463-x

