Stacked multichannel autoencoder – an efficient way of learning from synthetic data

Published in Multimedia Tools and Applications

Abstract

Learning from synthetic data has many important applications in cases where sufficient labeled data are not available. Using synthetic data is challenging due to differences in feature distributions between synthetic and real data, a phenomenon we term the synthetic gap. In this paper, we investigate and formalize a general framework, the Stacked Multichannel Autoencoder (SMCAE), that bridges the synthetic gap and enables learning from synthetic data more efficiently. In particular, we show that SMCAE can not only transform and use synthetic data on a challenging face-sketch recognition task, but also help simulate real images that can be used to train classifiers for recognition. Preliminary experiments validate the effectiveness of the proposed framework.
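To make the idea concrete, below is a minimal sketch of a single two-channel autoencoder layer, not the authors' exact SMCAE: one channel encodes synthetic features and the other real features, and both are reconstructed from a shared hidden code, which is one way to pull the two feature distributions together. The class name, dimensions, sigmoid activations, and the averaging fusion are illustrative assumptions for this sketch.

```python
# Illustrative sketch of a two-channel autoencoder layer (assumed design,
# not the paper's exact SMCAE). Each channel has its own encoder/decoder,
# and both channels share one latent code.
import torch
import torch.nn as nn

class MultichannelAE(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        # One encoder per channel; both map into the same latent space.
        self.enc_syn = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.enc_real = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        # One decoder per channel reconstructs from the shared code.
        self.dec_syn = nn.Linear(hidden_dim, in_dim)
        self.dec_real = nn.Linear(hidden_dim, in_dim)

    def forward(self, x_syn, x_real):
        # Average the two channel codes to form a shared representation
        # (a simplifying assumption; other fusions are possible).
        z = 0.5 * (self.enc_syn(x_syn) + self.enc_real(x_real))
        return self.dec_syn(z), self.dec_real(z), z

def reconstruction_loss(model, x_syn, x_real):
    # Reconstruct both channels from the shared code and sum the errors.
    rec_syn, rec_real, _ = model(x_syn, x_real)
    return (nn.functional.mse_loss(rec_syn, x_syn)
            + nn.functional.mse_loss(rec_real, x_real))

# Toy usage with random features:
# model = MultichannelAE(in_dim=1024, hidden_dim=256)
# loss = reconstruction_loss(model, torch.randn(8, 1024), torch.randn(8, 1024))
```

Such layers could then be stacked, with the shared code of one layer feeding the next, in the spirit of stacked autoencoders; at test time the synthetic-channel encoder maps synthetic features toward the shared space in which a classifier is trained.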


Notes

  1. δ is a sparsity parameter and is empirically set to 0.05 in all our experiments (a standard form of the sparsity penalty such a parameter typically enters is sketched after these notes).

  2. Collected from UCI machine learning repository (HWDUCI) [3].

  3. The parameters are cross-validated.
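For context, a sparsity target such as δ usually enters a sparse autoencoder objective through a KL-divergence penalty of the following textbook form (an assumed formulation; the paper's exact objective may differ), where ρ̂_j denotes the mean activation of hidden unit j over the training set and δ = 0.05:

```latex
% Standard sparse-autoencoder sparsity penalty (assumed form, not necessarily the paper's):
% \hat{\rho}_j = mean activation of hidden unit j over the training set, \delta = 0.05.
\Omega_{\mathrm{sparse}}
  = \sum_{j=1}^{h} \mathrm{KL}\!\left(\delta \,\middle\|\, \hat{\rho}_j\right)
  = \sum_{j=1}^{h} \left[ \delta \log\frac{\delta}{\hat{\rho}_j}
      + (1-\delta)\log\frac{1-\delta}{1-\hat{\rho}_j} \right]
```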

References

  1. Alimoglu F, Alpaydin E (1997) Combining multiple representations and classifiers for handwritten digit recognition. In: ICDAR

  2. Alnajar F, Lou Z, Alvarez J, Gevers T (2014) Expression-invariant age estimation. In: BMVC

  3. Bache K, Lichman M (2013) UCI machine learning repository. [Online]. Available: http://archive.ics.uci.edu/ml

  4. Bal G, Agam G, Frieder O, Frieder G (2008) Interactive degraded document enhancement and ground truth generation. In: Electronic imaging 2008 international society for optics and photonics

  5. Baldi P (2012) Autoencoders, unsupervised learning, and deep architectures. Unsupervised Transfer Learn Challenges Mach Learn 7:43

  6. Ben-David S, Blitzer J, Crammer K, Kulesza A, Pereira F, Vaughan J W (2010) A theory of learning from different domains. Mach Learn

  7. Bengio Y (2009) Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1):1–127

  8. Bengio Y (2012) Deep learning of representations for unsupervised and transfer learning. Unsupervised Transfer Learn Challenges Mach Learn 7:19

  9. Chen M, Xu Z, Weinberger K Q, Sha F (2012) Marginalized denoising autoencoders for domain adaptation. In: International conference on machine learning

  10. Deng J, Zhang Z, Marchi E, Schuller B (2013) Sparse autoencoder-based feature transfer learning for speech emotion recognition. In: Affective Computing and intelligent interaction (ACII)

  11. Glorot X, Bordes A, Bengio Y (2011) Domain adaptation for large-scale sentiment classification: a deep learning approach. In: ICML

  12. Kan M, Shan S, Zhang H, Lao S, Chen X (2012) Multi-view discriminant analysis. In: ECCV

  13. Klare B F, Li Z, Jain A K (2011) Matching forensic sketches to mug shot photos. IEEE Trans Pattern Anal Mach Intell 33(3):639–646

  14. Lampert C H, Nickisch H, Harmeling S (2013) Attribute-based classification for zero-shot visual object categorization. In: IEEE TPAMI

  15. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  16. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66

  17. Pan S J, Yang Q (2010) A survey on transfer learning. In: IEEE TKDE

  18. Pishchulin L, Jain A, Wojek C, Andriluka M, Thormählen T, Schiele B (2011) Learning people detection models from few training samples. In: 2011 IEEE Conference on computer vision and pattern recognition (CVPR). IEEE, pp 1473–1480

  19. Ruiz A, Van de Weijer J, Binefa X (2014) Regularized multi-concept mil for weakly-supervised facial behavior categorization. In: BMVC

  20. Sarinnapakorn K, Kubat M (2007) Combining subclassifiers in text categorization: a DST-based solution and a case study. In: IEEE TKDE

  21. Srivastava N, Salakhutdinov R R (2012) Multimodal learning with deep boltzmann machines. In: Advances in neural information processing systems, pp 2222–2230

  22. Sun B, Saenko K (2014) From virtual to reality: fast adaptation of virtual object detectors to real domains. In: Proceedings of the British machine vision conference. BMVA Press

  23. Turk M A, Pentland A P (1991) Face recognition using eigenfaces. In: IEEE Computer Society conference on computer vision and pattern recognition. IEEE, pp 586–591

  24. Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

  25. Varga T (2004) Comparing natural and synthetic training data for off-line cursive handwriting recognition. In: IWFHR-9 Ninth International workshop on frontiers in handwriting recognition, 2004. IEEE, pp 221–225

  26. Varga T, Bunke H (2003) Effects of training set expansion in handwriting recognition using synthetic data. In: 11th Conf. of the international graphonomics society. Citeseer

  27. Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning. ACM, pp 1096–1103

  28. Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2011) Extracting and composing robust features with denoising autoencoders. In: ICML

  29. Wang X, Tang X (2009) Face photo-sketch synthesis and recognition. In: IEEE TPAMI

  30. Wang W, Cui Z, Chang H, Shan S, Chen X (2014) Deeply coupled auto-encoder networks for cross-view classification. arXiv:1402.2031

  31. Weinberger K, Dasgupta A, Langford J, Smola A, Attenberg J (2009) Feature hashing for large scale multitask learning. In: ICML

  32. Zhang W, Wang X, Tang X (2011) Coupled information-theoretic encoding for face photo-sketch recognition. In: CVPR. IEEE, pp 513–520

  33. Zhang X, Agam G, Chen X (2014) Alignment of 3d building models with satellite images using extended chamfer matching. In: The IEEE Conference on computer vision and pattern recognition (CVPR) workshops

  34. Zhou Q -Y, Neumann U (2008) Fast and extensible building modeling from airborne lidar data. In: Proceedings of the 16th ACM SIGSPATIAL international conference on advances in geographic information systems. ACM, p 7

  35. Zhu F, Shao L, Tang J (2014) Boosted cross-domain categorization. In: British machine vision conference

Acknowledgements

This work is supported by Fudan University-CIOMP Joint Fund (FC2017-006). Yanwei Fu is supported by The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. TP2017006). Yanwei Fu is the corresponding author.

Author information

Corresponding author

Correspondence to Yanwei Fu.

About this article

Cite this article

Zhang, X., Fu, Y., Jiang, S. et al. Stacked multichannel autoencoder – an efficient way of learning from synthetic data. Multimed Tools Appl 77, 26563–26580 (2018). https://doi.org/10.1007/s11042-018-5879-7
