
Gradual recovery based occluded digit images recognition

Multimedia Tools and Applications

Abstract

Recent research shows that auto-encoders are well suited to modeling variations that change smoothly. In this paper, we use auto-encoders to recognize partially occluded digit images through gradual recovery. We propose a new variant of the auto-encoder, the “generalized auto-encoder”, and construct stacked generalized auto-encoders (SGAE) to recover and recognize occluded digit images. Rather than recovering the occluded region in a single step, we treat the degree of occlusion as a continuous variable and the recovery as a gradual process. We divide the whole task into multiple intermediate recovery procedures and assign each procedure to one generalized auto-encoder, so that the occlusion is removed step by step. Building on these encouraging recovery results, the occluded digit images can then be recognized effectively. The results demonstrate that gradual recovery outperforms direct recovery of the occluded region. Although the main application in this paper is occluded digit image recognition, the proposed framework can be readily generalized to other problems. Extensive experiments verify our design choices and demonstrate the effectiveness, extensibility, and generalizability of the method.
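The gradual-recovery idea can be made concrete by looking at how training pairs for the stacked stages might be built. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes occlusion is a band of zeroed rows at the bottom of the image, that the occlusion degree is measured in rows, and that the K intermediate levels are evenly spaced. All helper names (`occlude`, `degree_schedule`, `stage_pairs`, `recover_gradually`) are hypothetical. Stage k would be trained to map an image occluded at level d_k to the same image occluded at level d_{k+1}; at test time the trained stages are applied in sequence.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # a grayscale image as a list of pixel rows


def occlude(img: Image, rows: int) -> Image:
    """Return a copy of `img` with its bottom `rows` rows set to zero.

    Assumption (hypothetical occlusion model): the occluder is a
    horizontal band covering the bottom of the digit.
    """
    h = len(img)
    if not 0 <= rows <= h:
        raise ValueError("rows must be between 0 and the image height")
    return [row[:] if r < h - rows else [0.0] * len(row)
            for r, row in enumerate(img)]


def degree_schedule(start_rows: int, num_stages: int) -> List[int]:
    """Evenly spaced occlusion levels d_0 >= d_1 >= ... >= d_K = 0 (rounded down)."""
    return [start_rows * (num_stages - k) // num_stages
            for k in range(num_stages + 1)]


def stage_pairs(clean: Image, start_rows: int,
                num_stages: int) -> List[Tuple[Image, Image]]:
    """(input, target) pairs: stage k learns to go from level d_k to d_{k+1}."""
    levels = degree_schedule(start_rows, num_stages)
    return [(occlude(clean, levels[k]), occlude(clean, levels[k + 1]))
            for k in range(num_stages)]


def recover_gradually(stages: Sequence[Callable[[Image], Image]],
                      img: Image) -> Image:
    """Apply the stage models in order; each removes part of the occlusion."""
    for stage in stages:
        img = stage(img)
    return img


if __name__ == "__main__":
    clean = [[1.0, 1.0] for _ in range(6)]  # toy 6x2 "digit"
    print(degree_schedule(6, 3))            # [6, 4, 2, 0]
    for x, y in stage_pairs(clean, 6, 3):
        # prints 0.0 -> 4.0, then 4.0 -> 8.0, then 8.0 -> 12.0
        print(sum(map(sum, x)), "->", sum(map(sum, y)))
```

Here `recover_gradually` only shows the composition order; any callable, such as the forward pass of a trained auto-encoder, could stand in for a stage. Splitting one large jump (6 occluded rows to 0) into several small ones (6, 4, 2, 0) reflects the abstract's argument that each auto-encoder then only has to model a small, smooth change.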

Notes

  1. http://yann.lecun.com/exdb/mnist/

  2. http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/

  3. http://www.pitt.edu/~emotion/ck-spread.htm

  4. http://www.vlfeat.org/matconvnet/

Author information

Corresponding author

Correspondence to Hongxun Yao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Y., Yao, H., Yu, W. et al. Gradual recovery based occluded digit images recognition. Multimed Tools Appl 78, 2571–2586 (2019). https://doi.org/10.1007/s11042-018-6048-8

  • DOI: https://doi.org/10.1007/s11042-018-6048-8
