Abstract
Recent research shows that auto-encoders are well suited to modeling variations that change smoothly. In this paper, we use auto-encoders to recognize partially occluded digit images through gradual recovery. We propose a new variant of the auto-encoder, the "generalized auto-encoder", and stack generalized auto-encoders into a network (SGAE) that recovers and recognizes occluded digit images. Rather than recovering the occluded region directly, we treat the degree of occlusion as a continuous variable and the recovery as a gradual process. We divide the whole task into multiple intermediate recovery procedures and assign each procedure to one generalized auto-encoder, so that the occlusion is removed step by step. Given these encouraging recovery results, the occluded digit images can then be recognized well. The results demonstrate that gradual recovery outperforms direct recovery of the occluded region. Although the main application in this paper is occluded digit image recognition, the proposed framework generalizes readily to other problems. Extensive experiments verify our settings and demonstrate the effectiveness, extensibility, and generalizability of the method.
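To make the idea of gradual recovery concrete, here is a minimal toy sketch in NumPy. It is not the authors' SGAE architecture or data; everything below is an assumption chosen for illustration. The occluder is a centered square set to 0, its side length shrinks linearly from `max_size` to 0 over `num_stages` levels, and each stage is a one-hidden-layer sigmoid network trained by full-batch gradient descent on mean squared error to map images at occlusion level k to the slightly less occluded level k-1 (an auto-encoder-style network whose target differs from its input). Stages are trained independently on ground-truth intermediate pairs and then chained at test time. The helper names `occlude`, `make_levels`, `Stage`, `train_gradual`, `recover`, and `mse`, and the rank-1 synthetic "images", are all hypothetical.

```python
import numpy as np


def occlude(images, size):
    """Zero a centred size x size square in every image (hypothetical occluder)."""
    out = images.copy()
    h, w = images.shape[1:]
    top, left = (h - size) // 2, (w - size) // 2
    out[:, top:top + size, left:left + size] = 0.0
    return out


def make_levels(images, max_size, num_stages):
    """Occlusion levels K..0: the square shrinks linearly from max_size to 0."""
    sizes = [round(max_size * k / num_stages) for k in range(num_stages, -1, -1)]
    return [occlude(images, s) for s in sizes]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mse(a, b):
    return float(np.mean((a - b) ** 2))


class Stage:
    """One-hidden-layer network x -> sigmoid(x W1 + b1) W2 + b2, trained to map
    an input to a *different* (less occluded) target."""

    def __init__(self, dim, hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, dim))
        self.b2 = np.zeros(dim)

    def __call__(self, x):
        return sigmoid(x @ self.W1 + self.b1) @ self.W2 + self.b2

    def fit(self, x, y, lr=1.0, epochs=300):
        for _ in range(epochs):
            h = sigmoid(x @ self.W1 + self.b1)
            pred = h @ self.W2 + self.b2
            g_out = 2.0 * (pred - y) / pred.size          # d mean((pred - y)^2) / d pred
            g_hid = (g_out @ self.W2.T) * h * (1.0 - h)   # back-propagate through the sigmoid
            self.W2 -= lr * h.T @ g_out
            self.b2 -= lr * g_out.sum(axis=0)
            self.W1 -= lr * x.T @ g_hid
            self.b1 -= lr * g_hid.sum(axis=0)
        return self


def train_gradual(images, max_size=4, num_stages=2, hidden=32, seed=0):
    """Stage k learns occlusion level K-k -> K-k-1 from ground-truth pairs."""
    rng = np.random.default_rng(seed)
    n = len(images)
    flat = [lv.reshape(n, -1) for lv in make_levels(images, max_size, num_stages)]
    return [Stage(flat[0].shape[1], hidden, rng).fit(flat[k], flat[k + 1])
            for k in range(num_stages)]


def recover(stages, occluded):
    """Run the stages in sequence, removing a little more occlusion at each one."""
    x = occluded.reshape(len(occluded), -1)
    for stage in stages:
        x = stage(x)
    return x.reshape(occluded.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy rank-1 8x8 "images": occluded pixels are inferable from the visible ones.
    images = rng.random((200, 8, 1)) * rng.random((200, 1, 8))
    stages = train_gradual(images)
    occluded = occlude(images, 4)
    print("MSE occluded :", mse(occluded, images))
    print("MSE recovered:", mse(recover(stages, occluded), images))
```

Training each stage on ground-truth intermediate pairs keeps every sub-problem small (each network only has to fill in a thin ring of pixels); a fuller treatment might additionally fine-tune the chained stages end to end, which this sketch omits.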

About this article
Cite this article
Wang, Y., Yao, H., Yu, W. et al. Gradual recovery based occluded digit images recognition. Multimed Tools Appl 78, 2571–2586 (2019). https://doi.org/10.1007/s11042-018-6048-8
DOI: https://doi.org/10.1007/s11042-018-6048-8