Learning image by-parts using early and late fusion of auto-encoder features

Susan, Seba; Malhotra, Jatin

doi:10.1007/s11042-021-11092-8

Published: 03 July 2021

Volume 80, pages 29601–29615, (2021)

Multimedia Tools and Applications

Abstract

We introduce a novel sub-part learning scheme for recognizing handwritten numeral images. The idea is borrowed from visual perception, in which the cortical regions of the brain integrate visual information part by part. Each numeral image is divided into four half-parts (top half, bottom half, left half and right half), with the other half of the image kept masked in each case. An efficient data representation is derived from each image part in an unsupervised manner using convolutional auto-encoders (CAE), and feeds a learning scheme that involves both early and late fusion of features. The chief advantage of CAE features is that 2D spatial locality is preserved while the features are filtered layer by layer through the convolutional architecture. In the early fusion scheme, the features derived from the individual CAEs are fused by concatenation and learnt using an appropriate classifier. In the late fusion scheme, a meta-learner classifier learns the probability density of the predicted values produced by the four base classifiers. An early-cum-late fusion, proposed in the later stage of the work, combines the strengths of both schemes to enhance performance. The support vector machine is used in all classification stages. Experiments on the benchmark MNIST dataset of handwritten English numerals show that the method competes favorably with the state of the art, as inferred from the high classification scores achieved. The method thus provides a computationally simple and effective approach to sub-part learning and part-wise integration of information from different parts of the image. It also saves computational expense: only a small part of the image is processed at a time, which speeds up inference.
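The half-part masking and the early fusion by concatenation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the zero-fill masking convention, the function names, and the `toy_encoder` stand-in (used here in place of a trained CAE encoder) are all assumptions made for illustration, and the SVM classification stages are omitted.

```python
# Illustrative sketch only. Zero-fill masking, all function names, and
# toy_encoder are assumptions, not details taken from the paper.

PART_ORDER = ("top", "bottom", "left", "right")


def half_masks(image):
    """Return four views of a 2D image (list of rows), one per half-part.

    In each view the other half is set to 0, so every view keeps the
    original image size.
    """
    h, w = len(image), len(image[0])
    keep = {
        "top": lambda r, c: r < h // 2,
        "bottom": lambda r, c: r >= h // 2,
        "left": lambda r, c: c < w // 2,
        "right": lambda r, c: c >= w // 2,
    }
    return {
        name: [[image[r][c] if rule(r, c) else 0 for c in range(w)]
               for r in range(h)]
        for name, rule in keep.items()
    }


def toy_encoder(part):
    """Hypothetical placeholder for a trained CAE encoder: row sums."""
    return [sum(row) for row in part]


def early_fusion(features_by_part):
    """Concatenate the per-part feature vectors in a fixed part order."""
    return [x for name in PART_ORDER for x in features_by_part[name]]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
parts = half_masks(image)
features = {name: toy_encoder(part) for name, part in parts.items()}
fused = early_fusion(features)
print(len(fused))  # 16: four parts, four features each
```

In a fuller pipeline the fused vector would be passed to a classifier (the abstract names the support vector machine), while late fusion would instead feed the outputs of the four per-part base classifiers to a meta-learner.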

Author information

Authors and Affiliations

Department of Information Technology, Delhi Technological University, Delhi, 110042, India
Seba Susan & Jatin Malhotra

Corresponding author

Correspondence to Seba Susan.

Ethics declarations

Conflict of interest

The two authors are with the Department of Information Technology, Delhi Technological University, New Delhi, India.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Susan, S., Malhotra, J. Learning image by-parts using early and late fusion of auto-encoder features. Multimed Tools Appl 80, 29601–29615 (2021). https://doi.org/10.1007/s11042-021-11092-8

Received: 12 September 2020
Revised: 07 January 2021
Accepted: 21 May 2021
Published: 03 July 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11042-021-11092-8
