Data augmentation for handwritten digit recognition using generative adversarial networks

Multimedia Tools and Applications

Abstract

Supervised learning techniques require labeled examples, which can be time-consuming to obtain. Deep learning approaches in particular, where every feature extraction stage is learned within the artificial neural network, need a large number of labeled examples to train the model. Data augmentation addresses this issue by exploiting known variations that do not change the label of an example. Typical solutions in computer vision and in document analysis and recognition rely on geometric transformations (e.g., shift and rotation) and random elastic deformations of the original training examples. In this paper, we consider Generative Adversarial Networks (GANs), a technique that creates novel artificial examples without requiring prior knowledge of the variability that exists across examples. When a training dataset contains few labeled examples described in a high-dimensional space, the classifier may generalize poorly. We therefore aim to enrich databases of images or signals, and thereby improve classifier performance, by designing a GAN that creates artificial images. Although adding GAN-generated images can help, the extent of the benefit is not known in advance, and performance may degrade if too many artificial images are added. The approach is tested on four datasets of handwritten digits (Latin, Bangla, Devanagari, and Oriya). For each dataset, adding GAN-generated images to the training dataset improves accuracy. However, the results suggest that adding too many GAN-generated images deteriorates performance.
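
The two augmentation ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `shift_image` and `mix_training_set`, the `ratio` parameter, and the list-of-rows image format are all assumptions made for illustration. The GAN itself is not shown; the synthetic examples are assumed to have been generated already. The first function is a label-preserving geometric shift, and the second caps how many artificial examples join the real ones, since the abstract reports that adding too many can hurt.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Return a copy of a 2D image (list of pixel rows) shifted by
    dx columns and dy rows. Pixels pushed outside the frame are dropped
    and uncovered pixels take the value `fill` (hypothetical helper)."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def mix_training_set(real, synthetic, ratio, seed=0):
    """Return the real examples plus round(ratio * len(real)) synthetic
    ones, drawn without replacement and shuffled together.

    `ratio` is an assumed knob, not a value taken from the paper; it is
    meant to be tuned on held-out data."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    rng = random.Random(seed)
    n_extra = min(len(synthetic), round(ratio * len(real)))
    mixed = list(real) + rng.sample(list(synthetic), n_extra)
    rng.shuffle(mixed)
    return mixed


# Toy usage: examples are (image, label) pairs.
digit = [[0, 1, 0],
         [0, 0, 0],
         [0, 0, 0]]
real_examples = [(digit, 1)] * 10
# Stand-in for GAN output: here just shifted copies of the real digit.
synthetic_examples = [(shift_image(digit, 1, 1), 1)] * 20
train_set = mix_training_set(real_examples, synthetic_examples, ratio=0.5)
```

In practice one would evaluate several values of `ratio` on a validation set and keep the one that gives the best accuracy, rather than assuming that more artificial images are always better.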

Author information

Corresponding author

Correspondence to Hubert Cecotti.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jha, G., Cecotti, H. Data augmentation for handwritten digit recognition using generative adversarial networks. Multimed Tools Appl 79, 35055–35068 (2020). https://doi.org/10.1007/s11042-020-08883-w

  • DOI: https://doi.org/10.1007/s11042-020-08883-w
