Learning representation hierarchies by sharing visual features: a computational investigation of Persian character recognition with unsupervised deep learning

  • Research Report
  • Cognitive Processing

Abstract

In humans, efficient recognition of written symbols is thought to rely on a hierarchical processing system, where simple features are progressively combined into more abstract, high-level representations. Here, we present a computational model of Persian character recognition based on deep belief networks, where increasingly complex visual features emerge in a completely unsupervised manner by fitting a hierarchical generative model to the sensory data. Crucially, the high-level internal representations that emerge from unsupervised deep learning can be easily read out by a linear classifier, achieving state-of-the-art recognition accuracy. Furthermore, we tested the hypothesis that handwritten digits and letters share many common visual features: a generative model that captures the statistical structure of the letter distribution should therefore also support the recognition of written digits. To this end, deep networks trained on Persian letters were used to build high-level representations of Persian digits, which were indeed read out with high accuracy. Our simulations show that complex visual features, such as those mediating the identification of Persian symbols, can emerge from unsupervised learning in multilayered neural networks and can support knowledge transfer across related domains.
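The pipeline outlined in the abstract (unsupervised layer-wise feature learning, a linear read-out, and reuse of letter-trained features for digits) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, learning rate, number of epochs, full-batch CD-1 updates, the least-squares read-out, and the synthetic prototype data produced by the hypothetical helper `make_toy_set` are all assumptions chosen to keep the example small and self-contained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=20, lr=0.1, seed=0):
    """Fit one binary RBM with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: reconstruct the input, then re-infer the hidden layer.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 update: data-driven minus reconstruction-driven statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid


def train_dbn(data, layer_sizes, **kwargs):
    """Greedy layer-wise training; each RBM learns from the layer below, without labels."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        W, b_hid = train_rbm(inputs, n_hidden, **kwargs)
        layers.append((W, b_hid))
        inputs = sigmoid(inputs @ W + b_hid)
    return layers


def encode(data, layers):
    """Propagate inputs bottom-up to the top hidden layer."""
    out = data
    for W, b_hid in layers:
        out = sigmoid(out @ W + b_hid)
    return out


def add_bias(features):
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_readout(features, labels, n_classes):
    """Least-squares linear read-out onto one-hot targets (an assumed choice)."""
    targets = np.eye(n_classes)[labels]
    weights, *_ = np.linalg.lstsq(add_bias(features), targets, rcond=None)
    return weights


def predict(features, weights):
    return np.argmax(add_bias(features) @ weights, axis=1)


def make_toy_set(n_classes, n_per_class, n_pixels, seed):
    """Hypothetical stand-in data: noisy copies of random binary prototypes."""
    rng = np.random.default_rng(seed)
    prototypes = rng.random((n_classes, n_pixels)) < 0.5
    labels = np.repeat(np.arange(n_classes), n_per_class)
    flips = rng.random((labels.size, n_pixels)) < 0.05
    images = np.logical_xor(prototypes[labels], flips).astype(float)
    return images, labels


# Source domain (standing in for letters): labels are discarded during training.
source_images, _ = make_toy_set(n_classes=5, n_per_class=40, n_pixels=64, seed=1)
dbn = train_dbn(source_images, layer_sizes=[32, 16])

# Target domain (standing in for digits): only the linear read-out sees labels.
target_images, target_labels = make_toy_set(n_classes=3, n_per_class=40, n_pixels=64, seed=2)
target_features = encode(target_images, dbn)
readout = fit_readout(target_features, target_labels, n_classes=3)
accuracy = np.mean(predict(target_features, readout) == target_labels)
print(f"training-set read-out accuracy on the toy target domain: {accuracy:.2f}")
```

The point of the sketch is the division of labour: the weights of the hierarchy are fitted without any labels on the source domain and then frozen, so any recognition performance on the target domain comes from the linear read-out applied to the transferred features. The reported number is a training-set accuracy on synthetic data and says nothing about the results in the article.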

Notes

  1. The complete letter dataset can be downloaded from http://farsiocr.ir.

  2. http://ccnl.psy.unipd.it/research/deeplearning.

Acknowledgements

This work was partially supported through a Grant to A.T. from the Italian Ministry of Research. Part of this research was performed while both authors were visiting the Parallel Distributed Processing Lab at Stanford University, California, USA. Computing resources were provided by the Stanford Center for Mind, Brain and Computation. The authors warmly thank Prof. Jay McClelland for financial support and for making it possible to access Stanford MBC resources.

Author information

Corresponding author

Correspondence to Alberto Testolin.

Additional information

Handling Editor: John K. Tsotsos (York University); Reviewers: Mahdi Biparva (York University), Alireza Alaei (Griffith University).

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 233 kb)

About this article

Cite this article

Sadeghi, Z., Testolin, A. Learning representation hierarchies by sharing visual features: a computational investigation of Persian character recognition with unsupervised deep learning. Cogn Process 18, 273–284 (2017). https://doi.org/10.1007/s10339-017-0796-7

  • DOI: https://doi.org/10.1007/s10339-017-0796-7
