Invariant object recognition based on combination of sparse DBN and SOM with temporal trace rule

Multimedia Tools and Applications

Abstract

This paper proposes a trace-rule-based self-organizing map (SOM) model built on a sparse two-stage deep belief network (DBN). Together, the SOM and the sparse DBN form a hierarchical network in which the DBN serves as a V2 feature detector, while the SOM layer, guided by the trace learning rule during training, learns to extract transformation-invariant features. The proposed method is evaluated by measuring stimulus-specific information (SSI) and by comparison with classic algorithms. The results show that the trace-rule-based SOM model produces more neurons with high SSI values, and such neurons convey more useful and discriminative information for subsequent object recognition.
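To make the idea of a trace-guided SOM layer concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used exponential trace, trace = (1 - eta) * y + eta * trace, applied to a Gaussian SOM neighbourhood activation, so that consecutive views of the same object pull the same map units toward them. The function name `train_trace_som`, all parameter values, and the use of random vectors in place of DBN features are hypothetical choices made for illustration.

```python
import numpy as np

def train_trace_som(sequences, grid=(6, 6), eta=0.8, lr=0.1,
                    sigma=1.0, epochs=5, seed=0):
    """Train a SOM whose update is gated by a temporal trace (illustrative).

    sequences: list of arrays, each of shape (n_views, dim), holding feature
    vectors for consecutive transformed views of ONE object (assumed input).
    """
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    dim = sequences[0].shape[1]
    weights = rng.random((n_units, dim))
    # 2-D grid coordinates of every map unit, used by the neighbourhood.
    coords = np.array([(i, j) for i in range(grid[0])
                       for j in range(grid[1])], dtype=float)

    for _ in range(epochs):
        for seq in sequences:
            trace = np.zeros(n_units)  # reset the trace at each new object
            for x in seq:
                # Best-matching unit for the current view.
                bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
                # Gaussian neighbourhood activation centred on the BMU.
                d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
                y = np.exp(-d2 / (2.0 * sigma ** 2))
                # Assumed trace: blend current activation with its history.
                trace = (1.0 - eta) * y + eta * trace
                # SOM update weighted by the trace, not the instantaneous y.
                weights += lr * trace[:, None] * (x - weights)
    return weights

# Usage with random stand-ins for DBN features: 3 objects, 10 views each.
rng = np.random.default_rng(1)
demo_sequences = [rng.random((10, 4)) for _ in range(3)]
demo_weights = train_trace_som(demo_sequences, grid=(3, 3))
```

Because the trace carries activation over from earlier views, units that responded to one view of an object keep learning from its later views, which is the mechanism by which a trace rule is generally expected to encourage transformation invariance.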

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61076097 and 61473257).

Author information

Corresponding author

Correspondence to Shulong Wang.

Appendix

Assuming that P(t) = 1/S for every object t, where S is the number of objects (S also denotes the set of objects in the sums), the maximum possible SSI of a neuron can be derived as follows:

$$ P(r)=\sum_{t\in S}P(t)\,P(r\mid t)=\frac{1}{S}\sum_{t\in S}P(r\mid t) \tag{11} $$
$$ I(s,R)=\sum_{r\in R}P(r\mid s)\log_{2}\frac{P(r\mid s)}{P(r)} \tag{12} $$
$$ \phantom{I(s,R)}=\sum_{r\in R}P(r\mid s)\log_{2}\frac{S\cdot P(r\mid s)}{\sum_{t\in S}P(r\mid t)} \tag{13} $$
$$ \phantom{I(s,R)}=\sum_{r\in R}P(r\mid s)\left[\log_{2}S+\log_{2}\frac{P(r\mid s)}{\sum_{t\in S}P(r\mid t)}\right] \tag{14} $$
$$ \phantom{I(s,R)}=\log_{2}S+\sum_{r\in R}P(r\mid s)\log_{2}\frac{P(r\mid s)}{\sum_{t\in S}P(r\mid t)} \tag{15} $$
$$ \because\; 0\leq P(r\mid s)\leq \sum_{t\in S}P(r\mid t) \tag{16} $$
$$ \therefore\; \log_{2}\frac{P(r\mid s)}{\sum_{t\in S}P(r\mid t)}\leq 0 \tag{17} $$
$$ \therefore\; \sum_{r\in R}P(r\mid s)\log_{2}\frac{P(r\mid s)}{\sum_{t\in S}P(r\mid t)}\leq 0 \tag{18} $$
$$ \therefore\; I(s,R)\leq \log_{2}S \tag{19} $$
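The bound can be checked numerically. The sketch below is illustrative only: it evaluates I(s, R) from Eqs. (11) and (12) for a table of conditional response probabilities, assuming equiprobable stimuli as in the derivation. The function name `ssi`, the array layout (rows are stimuli, columns are responses), and the example tables are assumptions introduced here, not taken from the paper.

```python
import numpy as np

def ssi(p_r_given_s):
    """Stimulus-specific information I(s, R) in bits, one value per stimulus.

    p_r_given_s[s, r] = P(r | s); every row must sum to 1.
    Stimuli are assumed equiprobable, P(s) = 1/S.
    """
    p = np.asarray(p_r_given_s, dtype=float)
    n_stimuli = p.shape[0]
    p_r = p.sum(axis=0) / n_stimuli  # Eq. (11)
    # Where P(r|s) = 0 the term contributes 0 (convention 0 * log 0 = 0),
    # so the ratio is left at 1 there and its logarithm vanishes.
    ratio = np.divide(p, p_r, out=np.ones_like(p), where=p > 0)
    return (p * np.log2(ratio)).sum(axis=1)  # Eq. (12)

# A perfectly selective neuron: each of 4 stimuli evokes its own response.
selective = ssi(np.eye(4))

# An uninformative neuron: the response distribution ignores the stimulus.
uninformative = ssi(np.full((4, 3), 1.0 / 3.0))

# A random response table should never exceed the bound log2(S).
rng = np.random.default_rng(0)
table = rng.random((5, 8))
table /= table.sum(axis=1, keepdims=True)
random_ssi = ssi(table)
```

Under these assumptions the selective neuron reaches the bound of log2(4) = 2 bits for every stimulus, and the uninformative neuron yields 0 bits. Equality in Eq. (19) requires that every response r with P(r | s) > 0 has P(r | t) = 0 for all other objects t, that is, the responses evoked by s are never evoked by any other object.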

About this article

Cite this article

Cai, H., Wang, S., Liu, E. et al. Invariant object recognition based on combination of sparse DBN and SOM with temporal trace rule. Multimed Tools Appl 76, 12017–12034 (2017). https://doi.org/10.1007/s11042-016-3956-3

  • DOI: https://doi.org/10.1007/s11042-016-3956-3
