Deep learning algorithm with visual impression

https://doi.org/10.1016/j.ipl.2018.03.004

Highlights

  • Develop deep neural networks that learn a visual impression while training on the source dataset.

  • Reuse the hidden-layer parameters of the source network to aid recognition in the target task.

  • The two proposed models, which transfer the visual impression, can substantially reduce the number of annotated samples required.

Abstract

In this article, we develop two visual impression models, a recognition model and a generalization model, to simulate the cognitive process of the human visual system. We show how the visual impression learned with a deep neural network can be efficiently transferred to other visual recognition tasks. By reusing hidden layers trained in an unsupervised way, we show that the number of annotated image samples needed in the target tasks can be substantially reduced. Experiments show that parameters estimated in the source task do help the network improve object classification results in the target tasks.
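The idea of reusing an unsupervised hidden layer can be illustrated with a minimal NumPy sketch. It is not the authors' recognition or generalization model: it assumes a plain one-hidden-layer auto-encoder (no sparsity penalty) for the source step and a softmax output layer for the target step, and all function names and hyperparameters (`pretrain_autoencoder`, `train_target_classifier`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pretrain_autoencoder(x, n_hidden, epochs=200, lr=0.1, seed=0):
    """Source step (no labels): learn encoder weights by reconstructing the inputs."""
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b_dec = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        h = sigmoid(x @ w_enc + b_enc)       # hidden representation
        x_hat = h @ w_dec + b_dec            # linear reconstruction
        diff = x_hat - x
        losses.append(0.5 * np.mean(np.sum(diff ** 2, axis=1)))
        g_out = diff / len(x)                # gradient w.r.t. x_hat
        g_hid = (g_out @ w_dec.T) * h * (1.0 - h)
        w_dec -= lr * (h.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        w_enc -= lr * (x.T @ g_hid)
        b_enc -= lr * g_hid.sum(axis=0)
    return w_enc, b_enc, losses


def encode(x, w_enc, b_enc):
    """Apply the reused hidden layer; its parameters are never updated here."""
    return sigmoid(x @ w_enc + b_enc)


def train_target_classifier(x, y, w_enc, b_enc, n_classes, epochs=300, lr=0.5):
    """Target step: fit only a softmax output layer on top of the frozen hidden layer."""
    feats = encode(x, w_enc, b_enc)
    w_out = np.zeros((feats.shape[1], n_classes))
    b_out = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = feats @ w_out + b_out
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(x)
        w_out -= lr * (feats.T @ grad)
        b_out -= lr * grad.sum(axis=0)
    return w_out, b_out


def predict(x, w_enc, b_enc, w_out, b_out):
    return np.argmax(encode(x, w_enc, b_enc) @ w_out + b_out, axis=1)
```

In this sketch only the small output layer is trained on the labeled target samples, which is one simple way in which reused hidden-layer parameters could lower the amount of annotation needed. Whether the paper freezes or fine-tunes the transferred layers is not stated in this excerpt.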

Introduction

The effectiveness of the human visual system has long been studied to uncover human cognitive patterns. As neurophysiology and cognitive psychology develop, more and more scholars at home and abroad are combining visual perception with mechanisms of cognition to simulate the information-processing model of the human brain [1]. Inspired by the concept of “impression” in social cognition [2], [3], this article presents a novel explanation of human object recognition.

Impression is an important form of cognition. Intuitively, an impression is the image of a cognitive object formed in the brain [4]. When exposed to a rapidly changing world, the human brain is filled with information from all the sensory organs. Through cognition, this information is converted into various specific patterns that are stored in the brain as memories, which help humans understand their life-world [5]. These spatio-temporal patterns, which remain in the human brain and help people recognize things, constitute impressions.

Human brains are widely believed to be the most intelligent and efficient information-processing systems. Why can they complete various complex tasks so effectively? The answer is that, most of the time, our brains retrieve results from existing memories rather than computing them on the fly [6]. This is the role that the impression mechanism plays in the human cognitive process. Impression is an integrated form of cognition. Humans obtain different kinds of impressions through different sensory organs, e.g. visual, acoustic, olfactory, and gustatory impressions. Statistics show that, during human perception, 80 percent of cognitive information comes from vision [7]. Among the numerous impressions formed by the human brain, visual impression therefore plays an important role in cognition, and it is the focus of our research.

Section snippets

Visual impression

When people observe objects, the stimulation of the human visual system forms visual information, which is then stored in memory. The visual information preserved in memory is reflected as the visual impression in the human brain. When people observe the same situation again, they can quickly retrieve the past visual impression, which helps them recognize the current object. Even when they meet objects they have never seen, they can make a certain

Visual impression model

In computer vision, recognition tasks have long received close attention. Deep learning has shown outstanding image classification performance in large-scale recognition challenges. Compared with handcrafted low-level features, multiple deep learning models have shown the advantage of learning rich mid-level features. However, deep learning techniques need to estimate millions of parameters for the final recognition process. And learning such a

Experiments

In the experiments on the visual impression models, we use the well-known MNIST digits dataset and MNIST variations for the recognition model, and the CIFAR-10 dataset for the generalization model. Each MNIST image consists of 28×28 gray-scale pixel values scaled to [0,1]. We use the usual split of 50000 examples for training, 10000 for validation, and 10000 for test. In addition to MNIST, we use some of its variants, namely MNIST-rot (the digits were rotated by an angle generated uniformly between 0 and 2π
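The rotation used for MNIST-rot and the training/validation split can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the nearest-neighbour interpolation, the sign convention of the rotation, and carving the validation set out of the end of the 60000-image MNIST training set are all assumptions, and the helper names (`rotate_image`, `make_rotated_variant`, `split_train_valid`) are hypothetical rather than taken from the paper.

```python
import numpy as np


def rotate_image(img, angle):
    """Rotate a square 2-D image by `angle` radians about its centre.

    Assumption: nearest-neighbour sampling with zero fill outside the frame.
    """
    n = img.shape[0]
    c = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n]
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    # Inverse mapping: for every output pixel, find the source pixel it comes from.
    x_src = cos_a * (xs - c) + sin_a * (ys - c) + c
    y_src = -sin_a * (xs - c) + cos_a * (ys - c) + c
    xi = np.rint(x_src).astype(int)
    yi = np.rint(y_src).astype(int)
    valid = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
    out = np.zeros_like(img)
    out[valid] = img[yi[valid], xi[valid]]
    return out


def make_rotated_variant(images, seed=0):
    """Rotate each image by an angle drawn uniformly from [0, 2*pi)."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=len(images))
    rotated = np.stack([rotate_image(im, a) for im, a in zip(images, angles)])
    return rotated, angles


def split_train_valid(images, labels, n_train=50000):
    """Assumed split: first n_train examples for training, the rest for validation."""
    return (images[:n_train], labels[:n_train]), (images[n_train:], labels[n_train:])
```

Because nearest-neighbour sampling only copies existing pixel values or fills with zeros, images scaled to [0,1] stay in [0,1] after rotation.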

Conclusion

Deep learning models have recently shown outstanding performance in recognition challenges. However, the requirement of large-scale annotated datasets has become an intractable problem for deep learning techniques. In this paper, we show how image representations learned with a sparse auto-encoder on a very large number of annotated image samples can be efficiently transferred to new visual recognition tasks with a limited amount of training data. We design a new network structure to reuse the

Acknowledgements

This work was supported by the National Natural Science Foundation of China [61672364, 61672365 and 61033013].

References (8)

  • J.J. DiCarlo et al., How does the brain solve visual object recognition?, Neuron (2012)
  • R.S. Wyer, The Automaticity of Everyday Life: Advances in Social Cognition (2014)
  • S.T. Fiske et al., Social Cognition: From Brains to Culture (2013)
  • S. Ullman, High-Level Vision: Object Recognition and Visual Cognition (1996)
There are more references available in the full text version of this article.
