Abstract
We present a hybrid model, the Glance and Glimpse Network (GGNet), for visual classification, which combines an attention-based recurrent neural network (the Glimpse Network) with a convolutional neural network (the Glance Network). The Glimpse Network is trained to deploy a sequence of glimpses at different image patches and then output a classification result. The Glance Network, in turn, takes the downsampled input image and generates an image-specific class saliency map that provides hints for training the Glimpse Network. We show that training the Glimpse Network with such cues can be interpreted under the frameworks of both probabilistic inference and reinforcement learning, thereby establishing a high-level connection between these two separate fields. We evaluate our model on the Cluttered Translated MNIST benchmark and show that GGNet achieves state-of-the-art results compared with other recently proposed attention models.
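The retina-like multi-scale glimpse described above can be sketched roughly as follows. This is a minimal numpy illustration, not the authors' code: the function name, patch size, and number of scales are assumptions. Each scale crops a concentric patch twice as large as the previous one and downsamples it back to a common resolution, so the glimpse sees fine detail at the center and coarse context at the periphery.

```python
import numpy as np

def extract_glimpse(image, center, size=8, n_scales=3):
    """Crop n_scales concentric square patches around `center`, each
    twice as wide as the last, and downsample each to size x size by
    block averaging (a retina-like multi-resolution glimpse)."""
    patches = []
    for s in range(n_scales):
        half = (size * 2 ** s) // 2
        # Pad so crops near the image border remain valid.
        padded = np.pad(image, half, mode="constant")
        cy, cx = center[0] + half, center[1] + half
        patch = padded[cy - half:cy + half, cx - half:cx + half]
        # Block-average the (size*k) x (size*k) crop down to size x size.
        k = 2 ** s
        patch = patch.reshape(size, k, size, k).mean(axis=(1, 3))
        patches.append(patch)
    return np.stack(patches)  # shape: (n_scales, size, size)

glimpse = extract_glimpse(np.random.rand(60, 60), center=(30, 30))
```

The stacked patches would then be flattened and fed, together with the glimpse location, into the recurrent core at each time step.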
Acknowledgement
This work is supported by the A*STAR Industrial Robotics Program of Singapore, under grant number R-261-506-007-305.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Li, M., Ge, S.S., Lee, T.H. (2017). Glance and Glimpse Network: A Stochastic Attention Model Driven by Class Saliency. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science(), vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54525-7
Online ISBN: 978-3-319-54526-4
eBook Packages: Computer Science (R0)