Saliency-based selection of visual content for deep convolutional neural networks

Obeso, A. Montoya; Benois-Pineau, J.; Vázquez, M. S. García; Acosta, A. A. Ramírez

doi:10.1007/s11042-018-6515-2

Application to architectural style classification

Published: 25 August 2018

Volume 78, pages 9553–9576, (2019)

Multimedia Tools and Applications

A. Montoya Obeso¹,² (ORCID: orcid.org/0000-0001-7090-1048),
J. Benois-Pineau²,
M. S. García Vázquez¹ &
A. A. Ramírez Acosta³

Abstract

The automatic description of digital multimedia content has mainly been developed for classification tasks, retrieval systems and the large-scale ordering of data. The preservation of cultural heritage is an important field of application for these methods. We address a classification problem in cultural heritage: the classification of architectural styles in digital photographs of Mexican cultural heritage. In general, selecting the relevant content in a scene for training classification models makes the models more efficient in terms of accuracy and training time. Here we use a saliency-driven approach to predict visual attention in images and use it to train a Deep Convolutional Neural Network. We also present an analysis of the behavior of models trained with state-of-the-art image cropping and with saliency maps. Training models that are invariant to rotation requires data augmentation of the training set, which poses problems of filling when the crops are normalized; we study different padding techniques and find an optimal solution. The results are compared with the state of the art in terms of accuracy and training time. Furthermore, we study saliency cropping in training and its generalization to another classical task, the weak labeling of massive collections of images containing objects of interest. These experiments are conducted on a large subset of the ImageNet database. This work extends preliminary research with respect to image padding methods and generalization on a large-scale generic database.
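As a rough illustration of the idea summarized in the abstract, the sketch below selects the fixed-size window with the highest total saliency and then pads a non-square crop to a square. It is not the authors' implementation: the function names (`best_window`, `saliency_crop`, `pad_to_square`), the sliding-window criterion and the NumPy padding modes are all assumptions made for this example, and the saliency map is taken as given rather than predicted.

```python
import numpy as np


def best_window(saliency, h, w):
    """Return (top, left) of the h x w window with the largest summed saliency.

    Hypothetical helper: assumes 1 <= h <= saliency.shape[0] and
    1 <= w <= saliency.shape[1].
    """
    # Integral image with a leading row and column of zeros,
    # so that every window sum costs four lookups.
    ii = np.zeros((saliency.shape[0] + 1, saliency.shape[1] + 1))
    ii[1:, 1:] = saliency.cumsum(axis=0).cumsum(axis=1)
    sums = ii[h:, w:] - ii[:-h, w:] - ii[h:, :-w] + ii[:-h, :-w]
    top, left = np.unravel_index(np.argmax(sums), sums.shape)
    return int(top), int(left)


def saliency_crop(image, saliency, h, w):
    """Crop the most salient h x w region of the image."""
    top, left = best_window(saliency, h, w)
    return image[top:top + h, left:left + w]


def pad_to_square(crop, mode="reflect"):
    """Pad the shorter spatial side so the crop becomes square.

    `mode` is passed to np.pad ("constant", "edge", "reflect", ...);
    which padding works best is an empirical question.
    """
    h, w = crop.shape[:2]
    side = max(h, w)
    pad_h, pad_w = side - h, side - w
    pad = [(pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)]
    pad += [(0, 0)] * (crop.ndim - 2)  # leave channel axes untouched
    return np.pad(crop, pad, mode=mode)
```

For example, `pad_to_square(saliency_crop(img, sal, 224, 160))` would yield a 224 x 224 input whose borders are filled by reflection; switching `mode` to `"constant"` fills them with zeros instead.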

Acknowledgements

This work was sponsored by CONACYT and SIP2017.

Author information

Authors and Affiliations

Instituto Politécnico Nacional, Ciudad de México, México
A. Montoya Obeso & M. S. García Vázquez
Université de Bordeaux, Bordeaux, France
A. Montoya Obeso & J. Benois-Pineau
MIRAL R&D&I, San Diego, CA, USA
A. A. Ramírez Acosta

Corresponding author

Correspondence to A. Montoya Obeso.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Obeso, A.M., Benois-Pineau, J., Vázquez, M.S.G. et al. Saliency-based selection of visual content for deep convolutional neural networks. Multimed Tools Appl 78, 9553–9576 (2019). https://doi.org/10.1007/s11042-018-6515-2

Received: 29 November 2017
Revised: 22 June 2018
Accepted: 10 August 2018
Published: 25 August 2018
Issue Date: April 2019
DOI: https://doi.org/10.1007/s11042-018-6515-2
