Abstract
An increasing number of deep learning methods are being applied to quantify the perception of urban environments, study the relationship between urban appearance and resident safety, and improve urban appearance. Most state-of-the-art methods extract feature representations from street-level images with conventional visual computing algorithms or deep convolutional neural networks and then predict perception results directly from those features. Unfortunately, these methods process color and texture information jointly, even though color and texture are primary image features that affect human perception and judgment in different ways. We argue that color and texture should be handled separately; therefore, we formulate an end-to-end learning methodology that processes each input image according to its color and texture information before feeding it into the neural network. The two processed images and the original image constitute the three input streams of the triad attention ranking convolutional neural network (AR-CNN) proposed in this study. In accordance with the color and texture aspects, we also propose an improved attention mechanism in the convolution layer. Our objective is to estimate human scores of urban appearance from the pairwise comparison results predicted by the AR-CNN model.
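The abstract outlines a three-stream ranking architecture, which the following minimal PyTorch sketch makes concrete. Everything below is an illustrative assumption rather than the authors' implementation: the color and texture streams are taken to be preprocessed 3-channel image maps (e.g., an HSV rendering and Gabor filter responses), the improved attention mechanism is stood in for by a simple squeeze-and-excitation-style channel attention, and the pairwise comparisons are trained with a margin hinge loss. All module names and hyperparameters (ChannelAttention, StreamEncoder, TriadRankingNet, out_dim=128) are invented for this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation-style reweighting of feature channels: a stand-in
    # for the paper's improved attention mechanism, whose details are not
    # specified in the abstract.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # global average pool -> channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)  # broadcast weights over H x W

class StreamEncoder(nn.Module):
    # One convolutional stream; the same encoder is instantiated for the
    # original image, the color map, and the texture map (all assumed 3-channel).
    def __init__(self, out_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            ChannelAttention(64),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, out_dim)

    def forward(self, x):
        return self.proj(self.features(x).flatten(1))

class TriadRankingNet(nn.Module):
    # Fuses the three input streams and maps them to a scalar perception score.
    def __init__(self):
        super().__init__()
        self.orig, self.color, self.texture = (StreamEncoder() for _ in range(3))
        self.score = nn.Linear(128 * 3, 1)

    def forward(self, img, color_map, texture_map):
        feats = torch.cat([self.orig(img), self.color(color_map),
                           self.texture(texture_map)], dim=1)
        return self.score(feats).squeeze(1)

def pairwise_ranking_loss(score_win, score_lose, margin=1.0):
    # Hinge loss over a human pairwise comparison: the image judged better
    # should outscore the other by at least `margin`.
    return F.relu(margin - (score_win - score_lose)).mean()

In use, each human judgment "image i looks better than image j" would contribute pairwise_ranking_loss(model(img_i, color_i, texture_i), model(img_j, color_j, texture_j)) to training; per-image appearance scores can then be aggregated from the model's predicted pairwise outcomes.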
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 62032020), in part by the Hunan Science and Technology Planning Project (Grant No. 2019RS3019), in part by the Hunan Provincial Natural Science Foundation of China for Distinguished Young Scholars (Grant No. 2018JJ1025), and in part by the Guangzhou Research Project (Grant No. 201902010037).
Cite this article
Li, Z., Chen, Z., Zheng, WS. et al. AR-CNN: an attention ranking network for learning urban perception. Sci. China Inf. Sci. 65, 112104 (2022). https://doi.org/10.1007/s11432-019-2899-9