AR-CNN: an attention ranking network for learning urban perception

  • Research Paper
  • Published in: Science China Information Sciences

Abstract

A growing number of deep learning methods are being applied to quantify the perception of urban environments, study the relationship between urban appearance and resident safety, and improve urban appearance. Most state-of-the-art methods extract feature representations from street-level images with conventional visual computing algorithms or deep convolutional neural networks and then predict perception results directly from those features. Unfortunately, these methods process color and texture information jointly. Color and texture are primary image features, and they affect human perception and judgment in different ways. We argue that color and texture should be treated separately; we therefore formulate an end-to-end learning methodology that decomposes each input image according to its color and texture information before it enters the neural network. The two processed images and the original image constitute the three input streams of the triad attention ranking convolutional neural network (AR-CNN) model proposed in this study. We also propose an improved attention mechanism in the convolution layers, tailored to the color and texture aspects. Our objective is to recover human scores of urban appearance from the pairwise-comparison predictions generated by the AR-CNN model.
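To make the pipeline concrete, here is a minimal PyTorch sketch of the triad idea described above. It is not the authors' implementation: the color/texture decomposition (a strong blur that keeps the color layout, Sobel gradient magnitude that keeps the texture), the layer widths, and the simple channel-attention gate standing in for the paper's improved attention mechanism are all illustrative assumptions, and the margin ranking loss is one standard way to learn from pairwise comparisons.

```python
# Minimal sketch of a triad (three-stream) ranking network -- an
# illustrative assumption of the architecture, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def decompose(rgb):
    """Split RGB images (B, 3, H, W) into a color map and a texture map.
    Illustrative choices: heavy average-pooling washes out texture but
    keeps the color layout; Sobel gradient magnitude of the grayscale
    image keeps texture but discards color."""
    color = F.interpolate(F.avg_pool2d(rgb, 8), size=rgb.shape[-2:],
                          mode="bilinear", align_corners=False)
    gray = rgb.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    gx = F.conv2d(gray, kx.view(1, 1, 3, 3), padding=1)
    gy = F.conv2d(gray, kx.t().reshape(1, 1, 3, 3), padding=1)
    texture = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    return color, texture


class StreamCNN(nn.Module):
    """One convolutional stream with a simple channel-attention gate
    (a stand-in for the paper's attention mechanism)."""

    def __init__(self, in_ch):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.Sequential(nn.Linear(64, 64), nn.Sigmoid())

    def forward(self, x):
        f = self.features(x)                    # (B, 64, h, w)
        gate = self.attn(f.mean(dim=(2, 3)))    # channel weights (B, 64)
        f = f * gate[:, :, None, None]          # re-weight feature maps
        return f.mean(dim=(2, 3))               # (B, 64) descriptor


class TriadRanker(nn.Module):
    """Fuse the original, color, and texture streams into one scalar
    perception score per image."""

    def __init__(self):
        super().__init__()
        self.orig, self.col, self.tex = StreamCNN(3), StreamCNN(3), StreamCNN(1)
        self.score = nn.Linear(64 * 3, 1)

    def forward(self, rgb):
        color, texture = decompose(rgb)
        z = torch.cat([self.orig(rgb), self.col(color), self.tex(texture)], 1)
        return self.score(z).squeeze(1)


# Pairwise training signal: the winner of each human comparison should
# score higher than the loser by a margin (standard margin ranking loss).
model = TriadRanker()
winners = torch.rand(4, 3, 64, 64)  # toy stand-ins for street-level images
losers = torch.rand(4, 3, 64, 64)
loss = F.margin_ranking_loss(model(winners), model(losers),
                             target=torch.ones(4), margin=1.0)
loss.backward()
```

In practice, per-image perception scores would be aggregated from many such pairwise predictions (for example with a Bayesian rating system such as TrueSkill); the sketch only illustrates the three-stream scoring and the pairwise training signal.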



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62032020), in part by the Hunan Science and Technology Planning Project (Grant No. 2019RS3019), in part by the Hunan Provincial Natural Science Foundation of China for Distinguished Young Scholars (Grant No. 2018JJ1025), and in part by the Guangzhou Research Project (Grant No. 201902010037).

Author information

Corresponding author

Correspondence to Wei-Shi Zheng.

About this article

Cite this article

Li, Z., Chen, Z., Zheng, WS. et al. AR-CNN: an attention ranking network for learning urban perception. Sci. China Inf. Sci. 65, 112104 (2022). https://doi.org/10.1007/s11432-019-2899-9
