DOI: 10.1145/3637528.3671918
research-article

Profiling Urban Streets: A Semi-Supervised Prediction Model Based on Street View Imagery and Spatial Topology

Published: 24 August 2024

Abstract

With the expansion and growth of cities, profiling urban areas with multi-modal urban datasets (e.g., points-of-interest and street view imagery) has become increasingly important in urban planning and management. In particular, street view images have gained popularity for understanding the characteristics of urban areas due to their abundant visual information and inherent correlations with human activities. In this study, we define a street segment represented by multiple street view images as the minimum spatial unit for analysis and predict its functional and socioeconomic indicators, which presents several challenges in modeling the spatial distribution of images on a street and the spatial topology (adjacency) of streets. Meanwhile, Large Language Models are capable of understanding imagery data based on their extensive knowledge base, which offers a remarkable opportunity for profiling streets with images. In view of these challenges and this opportunity, we present a semi-supervised Urban Street Profiling Model (USPM) based on street view imagery and the spatial adjacency of urban streets. Specifically, given a street with multiple images, we first employ a newly designed spatial context-based contrastive learning method to generate feature vectors of images and then apply an LSTM-based fusion method to encode the multiple images on a street into a street visual representation; we then generate descriptions of street scenes for the street view images using SPHINX (a large language model) and produce a street textual representation; finally, we build an urban street graph based on spatial topology (adjacency) and employ a semi-supervised graph learning algorithm to further encode the street representations for prediction. We conduct thorough experiments with real-world datasets to assess the proposed USPM. The experimental results demonstrate that USPM considerably outperforms baseline methods in two urban prediction tasks.
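
The final step of the pipeline, encoding street representations over an adjacency graph, can be illustrated with a small sketch. The abstract does not say which graph learning algorithm USPM uses, so the code below assumes a single GCN-style propagation step (symmetric degree normalization with self-loops) purely for illustration. The function names `build_adjacency` and `propagate`, the toy three-street graph, and the one-dimensional features are all hypothetical and not taken from the paper.

```python
from math import sqrt


def build_adjacency(num_streets, edges):
    """Return neighbor sets for an undirected street graph.

    Each street is also its own neighbor (self-loop), so its own
    features are retained during propagation.
    """
    neighbors = [{i} for i in range(num_streets)]
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def propagate(features, neighbors):
    """One GCN-style smoothing step (assumed, not the paper's exact layer).

    New feature of street i = sum over neighbors j (including i itself)
    of features[j] / sqrt(deg(i) * deg(j)).
    """
    deg = [len(n) for n in neighbors]
    dim = len(features[0])
    out = []
    for i, nbrs in enumerate(neighbors):
        row = [0.0] * dim
        for j in nbrs:
            weight = 1.0 / sqrt(deg[i] * deg[j])
            for k in range(dim):
                row[k] += weight * features[j][k]
        out.append(row)
    return out


# Toy example: three streets in a row, street 0 touches street 1,
# street 1 touches street 2. Only street 0 has a non-zero feature.
neighbors = build_adjacency(3, [(0, 1), (1, 2)])
smoothed = propagate([[1.0], [0.0], [0.0]], neighbors)
```

After one step, street 1 picks up part of street 0's signal while street 2, which is two hops away, does not. In a semi-supervised setting, a prediction loss would be computed only on the labeled streets, while unlabeled streets still contribute through this neighborhood aggregation.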

Supplemental Material

MP4 File - Profiling Urban Streets: A Semi-Supervised Prediction Model Based on Street View Imagery and Spatial Topology
A brief introduction to the motivation and method of our work.


Information

Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. large language model
  2. street representation learning
  3. street view imagery
  4. urban street profiling

Qualifiers

  • Research-article

Conference

KDD '24

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 467
  • Downloads (Last 12 months): 467
  • Downloads (Last 6 weeks): 79

Reflects downloads up to 17 Jan 2025
