DOI: 10.1145/3583780.3614958
research-article

Mask- and Contrast-Enhanced Spatio-Temporal Learning for Urban Flow Prediction

Published: 21 October 2023

Abstract

As a critical mission of intelligent transportation systems, urban flow prediction (UFP) benefits many city services, including trip planning, congestion control, and public safety. Despite the achievements of previous studies, few efforts have investigated heterogeneity in space and time simultaneously; that is, regional correlations vary across timestamps. In this paper, we propose a spatio-temporal learning framework with mask and contrast enhancements to capture spatio-temporal variability among city regions. We devise a mask-enhanced pre-training task to learn latent correlations across the spatial and temporal dimensions, and then develop a graph-based method that extracts the significance of regions from the inter-regional attention weights. To further acquire contrastive correlations of regions, we design a contrastive pre-training task with a global-local cross-attention mechanism. The two resulting pre-trained encoders can then capture latent spatio-temporal representations for time-varying flow forecasting. Extensive experiments conducted on real-world urban flow datasets demonstrate that our method compares favorably with other state-of-the-art models.
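The two pre-training ideas named in the abstract can be illustrated with a small, generic sketch. The abstract does not specify the masking strategy, the encoders, or the loss functions, so everything below is an assumption made for illustration: the function names (`random_mask`, `apply_mask`, `masked_mse`, `info_nce`) are hypothetical, masking is uniform random over (time step, region) cells, reconstruction is scored with mean squared error on the hidden cells only, and the contrastive objective is the widely used InfoNCE loss rather than the paper's own formulation.

```python
import math
import random


def random_mask(num_steps, num_regions, mask_ratio, seed=0):
    """Return a num_steps x num_regions grid of booleans; True marks a hidden cell.

    Assumption: cells are hidden uniformly at random over time and space.
    """
    rng = random.Random(seed)
    cells = [(t, r) for t in range(num_steps) for r in range(num_regions)]
    hidden = set(rng.sample(cells, int(round(mask_ratio * len(cells)))))
    return [[(t, r) in hidden for r in range(num_regions)] for t in range(num_steps)]


def apply_mask(flows, mask, fill=0.0):
    """Replace hidden cells with a fill value; visible cells are unchanged."""
    return [[fill if m else x for x, m in zip(row, mrow)]
            for row, mrow in zip(flows, mask)]


def masked_mse(pred, target, mask):
    """Mean squared error computed only over the hidden cells."""
    errs = [(p - y) ** 2
            for prow, yrow, mrow in zip(pred, target, mask)
            for p, y, m in zip(prow, yrow, mrow) if m]
    return sum(errs) / len(errs) if errs else 0.0


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: low when the anchor is closer to the positive than to negatives."""
    scaled = [cosine(anchor, positive) / temperature]
    scaled += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(scaled)  # subtract the max for a numerically stable log-sum-exp
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return log_sum - scaled[0]


if __name__ == "__main__":
    # Toy flow series: 4 time steps x 3 regions (made-up numbers).
    flows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
    mask = random_mask(4, 3, mask_ratio=0.5, seed=1)
    corrupted = apply_mask(flows, mask)
    # A real encoder-decoder would predict the hidden cells; here the corrupted
    # input stands in for a (poor) prediction just to exercise the loss.
    print("masked MSE:", masked_mse(corrupted, flows, mask))
    print("InfoNCE:", info_nce([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]]))
```

In a pre-training setup of this kind, a model would receive the corrupted input and be trained to minimize the masked reconstruction error, while region embeddings would be pulled toward their positives and pushed away from negatives by the contrastive term. How the paper selects positives and negatives, and how its global-local cross-attention enters the loss, is not stated in the abstract.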

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN:9798400701245
DOI:10.1145/3583780
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. contrastive learning
  2. mask-enhanced learning
  3. spatio-temporal pre-training
  4. spatio-temporal predictive modeling
  5. traffic prediction
  6. urban flow prediction

Qualifiers

  • Research-article

Conference

CIKM '23
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 270
  • Downloads (Last 6 weeks): 15
Reflects downloads up to 28 Feb 2025

Cited By

  • (2025) Enhancing urban flow prediction via mutual reinforcement with multi-scale regional information. Neural Networks 182 (106900). DOI: 10.1016/j.neunet.2024.106900. Online publication date: Feb-2025
  • (2025) Enhancing origin–destination flow prediction via bi-directional spatio-temporal inference and interconnected feature evolution. Expert Systems with Applications 264 (125679). DOI: 10.1016/j.eswa.2024.125679. Online publication date: Mar-2025
  • (2024) UrbanMC: Masking and Contrastive Self-Supervision For Fine-Grained Urban Flows Inference. Proceedings of the 2024 8th International Conference on Deep Learning Technologies (15-21). DOI: 10.1145/3695719.3695722. Online publication date: 15-Jul-2024
  • (2024) STORM: A MapReduce Framework for Symbolic Time Intervals Series Classification. ACM Transactions on Knowledge Discovery from Data 19:1 (1-54). DOI: 10.1145/3694788. Online publication date: 29-Nov-2024
  • (2024) UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4095-4106). DOI: 10.1145/3637528.3671662. Online publication date: 25-Aug-2024
  • (2024) Urban Foundation Models: A Survey. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (6633-6643). DOI: 10.1145/3637528.3671453. Online publication date: 25-Aug-2024
  • (2024) Spatio-temporal Graph Normalizing Flow for Probabilistic Traffic Prediction. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (45-55). DOI: 10.1145/3627673.3679705. Online publication date: 21-Oct-2024
  • (2024) M$^{2}$STL: Multi-Range Multi-Level Spatial-Temporal Learning Model for Network Traffic Prediction. IEEE Transactions on Network Science and Engineering 11:5 (4315-4329). DOI: 10.1109/TNSE.2024.3417371. Online publication date: Sep-2024
  • (2024) Urban region representation learning with human trajectories: a multi-view approach incorporating transition, spatial, and temporal perspectives. GIScience & Remote Sensing 61:1. DOI: 10.1080/15481603.2024.2387392. Online publication date: 4-Sep-2024
  • (2024) Frequency Enhanced Pre-training for Cross-City Few-shot Traffic Forecasting. Machine Learning and Knowledge Discovery in Databases. Research Track (35-52). DOI: 10.1007/978-3-031-70344-7_3. Online publication date: 22-Aug-2024
