Mask- and Contrast-Enhanced Spatio-Temporal Learning for Urban Flow Prediction

Authors:

Xiangjun Dong

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 3298 - 3307

https://doi.org/10.1145/3583780.3614958

Published: 21 October 2023

Abstract

As a critical task in intelligent transportation systems, urban flow prediction (UFP) benefits many city services, including trip planning, congestion control, and public safety. Despite the achievements of previous studies, few efforts have investigated heterogeneity in space and time simultaneously; that is, the correlations between regions vary across timestamps. In this paper, we propose a spatio-temporal learning framework with mask and contrast enhancements to capture spatio-temporal variability among city regions. We devise a mask-enhanced pre-training task to learn latent correlations across the spatial and temporal dimensions, and then develop a graph-based method that extracts the significance of regions from the inter-regional attention weights. To further capture contrastive correlations between regions, we design a contrastive pre-training task with a global-local cross-attention mechanism. The two resulting pre-trained encoders can then capture latent spatio-temporal representations for time-varying flow forecasting. Extensive experiments on real-world urban flow datasets demonstrate that our method compares favorably with other state-of-the-art models.
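
The abstract does not spell out the masking scheme, so the following is only a minimal sketch of generic masked-reconstruction pre-training on a flow grid. It assumes cell-level random masking, a zero fill value, and a mean-squared-error objective restricted to hidden cells. The function names (`random_mask`, `apply_mask`, `masked_mse`), the mask ratio, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import random


def random_mask(num_steps, num_regions, mask_ratio, seed=0):
    """Return a num_steps x num_regions grid of booleans; True marks a hidden cell."""
    rng = random.Random(seed)
    cells = [(t, r) for t in range(num_steps) for r in range(num_regions)]
    hidden = set(rng.sample(cells, int(mask_ratio * len(cells))))
    return [[(t, r) in hidden for r in range(num_regions)] for t in range(num_steps)]


def apply_mask(flows, mask, fill=0.0):
    """Replace hidden cells with a fill value so a model never sees them."""
    return [[fill if m else x for x, m in zip(row, mrow)]
            for row, mrow in zip(flows, mask)]


def masked_mse(pred, target, mask):
    """Mean squared error computed only over hidden cells."""
    errs = [(p - y) ** 2
            for prow, yrow, mrow in zip(pred, target, mask)
            for p, y, m in zip(prow, yrow, mrow) if m]
    return sum(errs) / len(errs) if errs else 0.0


# Toy flows: 4 timestamps x 5 regions (values are made up for illustration).
flows = [[float(t * 5 + r) for r in range(5)] for t in range(4)]
mask = random_mask(4, 5, mask_ratio=0.5)
visible = apply_mask(flows, mask)

# Stand-in prediction: the masked input itself. A trained encoder-decoder
# (hypothetical here) would produce the prediction in a real setup.
loss = masked_mse(visible, flows, mask)
```

In an actual pre-training loop, an encoder-decoder would consume `visible`, and its output would replace the stand-in prediction passed to `masked_mse`.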

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information systems applications
    1. Data mining
    2. Spatial-temporal systems

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

October 2023

5508 pages

ISBN: 9798400701245

DOI: 10.1145/3583780

General Chairs:
Ingo Frommholz (University of Wolverhampton, UK)
Frank Hopfgartner (University of Koblenz, Germany)
Mark Lee (University of Birmingham, UK)
Michael Oakes (University of Birmingham, UK)

Program Chairs:
Mounia Lalmas (Spotify, UK)
Min Zhang (Tsinghua University, China)
Rodrygo Santos (Federal University of Minas Gerais, Brazil)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Open Fund of Beijing Key Laboratory of Traffic Data Analysis and Mining
Natural Science Foundation of Shandong Province
Shandong Excellent Young Scientists Fund (Oversea)
Taishan Scholar Project of Shandong Province
Fundamental Research Promotion Plan of Qilu University of Technology (Shandong Academy of Sciences)
National Natural Science Foundation of China

Conference

CIKM '23

CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2023

Birmingham, United Kingdom

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 486
Downloads (Last 12 months): 270
Downloads (Last 6 weeks): 15

Reflects downloads up to 28 Feb 2025

Cited By

Zhang X, Cao M, Gong Y, Wu X, Dong X, Guo Y, Zhao L, Zhang C (2025). Enhancing urban flow prediction via mutual reinforcement with multi-scale regional information. Neural Networks 182 (106900). Online publication date: Feb-2025.
https://doi.org/10.1016/j.neunet.2024.106900
Yu P, Zhang X, Gong Y, Zhang J, Sun H, Zhang J, Zhang X, Yin Y (2025). Enhancing origin–destination flow prediction via bi-directional spatio-temporal inference and interconnected feature evolution. Expert Systems with Applications 264 (125679). Online publication date: Mar-2025.
https://doi.org/10.1016/j.eswa.2024.125679
Zhang D, Chen L, Zhang L (2024). UrbanMC: Masking and Contrastive Self-Supervision For Fine-Grained Urban Flows Inference. Proceedings of the 2024 8th International Conference on Deep Learning Technologies (15-21). Online publication date: 15-Jul-2024.
https://dl.acm.org/doi/10.1145/3695719.3695722
Harel O, Moskovitch R (2024). STORM: A MapReduce Framework for Symbolic Time Intervals Series Classification. ACM Transactions on Knowledge Discovery from Data 19:1 (1-54). Online publication date: 29-Nov-2024.
https://dl.acm.org/doi/10.1145/3694788
Yuan Y, Ding J, Feng J, Jin D, Li Y, Baeza-Yates R, Bonchi F (2024). UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4095-4106). Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671662
Zhang W, Han J, Xu Z, Ni H, Liu H, Xiong H, Baeza-Yates R, Bonchi F (2024). Urban Foundation Models: A Survey. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (6633-6643). Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671453
An Y, Li Z, Liu W, Sun H, Chen M, Lu W, Gong Y, Serra E, Spezzano F (2024). Spatio-temporal Graph Normalizing Flow for Probabilistic Traffic Prediction. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (45-55). Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679705
Xie R, Wen J, Chen X, Xie K, Liang W, Xiong N, Zhang D, Xie G, Li K (2024). M$^{2}$STL: Multi-Range Multi-Level Spatial-Temporal Learning Model for Network Traffic Prediction. IEEE Transactions on Network Science and Engineering 11:5 (4315-4329). Online publication date: Sep-2024.
https://doi.org/10.1109/TNSE.2024.3417371
Zhang Y, Huang W, Yao Y, Gao S, Cui L, Yan Z (2024). Urban region representation learning with human trajectories: a multi-view approach incorporating transition, spatial, and temporal perspectives. GIScience & Remote Sensing 61:1. Online publication date: 4-Sep-2024.
https://doi.org/10.1080/15481603.2024.2387392
Liu Z, Ding J, Zheng G (2024). Frequency Enhanced Pre-training for Cross-City Few-shot Traffic Forecasting. Machine Learning and Knowledge Discovery in Databases. Research Track (35-52). Online publication date: 22-Aug-2024.
https://doi.org/10.1007/978-3-031-70344-7_3
