DOI: 10.1145/3583780.3614958 · ACM Conferences · CIKM Conference Proceedings
research-article

Mask- and Contrast-Enhanced Spatio-Temporal Learning for Urban Flow Prediction

Published: 21 October 2023

ABSTRACT

As a critical task in intelligent transportation systems, urban flow prediction (UFP) benefits many city services, including trip planning, congestion control, and public safety. Despite the achievements of previous studies, few have investigated heterogeneity in space and time simultaneously; that is, the correlations between regions vary across timestamps. In this paper, we propose a spatio-temporal learning framework with mask and contrast enhancements to capture spatio-temporal variability among city regions. We devise a mask-enhanced pre-training task to learn latent correlations across the spatial and temporal dimensions, and then develop a graph-based method that extracts the significance of regions from the inter-regional attention weights. To further capture contrastive correlations among regions, we design a contrastive pre-training task with a global-local cross-attention mechanism. The two pre-trained encoders can then capture latent spatio-temporal representations for time-varying flow forecasting. Extensive experiments on real-world urban flow datasets demonstrate that our method compares favorably with other state-of-the-art models.
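
To make the idea of a mask-enhanced pre-training task more concrete, the following is a minimal sketch of masked reconstruction on a flow matrix. It is not the authors' implementation: the abstract does not specify the masking granularity, ratio, or loss, so the per-cell random masking, the `mask_ratio` of 0.25, the `mask_value` of 0.0, and the function names `mask_flow` and `masked_mse` are all assumptions made for illustration.

```python
import random


def mask_flow(flow, mask_ratio=0.25, mask_value=0.0, seed=None):
    """Randomly hide (timestamp, region) cells of a flow matrix.

    flow: list of rows, one row per timestamp, one column per region.
    Returns (masked_flow, mask); mask[t][r] is True where a cell was hidden.
    The per-cell masking scheme is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    n_t, n_r = len(flow), len(flow[0])
    cells = [(t, r) for t in range(n_t) for r in range(n_r)]
    n_hidden = int(round(mask_ratio * len(cells)))
    hidden = set(rng.sample(cells, n_hidden))
    mask = [[(t, r) in hidden for r in range(n_r)] for t in range(n_t)]
    masked = [
        [mask_value if mask[t][r] else flow[t][r] for r in range(n_r)]
        for t in range(n_t)
    ]
    return masked, mask


def masked_mse(pred, target, mask):
    """Mean squared error over hidden cells only (a common choice, assumed here)."""
    errors = [
        (pred[t][r] - target[t][r]) ** 2
        for t in range(len(target))
        for r in range(len(target[0]))
        if mask[t][r]
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy data: 4 timestamps x 3 regions of hypothetical flow counts.
flow = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
masked, mask = mask_flow(flow, mask_ratio=0.25, seed=0)
# An encoder would be trained to predict the hidden cells from `masked`;
# here the masked input itself stands in for an untrained prediction.
loss = masked_mse(masked, flow, mask)
```

In a pre-training loop, an encoder would receive `masked` and be optimized to reduce `masked_mse` against the original `flow`, which pushes it to infer hidden values from other regions and timestamps.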
