DOI: 10.1145/3678717.3691234
research-article

Privacy Preserved Taxi Demand Prediction System for Distributed Data

Published: 22 November 2024

Abstract

Accurate taxi-demand prediction is essential for optimizing taxi operations and enhancing urban transportation services. However, using customers' data in these systems raises significant privacy and security concerns. Traditional federated learning addresses some privacy issues by enabling model training without direct data exchange but often struggles with accuracy due to varying data distributions across different regions or service providers. In this paper, we propose CC-Net: a novel approach using collaborative learning enhanced with contrastive learning for taxi-demand prediction. Our method ensures high performance by enabling multiple parties to collaboratively train a demand-prediction model through hierarchical federated learning. In this approach, similar parties are clustered together, and federated learning is applied within each cluster. The similarity is defined without data exchange, ensuring privacy and security. We evaluated our approach using real-world data from five taxi service providers in Japan over fourteen months. The results demonstrate that CC-Net maintains the privacy of customers' data while improving prediction accuracy by at least 2.2% compared to existing techniques.
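The cluster-then-federate idea described in the abstract can be illustrated with a small sketch. The abstract does not state how similarity is computed or which clustering algorithm is used, so everything below is an assumption for illustration only: parties are compared by cosine similarity of the model-parameter vectors they share (no raw trip data is exchanged), clusters are formed greedily against a threshold, and plain sample-weighted federated averaging runs inside each cluster. The function names (`cluster_parties`, `fedavg`, `hierarchical_round`) and the toy numbers are hypothetical, and the contrastive-learning component of CC-Net is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two parameter vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_parties(updates, threshold=0.9):
    """Greedy clustering (an assumed stand-in, not the paper's method).

    A party joins the first cluster whose first member is similar enough;
    otherwise it starts a new cluster. Only model parameters are compared.
    """
    clusters = []
    for pid, vec in updates.items():
        for members in clusters:
            if cosine(vec, updates[members[0]]) >= threshold:
                members.append(pid)
                break
        else:
            clusters.append([pid])
    return clusters


def fedavg(updates, sizes, members):
    """Sample-weighted average of the members' parameter vectors."""
    total = sum(sizes[p] for p in members)
    dim = len(updates[members[0]])
    return [
        sum(updates[p][i] * sizes[p] for p in members) / total
        for i in range(dim)
    ]


def hierarchical_round(updates, sizes, threshold=0.9):
    """One round: cluster the parties, then average within each cluster."""
    return {
        tuple(members): fedavg(updates, sizes, members)
        for members in cluster_parties(updates, threshold)
    }


# Toy example: three hypothetical providers with 2-parameter models.
updates = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
sizes = {"A": 100, "B": 300, "C": 50}  # local sample counts
models = hierarchical_round(updates, sizes)
for members, weights in models.items():
    print(members, weights)
```

In this toy run, providers A and B point in nearly the same direction and end up sharing one averaged model, while C keeps its own. That is the intuition behind clustering before federating when data distributions differ across providers.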

Cited By

  • (2024) Restoring Super-High Resolution GPS Mobility Data. Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Geo-Privacy and Data Utility for Smart Societies, 19-24. DOI: 10.1145/3681768.3698501. Online publication date: 29-Oct-2024

Published In

SIGSPATIAL '24: Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems
October 2024
743 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Distributed data
  2. Metric learning
  3. Spatio-temporal analysis

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGSPATIAL '24

Acceptance Rates

SIGSPATIAL '24 Paper Acceptance Rate 37 of 122 submissions, 30%;
Overall Acceptance Rate 257 of 1,238 submissions, 21%

Article Metrics

  • Downloads (Last 12 months): 33
  • Downloads (Last 6 weeks): 22
Reflects downloads up to 14 Jan 2025
