Abstract
Human mobility data raise privacy concerns because a resident can be re-identified from them by malicious attacks, even when user IDs are anonymized. For an urban service that collects mobility data, efficient privacy risk assessment is essential to protecting its users. Existing methods use prediction models to make such assessments efficient, so that service operators can quickly adjust the quality of sensing data to lower the privacy risk. However, most of these prediction models require massive training data, which must first be collected and stored. Such large-scale, long-term collection of training data contradicts the purpose of privacy risk prediction for new urban services, which is to ensure that the quality of high-risk human mobility data is adjusted to a low privacy risk within a short time. To solve this problem, we present TransRisk, a privacy risk prediction model based on transfer learning, which predicts the privacy risk for a new target urban service from (1) small-scale, short-term data of its own and (2) knowledge learned from the data of other existing urban services. We envision applying TransRisk to a traffic camera surveillance system and evaluate it with real-world mobility datasets already collected in the Chinese city of Shenzhen. These comprise four source datasets, i.e., (i) a call detail record dataset (CDR) with 1.2 million users; (ii) a cellphone connection dataset (CONN) with 1.2 million users; (iii) a vehicular GPS dataset (Vehicles) with 10 thousand vehicles; and (iv) an electronic toll collection transaction dataset (ETC) with 156 thousand users, together with one target dataset, a camera dataset (Camera) with 248 cameras. The results show that our model outperforms the state-of-the-art methods in terms of root mean squared error (RMSE) and mean absolute error (MAE). Our work also provides valuable insights and implications for mobility data privacy risk assessment in both current and future large-scale services.
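As a reading aid for the two evaluation metrics named in the abstract, the following minimal sketch computes RMSE and MAE using their standard definitions. The function names and the risk values are hypothetical and chosen only for illustration; they are not taken from the paper's datasets or code, and the assumption that risk is a per-user score in [0, 1] is ours.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Hypothetical per-user privacy risk scores in [0, 1] (illustrative only,
# not values from the paper).
true_risk = [0.0, 0.5, 1.0, 0.5]
predicted_risk = [0.0, 0.5, 0.5, 1.0]

print("RMSE:", rmse(true_risk, predicted_risk))
print("MAE:", mae(true_risk, predicted_risk))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of how a risk predictor errs.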
Index Terms
- TransRisk: Mobility Privacy Risk Prediction based on Transferred Knowledge