Abstract
The analysis of multiple datasets on users’ behaviors opens interesting information fusion possibilities and, at the same time, creates a potential for re-identification and de-anonymization of users’ data. On the one hand, these kinds of approaches can breach users’ privacy despite anonymization. On the other hand, combining different datasets is a key enabler for advanced context-awareness, in that information from multiple sources can complement and enrich each other. In this work we analyze different anonymized mobility datasets with the aim of highlighting re-identification and information fusion possibilities. In particular, we focus on call detail record (CDR) datasets released by mobile telecom operators and on datasets of geo-localized messages released by social network sites. Results show that: (1) in line with previous findings, a few (about 4) data points are enough to uniquely pinpoint the majority (90 %) of the users; (2) more than 20 % of CDR users have a single social network user exhibiting a number of matching data points, and we speculate that these two users might be the same person; (3) we derive an estimate of the probability of two users being the same person given the number of data points they have in common, and estimate that for 3 % of the social network users we can find a CDR user who is very likely (>90 % probability) to be the same person.
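The first two measurements summarized above, how many users a few random data points single out and how many CDR users have exactly one social network user with enough matching points, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each trace is assumed to be a set of (location_id, time_bin) pairs already mapped onto a common spatial grid and time discretization shared by both datasets. The function names, this discretization, and the matching threshold `min_common` are all hypothetical. The probability estimate in point (3) is not reproduced here.

```python
import random
from typing import Dict, Set, Tuple

# Hypothetical discretization: a data point is (location_id, time_bin), e.g. a
# grid-cell id and an hourly bin, assumed to be shared by both datasets.
Point = Tuple[str, int]
Traces = Dict[str, Set[Point]]


def is_unique(user: str, traces: Traces, p: int, rng: random.Random) -> bool:
    """True if p points drawn at random from `user`'s trace occur in no other trace."""
    trace = traces[user]
    if len(trace) < p:
        return False  # too few points to draw a sample of size p
    sample = set(rng.sample(sorted(trace), p))
    # Users whose traces contain every sampled point
    matching = [u for u, t in traces.items() if sample <= t]
    return matching == [user]


def unicity(traces: Traces, p: int, seed: int = 0) -> float:
    """Fraction of users with >= p points that p random points single out."""
    rng = random.Random(seed)
    eligible = [u for u, t in traces.items() if len(t) >= p]
    if not eligible:
        return 0.0
    return sum(is_unique(u, traces, p, rng) for u in eligible) / len(eligible)


def candidate_matches(cdr: Traces, social: Traces, min_common: int) -> Dict[str, Set[str]]:
    """For each CDR user, the social network users sharing >= min_common points."""
    return {
        c: {s for s, st in social.items() if len(ct & st) >= min_common}
        for c, ct in cdr.items()
    }


def single_candidate_fraction(matches: Dict[str, Set[str]]) -> float:
    """Share of CDR users that have exactly one candidate social network user."""
    if not matches:
        return 0.0
    return sum(len(c) == 1 for c in matches.values()) / len(matches)
```

For example, `unicity(traces, 4)` would estimate the share of users pinned down by four random points, and `single_candidate_fraction(candidate_matches(cdr, social, k))` the share of CDR users with a single candidate at a hypothetical threshold `k`. Because points are sampled at random, a real estimate would average `unicity` over several seeds.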
Acknowledgments
Work supported by the SOMUS project (POR-FESR 2007–2013) and by the ASCENS project (EU FP7-FET, Contract No. 257414).
Cite this article
Cecaj, A., Mamei, M. & Zambonelli, F. Re-identification and information fusion between anonymized CDR and social network data. J Ambient Intell Human Comput 7, 83–96 (2016). https://doi.org/10.1007/s12652-015-0303-x