
Re-identification and information fusion between anonymized CDR and social network data

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

The analysis of multiple datasets on users’ behaviors opens interesting information fusion possibilities and, at the same time, creates a potential for re-identification and de-anonymization of users’ data. On the one hand, approaches of this kind can breach users’ privacy despite anonymization. On the other hand, combining different datasets is a key enabler for advanced context-awareness, in that information from multiple sources can complement and enrich each other. In this work we analyze different anonymized mobility datasets to highlight re-identification and information fusion possibilities. In particular, we focus on call detail record (CDR) datasets released by mobile telecom operators and datasets comprising geo-localized messages released by social network sites. Results show that: (1) in line with previous findings, a few (about 4) data points are enough to uniquely pinpoint the majority (90%) of the users; (2) more than 20% of CDR users have a single social network user exhibiting a number of matching data points, and we speculate that these two users might be the same person; (3) we derive an estimate of the probability of two users being the same person given the number of data points they have in common, and estimate that for 3% of the social network users we can find a CDR user very likely (>90% probability) to be the same person.
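
The idea of counting matching data points between a CDR user and a social network user can be illustrated with a minimal sketch. It is not the authors' implementation: the representation of a data point as a (location, time bin) pair, the function names `matching_counts` and `single_candidates`, the `min_matches` threshold, and the toy data are all assumptions made here for illustration.

```python
from collections import defaultdict

# Assumption: a "data point" is a hashable (location_id, time_bin) pair.
# Each dataset maps an anonymized user id to a list of such pairs.

def matching_counts(cdr, social):
    """Return {(cdr_user, social_user): number of shared data points}."""
    # Index social users by data point so only co-occurring users are compared.
    index = defaultdict(set)
    for s_user, points in social.items():
        for p in set(points):
            index[p].add(s_user)
    counts = defaultdict(int)
    for c_user, points in cdr.items():
        for p in set(points):
            for s_user in index.get(p, ()):
                counts[(c_user, s_user)] += 1
    return dict(counts)

def single_candidates(cdr, social, min_matches=2):
    """CDR users with exactly one social user sharing >= min_matches points."""
    candidates = defaultdict(list)
    for (c_user, s_user), n in matching_counts(cdr, social).items():
        if n >= min_matches:
            candidates[c_user].append(s_user)
    return {c: s[0] for c, s in candidates.items() if len(s) == 1}

# Toy data (hypothetical): locations are cell labels, time bins are integers.
cdr = {"c1": [("A", 1), ("B", 2), ("C", 3)], "c2": [("A", 1), ("D", 5)]}
social = {"s1": [("A", 1), ("B", 2)], "s2": [("D", 5)], "s3": [("A", 1), ("C", 9)]}

print(single_candidates(cdr, social))
```

In this toy example only `c1` has a single social network user (`s1`) sharing at least two data points. A single match of this kind is a candidate pairing, not proof that the two users are the same person.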

Notes

  1. https://code.google.com/p/language-detection/.

Acknowledgments

Work supported by the SOMUS project (POR-FESR 2007–2013) and by the ASCENS project (EU FP7-FET, Contract No. 257414).

Author information

Corresponding author

Correspondence to Alket Cecaj.

About this article

Cite this article

Cecaj, A., Mamei, M. & Zambonelli, F. Re-identification and information fusion between anonymized CDR and social network data. J Ambient Intell Human Comput 7, 83–96 (2016). https://doi.org/10.1007/s12652-015-0303-x

  • DOI: https://doi.org/10.1007/s12652-015-0303-x
